Knowledge + Practice

Google Cloud PDE Storing Data Questions

75 of 100 questions · Page 1/2 · PDE Storing Data topic · Answers revealed

1

MCQ · hard

A healthcare organization stores patient data in BigQuery. They need to encrypt a specific column (e.g., SSN) using a key they manage, and decrypt it only for authorized queries via a user-defined function. Which approach should they use?

A. Use BigQuery AEAD encryption functions with a Cloud KMS key

B. Use BigQuery column-level access controls

C. Use Cloud Key Management Service (Cloud KMS) with CMEK for the BigQuery dataset

D. Use Cloud Data Loss Prevention (DLP) to de-identify the column

Answer: A

AEAD functions allow encrypting specific columns and decrypting via a SQL function, using keys from Cloud KMS.

Why this answer

BigQuery AEAD encryption functions (e.g., `AEAD.ENCRYPT` and `AEAD.DECRYPT_STRING`) let you encrypt a specific column with a keyset protected by a customer-managed key in Cloud KMS, and then decrypt it only inside a user-defined function (UDF) that authorized users can call. This meets the requirement of per-column encryption with customer key management and authorized decryption via a UDF.

Exam trap

The exam often tests the distinction between dataset-level encryption (CMEK) and column-level encryption (AEAD). Candidates mistakenly choose CMEK because it involves Cloud KMS, but CMEK does not allow per-column encryption or UDF-controlled decryption.

How to eliminate wrong answers

Option B is wrong because BigQuery column-level access controls only restrict who can see the column, but they do not encrypt the data at rest or in transit, so the data remains in plaintext and does not satisfy the encryption requirement. Option C is wrong because Cloud KMS with CMEK encrypts the entire BigQuery dataset at the storage level, not a specific column, and decryption is automatic for authorized users, not controlled via a UDF. Option D is wrong because Cloud DLP de-identifies data (e.g., masking or tokenization) but is not designed for reversible encryption with a customer-managed key and UDF-based decryption; it is typically used for static de-identification, not dynamic per-query decryption.

2

Multi-Select · hard

A data engineer needs to create a unified table that combines data from Cloud Storage (Parquet files) and BigQuery native tables, with fine-grained access control and governance. Which three Google Cloud features should they use together? (Choose THREE.)

Select 3 answers

A. BigQuery

B. Dataproc

C. BigLake

D. Cloud Storage

E. Cloud SQL

Answers: A, C, D

BigQuery is used for both native tables and the unified query engine.

Why this answer

BigQuery is correct because it serves as the unified query engine that can read data from both Cloud Storage (via external tables or BigLake) and native BigQuery tables, enabling a single SQL interface for analysis. It also integrates with fine-grained access control through row-level security and column-level access policies, and supports governance via Data Catalog and VPC Service Controls.

Exam trap

The trap here is that candidates may confuse Dataproc (a processing engine) with a storage or query service, or think Cloud SQL can handle Parquet files, when the correct combination requires BigQuery, BigLake, and Cloud Storage to achieve unified querying and governance.

3

Multi-Select · easy

A data engineer wants to set up automatic deletion of objects from a Cloud Storage bucket after 30 days, and transition objects older than 7 days to Nearline storage. Which THREE steps should they take? (Select three.)

Select 3 answers

A. Set the bucket's default storage class to Nearline

B. Create a lifecycle rule with action Delete and condition age 30 days

C. Create a lifecycle rule with action SetStorageClass to Nearline and condition age 7 days

D. Enable Object Versioning on the bucket

E. Apply the lifecycle configuration to the bucket using gsutil lifecycle set

Answers: B, C, E

This deletes objects older than 30 days.

Why this answer

Option B is correct because lifecycle rules in Google Cloud Storage allow you to specify an action (Delete) and a condition (age 30 days) to automatically delete objects after 30 days. This meets the requirement for automatic deletion without manual intervention.

Exam trap

The exam often tests the distinction between the bucket default storage class (which only applies to new objects) and lifecycle rules (which can transition or delete existing objects based on age).
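The two rules from options B and C can be written as a single lifecycle configuration file, which option E then applies to the bucket. The following is a minimal sketch that builds that JSON with the Python standard library; the file name `lifecycle.json` and the helper `write_lifecycle_file` are hypothetical names chosen for illustration.

```python
import json

# Lifecycle configuration in the JSON shape that Cloud Storage accepts.
lifecycle_config = {
    "rule": [
        {
            # Move objects to Nearline once they are at least 7 days old.
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 7},
        },
        {
            # Delete objects once they are at least 30 days old.
            "action": {"type": "Delete"},
            "condition": {"age": 30},
        },
    ]
}


def write_lifecycle_file(path: str) -> str:
    """Write the configuration to disk and return the JSON text."""
    text = json.dumps(lifecycle_config, indent=2)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text
```

After calling `write_lifecycle_file("lifecycle.json")`, the file could be applied with `gsutil lifecycle set lifecycle.json gs://example-bucket`, where `example-bucket` is a placeholder bucket name.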

4

MCQ · easy

A mobile app needs a real-time NoSQL database that supports offline sync and automatic conflict resolution. Which Google Cloud database is best suited?

A. Cloud SQL

B. Firestore

C. Cloud Bigtable

D. Cloud Spanner

Answer: B

Firestore provides offline data persistence, automatic sync, and conflict resolution for mobile and web clients.

Why this answer

Firestore is the correct choice because it is a NoSQL, real-time database that provides offline data persistence and automatic conflict resolution via multi-version concurrency control (MVCC) and last-write-wins (LWW) semantics. Its client SDKs synchronize data in the background when connectivity is restored, making it ideal for mobile apps requiring offline-first functionality.

Exam trap

The exam often tests the misconception that any NoSQL database (like Bigtable) supports mobile offline sync. Bigtable lacks client-side SDKs and conflict resolution mechanisms, making Firestore the only option designed specifically for real-time mobile apps with offline capabilities.

How to eliminate wrong answers

Option A (Cloud SQL) is wrong because it is a relational (SQL) database that does not natively support real-time sync or offline-first mobile clients; it requires custom backend logic for conflict resolution. Option C (Cloud Bigtable) is wrong because it is a wide-column NoSQL database designed for high-throughput analytical workloads, not for real-time mobile sync or offline support; it lacks client-side SDKs for automatic conflict resolution. Option D (Cloud Spanner) is wrong because it is a globally distributed relational database with strong consistency, but it does not provide built-in offline sync or automatic conflict resolution for mobile clients; it is optimized for OLTP workloads requiring ACID transactions across regions.

5

MCQ · hard

A company runs a global financial application requiring strong consistency across continents with 99.999% availability. They need to store transaction data with ACID properties and sub-10ms write latency from any region. Which storage service meets all requirements?

A. Cloud SQL (MySQL) with cross-region read replicas

B. Cloud Spanner

C. BigQuery multi-region

D. Cloud Bigtable multi-cluster routing

Answer: B

Spanner offers global strong consistency, external consistency, 99.999% SLA, and sub-10ms writes.

Why this answer

Cloud Spanner is the only service that provides ACID transactions, strong consistency across continents, and 99.999% availability with sub-10ms write latency. It uses synchronous replication with Paxos-based consensus to ensure consistent writes globally, meeting the strict requirements of a global financial application.

Exam trap

The exam often tests the misconception that cross-region read replicas (like Cloud SQL) can provide strong consistency globally, but they are asynchronous and only offer eventual consistency, failing ACID requirements for transactional writes.

How to eliminate wrong answers

Option A is wrong because Cloud SQL (MySQL) with cross-region read replicas does not provide strong consistency across continents; writes are committed only in the primary region, and read replicas are asynchronous, leading to potential stale reads and violating ACID consistency globally. Option C is wrong because BigQuery multi-region is an analytics data warehouse designed for large-scale queries, not transactional workloads; it does not support ACID transactions with sub-10ms write latency and is optimized for batch processing, not real-time writes. Option D is wrong because Cloud Bigtable with multi-cluster routing is a NoSQL database that does not support ACID transactions; it offers eventual consistency across clusters and is designed for high-throughput analytical workloads, not strong consistency and sub-10ms writes for financial transactions.

6

Multi-Select · medium

A company has a data lake on Cloud Storage with raw data in the 'raw' bucket, curated data in 'curated', and processed data in 'processed'. They want to implement lifecycle management to reduce costs. Which TWO actions should they take? (Choose 2)

Select 2 answers

A. Set a lifecycle rule to change storage class from Standard to Nearline after 30 days for the 'raw' bucket.

B. Enable object versioning on all buckets to automatically delete older versions.

C. Set a partition expiration on BigQuery tables that reference data in the 'processed' bucket.

D. Set a lifecycle rule to delete objects older than 365 days in the 'curated' and 'processed' buckets.

E. Set a lifecycle rule to change storage class from Standard to Archive after 30 days for the 'raw' bucket.

Answers: A, D

Nearline has a 30-day minimum storage duration, ideal for data accessed less than once a month.

Why this answer

Setting a lifecycle rule to change storage class from Standard to Nearline after 30 days for raw data reduces costs while maintaining access. For curated and processed data, a rule to delete objects older than 365 days helps manage costs. Option E (Archive after 30 days) is viable but is a poorer fit than Nearline, because Archive carries a 365-day minimum storage duration and higher retrieval costs.

Option C is about BigQuery partition expiration, not Cloud Storage. Option B is about versioning, which does not delete older versions on its own and is not a lifecycle rule.

7

Multi-Select · medium

A company is designing a Cloud Bigtable row key for a time-series dataset of device readings. They want to avoid hotspotting (uneven load across tablets). Which TWO row key design patterns are effective? (Choose 2)

Select 2 answers

A. Use a monotonically increasing counter as row key

B. Use timestamp directly as the first part of the row key

C. Reverse the timestamp string

D. Prepend a hash of the device ID to the timestamp

E. Use a secondary index on the timestamp column

Answers: C, D

Reversing the timestamp spreads writes across tablets, as recent timestamps become diverse.

Why this answer

Prepending a hash (like a hash of device ID) distributes writes across tablets. Reversing a timestamp string (e.g., MAX_TIME - ts) spreads recent writes. Option B (timestamp first) causes hotspotting.

Option A (monotonic counter) also hotspots. Option E is about a secondary index, not the row key.
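The two patterns named in the explanation can be sketched as small key-building helpers. This is an illustration only: the function names, the 4-character prefix length, the SHA-256 hash, and the `MAX_TIMESTAMP_MS` bound are all assumptions, not anything Bigtable prescribes.

```python
import hashlib

# Assumed upper bound for millisecond timestamps (illustrative only).
MAX_TIMESTAMP_MS = 10**13


def hashed_prefix_key(device_id: str, timestamp_ms: int, prefix_len: int = 4) -> str:
    """Build '<hash prefix>#<device id>#<timestamp>'."""
    digest = hashlib.sha256(device_id.encode("utf-8")).hexdigest()
    # Zero-pad the timestamp so keys sort correctly as strings.
    return f"{digest[:prefix_len]}#{device_id}#{timestamp_ms:013d}"


def reversed_timestamp_key(device_id: str, timestamp_ms: int) -> str:
    """Build '<device id>#<MAX - timestamp>' so newer readings sort first."""
    reversed_ts = MAX_TIMESTAMP_MS - timestamp_ms
    return f"{device_id}#{reversed_ts:013d}"
```

The hashed prefix is deterministic, so all readings for one device still share a prefix and can be scanned together, while different devices land in different parts of the key space.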

8

MCQ · medium

A company uses Cloud Spanner and needs to store a parent-child relationship where the child table is frequently queried together with the parent. The parent has millions of rows and the child billions. Which Spanner feature optimizes performance for this pattern?

A. Partitioned tables

B. Interleaved tables

C. Secondary indexes

D. Change streams

Answer: B

Interleaved tables store child rows physically with the parent row, optimizing joins.

Why this answer

Interleaved tables in Cloud Spanner co-locate parent and child rows on the same split, reducing cross-node communication when joining. Secondary indexes are for lookups, not co-location. Partitioning is not a Spanner concept.
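As a conceptual illustration of co-location, the sketch below models rows purely by their primary-key tuples, where a child key starts with its parent's key. It is a toy model of key ordering, not Spanner's storage engine, and the sample keys and the `interleaved_layout` helper are hypothetical.

```python
# Toy model only: rows are represented by their primary-key tuples.
# A child key begins with the parent key, as in an interleaved table.
parent_rows = [(1,), (2,)]
child_rows = [(2, 10), (1, 20), (1, 10), (2, 20)]


def interleaved_layout(parents, children):
    """Return parent and child keys in one sequence sorted by key."""
    return sorted(parents + children)


layout = interleaved_layout(parent_rows, child_rows)
for key in layout:
    # Indent child keys to show them nested under their parent.
    print(("  " if len(key) > 1 else "") + str(key))
```

Because each child key shares its parent's prefix, sorting places every child directly after its parent, which is the property that lets a parent and its children be read together.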

9

MCQ · medium

A data engineer needs to enforce that all datasets in a project expire after 90 days to reduce storage costs. They want to automate this without manual intervention. Which approach should they use?

A. Use Cloud Storage lifecycle rules to delete tables after 90 days

B. Create an IAM policy that revokes access after 90 days

C. Use BigQuery scheduled queries to delete tables older than 90 days

D. Set a default table expiration on each BigQuery dataset to 90 days

Answer: D

BigQuery dataset properties allow setting a default table expiration, automatically deleting tables after the specified days.

Why this answer

Option D is correct because BigQuery datasets support a default table expiration setting that automatically deletes tables after a specified number of days. This enforces a 90-day lifecycle for all tables in the dataset without requiring manual intervention or external automation, directly addressing the cost reduction goal.

Exam trap

The exam often tests the distinction between storage lifecycle management (Cloud Storage) and data lifecycle management (BigQuery), leading candidates to mistakenly apply Cloud Storage rules to BigQuery tables.

How to eliminate wrong answers

Option A is wrong because Cloud Storage lifecycle rules apply to objects in Cloud Storage buckets, not to BigQuery tables; BigQuery tables are stored separately and cannot be managed by Cloud Storage lifecycle policies. Option B is wrong because IAM policies control access permissions, not data lifecycle; revoking access does not delete tables or reduce storage costs. Option C is wrong because BigQuery scheduled queries can delete tables but require writing and maintaining custom SQL scripts and scheduling logic, introducing complexity and potential failure points, whereas a default table expiration is a declarative, server-managed setting that requires no ongoing maintenance.
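The dataset setting from option D is expressed in milliseconds. A minimal sketch of converting 90 days into the value carried by the dataset resource's `defaultTableExpirationMs` field follows; the helper name `default_expiration_patch` is hypothetical, and the returned dictionary is only the fragment of a dataset update that concerns expiration.

```python
from datetime import timedelta


def default_expiration_patch(days: int) -> dict:
    """Return a dataset patch fragment that sets the default table expiration."""
    milliseconds = int(timedelta(days=days).total_seconds() * 1000)
    # The REST dataset resource carries this value in milliseconds, as a string.
    return {"defaultTableExpirationMs": str(milliseconds)}


patch = default_expiration_patch(90)
print(patch)
```

Note that a dataset's default expiration applies to tables created after it is set, so tables that already exist would need their own expiration updated separately.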

10

Multi-Select · medium

A data engineer needs to restrict access to BigQuery datasets such that only data from approved VPC networks can query them. They also need to audit data access. Which two security controls should they implement? (Choose two.)

Select 2 answers

A. Cloud Audit Logs

B. Customer-managed encryption keys (CMEK)

C. Data Loss Prevention (DLP)

D. IAM roles

E. VPC Service Controls

Answers: A, E

Audit logs record data access for auditing and monitoring.

Why this answer

Cloud Audit Logs (option A) is correct because it provides a record of all administrative and data-access operations on BigQuery datasets, enabling the data engineer to audit who accessed what data and from which network. This satisfies the requirement to audit data access by capturing detailed logs of API calls, including the identity of the caller and the source IP address.

Exam trap

The exam often tests the distinction between identity-based controls (IAM) and network-based controls (VPC Service Controls), leading candidates to mistakenly choose IAM roles when the requirement explicitly specifies restricting access by VPC network rather than by user identity.

11

MCQ · hard

A financial services company uses Cloud Bigtable to store trade data. They are experiencing hot-spotting on a single node, causing high latency. The row key format is [trade_id]#[timestamp]. Which row key design change would BEST distribute writes across tablets?

A. Use a hashed prefix of the trade_id, e.g., [hash(trade_id)]#[trade_id]#[timestamp]

B. Use a single row key of [timestamp]

C. Increase the number of Bigtable nodes to 20

D. Change row key to [timestamp]#[trade_id]

Answer: A

A hash prefix distributes writes across tablets by randomizing the start of the row key, reducing hot-spotting.

Why this answer

Option A is correct because adding a hashed prefix of the trade_id ensures that writes are evenly distributed across all Bigtable tablets. Bigtable partitions data by row key lexicographic order; without a hash, sequential trade IDs or timestamps cause all recent writes to land on a single tablet, creating a hotspot. The hash spreads the write load uniformly, regardless of the underlying key pattern.

Exam trap

The exam often tests the misconception that simply reversing the order of key components (e.g., putting the timestamp first) is sufficient to avoid hot-spotting, when in fact any monotonically increasing value at the start of the key will still cause a hotspot.

How to eliminate wrong answers

Option B is wrong because using a single row key of [timestamp] would cause all writes with the same timestamp to collide on one row, creating an extreme hotspot and violating Bigtable's requirement for unique, distributed row keys. Option C is wrong because increasing the number of nodes does not fix a row key design flaw; Bigtable cannot rebalance writes if the row key pattern forces all traffic to a single tablet, and adding nodes only helps if the load is already distributed. Option D is wrong because reversing the order to [timestamp]#[trade_id] still places all writes with the same timestamp adjacent in lexicographic order, so recent timestamps will still hotspot on a single tablet; it does not introduce the randomness needed for distribution.
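To see why a hashed prefix helps, the sketch below builds keys in the `[hash(trade_id)]#[trade_id]#[timestamp]` shape from option A and counts how sequential trade IDs spread across prefixes. The bucket count of 8, the SHA-256 hash, and the function names are assumptions made for the illustration.

```python
import hashlib
from collections import Counter


def salted_key(trade_id: int, timestamp_ms: int, buckets: int = 8) -> str:
    """Build '<bucket>#<trade id>#<timestamp>' with a hash-derived bucket."""
    digest = hashlib.sha256(str(trade_id).encode("utf-8")).digest()
    # The first hash byte modulo the bucket count picks the leading prefix.
    bucket = digest[0] % buckets
    return f"{bucket:02d}#{trade_id}#{timestamp_ms}"


def bucket_histogram(trade_ids, buckets: int = 8) -> Counter:
    """Count how many keys land under each leading bucket."""
    return Counter(salted_key(t, 0, buckets).split("#")[0] for t in trade_ids)


histogram = bucket_histogram(range(1000))
for bucket, count in sorted(histogram.items()):
    print(bucket, count)
```

Sequential trade IDs would otherwise sort next to each other; with the hashed prefix they fall into different key ranges, so no single range receives all of the recent writes.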

12

MCQ · medium

A company needs to store petabytes of time-series IoT sensor data and query it with single-digit millisecond latency at millions of reads per second. The data has a simple key-value structure with timestamps. Which Google Cloud database is MOST appropriate?

A. Cloud Spanner

B. Firestore

C. Cloud Bigtable

D. BigQuery

Answer: C

Bigtable is the correct choice: wide-column NoSQL, designed for time-series and IoT workloads, single-digit ms latency, and scales to millions of QPS with additional nodes.

Why this answer

Cloud Bigtable is the correct choice because it is a fully managed, scalable NoSQL database designed for large analytical and operational workloads, offering single-digit millisecond latency for high-throughput reads and writes. Its key-value model with timestamp-based versioning directly matches the time-series IoT sensor data pattern, and it can handle petabytes of data with millions of reads per second using its distributed, sorted key-value store (based on Google's Chubby and GFS).

Exam trap

The trap here is that candidates often confuse Cloud Spanner's strong consistency and global scale with suitability for high-throughput time-series workloads, but Spanner's transactional overhead and relational model make it far less performant for simple key-value reads at millions of QPS compared to Bigtable's optimized NoSQL design.

How to eliminate wrong answers

Option A is wrong because Cloud Spanner is a globally distributed, strongly consistent relational database optimized for ACID transactions and complex joins, not for high-throughput, low-latency key-value time-series workloads at petabyte scale. Option B is wrong because Firestore is a mobile/web document database with real-time sync and limited throughput (up to 10,000 writes/second per database), making it unsuitable for millions of reads per second and petabyte-scale data. Option D is wrong because BigQuery is a serverless data warehouse for analytical SQL queries on large datasets, not a low-latency operational database; it is optimized for batch and interactive analytics, not single-digit millisecond reads at millions of operations per second.

13

MCQ · medium

A company has a Cloud SQL for PostgreSQL instance and wants to create a read replica to offload read traffic from the primary. They also need to ensure the replica is in a different region for disaster recovery. Which Cloud SQL feature should they use?

A. External replica

B. Automatic backups with PITR

C. Cross-region read replica

D. High availability (HA) configuration

Answer: C

Cloud SQL supports creating read replicas in different regions to offload reads and provide DR.

Why this answer

Cross-region read replicas in Cloud SQL for PostgreSQL allow you to create a replica in a different region from the primary instance. This offloads read traffic from the primary while also providing disaster recovery capabilities by maintaining a standby copy in a geographically separate location. The replica uses asynchronous replication to stay up-to-date with the primary.

Exam trap

The exam often tests the distinction between high availability (HA) within a region and cross-region replicas. Candidates mistakenly choose the HA configuration thinking it provides regional redundancy, but HA only protects against zone-level failures, not regional disasters.

How to eliminate wrong answers

Option A is wrong because an external replica refers to a replica running outside of Cloud SQL, such as on a self-managed instance or on-premises, which does not meet the requirement for a managed Cloud SQL replica in a different region. Option B is wrong because automatic backups with point-in-time recovery (PITR) are used for data recovery from failures or corruption, not for offloading read traffic or providing a separate regional replica for disaster recovery. Option D is wrong because a high availability (HA) configuration uses synchronous replication within the same region (typically across zones) to provide failover, not a cross-region replica for disaster recovery or read offloading.

14

MCQ · medium

An application requires a globally distributed, strongly consistent database with 99.999% availability SLA. The workload is OLTP with high throughput across continents. Which service fits best?

A. Cloud SQL with cross-region replicas

B. Firestore

C. Cloud Bigtable

D. Cloud Spanner

Answer: D

Correct: global distribution, strong consistency, 99.999% SLA.

Why this answer

Cloud Spanner is the only service that provides globally distributed, strongly consistent (external consistency via TrueTime) OLTP with 99.999% availability SLA. It supports high-throughput ACID transactions across continents using synchronous replication and atomic clocks, meeting all stated requirements.

Exam trap

The exam often tests the misconception that cross-region replicas in Cloud SQL provide strong consistency, but candidates must remember that asynchronous replication leads to eventual consistency and a lower SLA.

How to eliminate wrong answers

Option A is wrong because Cloud SQL with cross-region replicas uses asynchronous replication, which cannot guarantee strong consistency across regions and offers only a 99.95% SLA for regional instances, not 99.999%. Option B is wrong because Firestore is a NoSQL document database that provides strong consistency only within a single region; its multi-region mode uses eventual consistency for global reads, and it lacks the ACID transaction support needed for high-throughput OLTP across continents. Option C is wrong because Cloud Bigtable is a wide-column NoSQL database designed for analytical workloads with high throughput, but it does not support SQL queries, ACID transactions, or strong consistency across regions; it offers only single-row transactions and eventual consistency for multi-cluster replication.

15

MCQ · medium

A company wants to run hybrid transactional and analytical workloads on a PostgreSQL-compatible database with high performance. Which service should they choose?

A. Cloud Spanner

B. Cloud SQL for PostgreSQL

C. BigQuery

D. AlloyDB

Answer: D

Correct: AlloyDB has a columnar engine for analytics and is PostgreSQL-compatible.

Why this answer

AlloyDB is the correct choice because it is a fully managed PostgreSQL-compatible database service specifically designed for high-performance hybrid transactional and analytical workloads. It combines transactional processing with built-in columnar analytics, delivering up to 4x faster transactional performance and up to 100x faster analytical queries than standard PostgreSQL, without requiring any schema changes or ETL.

Exam trap

The exam often tests the distinction between 'PostgreSQL-compatible' and 'PostgreSQL-based': candidates mistakenly choose Cloud SQL for PostgreSQL because it is a managed PostgreSQL service, but they overlook the specific requirement for hybrid transactional and analytical workloads, which AlloyDB uniquely addresses with its integrated columnar engine.

How to eliminate wrong answers

Option A is wrong because Cloud Spanner is a globally distributed, strongly consistent relational database that is not PostgreSQL-compatible (it uses GoogleSQL or standard SQL with Spanner-specific extensions) and is optimized for horizontal scaling across regions, not for hybrid transactional/analytical workloads with PostgreSQL compatibility. Option B is wrong because Cloud SQL for PostgreSQL is a fully managed PostgreSQL service but is designed primarily for transactional (OLTP) workloads and lacks the built-in columnar engine and analytical acceleration needed for hybrid workloads, resulting in significantly slower analytical queries. Option C is wrong because BigQuery is a serverless, highly scalable data warehouse for analytical (OLAP) workloads, not a transactional database, and it is not PostgreSQL-compatible (it uses BigQuery SQL).

16

MCQ · medium

A company needs to store petabytes of time-series IoT sensor data and query it with single-digit millisecond latency at millions of reads per second. The data has a simple key-value structure with timestamps. Which Google Cloud database is MOST appropriate?

A. BigQuery

B. Firestore

C. Cloud Bigtable

D. Cloud Spanner

Answer: C

Bigtable is the correct choice: wide-column NoSQL, designed for time-series and IoT workloads, single-digit ms latency, and scales to millions of QPS with additional nodes.

Why this answer

Cloud Bigtable is the correct choice because it is a fully managed, scalable NoSQL database designed for large analytical and operational workloads, offering consistent sub-10ms latency for high-throughput reads and writes. It natively supports time-series data with row key design optimized for timestamp-based queries, and can handle millions of reads per second across petabytes of data, making it ideal for IoT sensor data.

Exam trap

The exam often tests the misconception that BigQuery is suitable for real-time, high-throughput key-value lookups because of its speed on analytical queries, but BigQuery is not designed for point reads at millions of operations per second with single-digit millisecond latency.

How to eliminate wrong answers

Option A is wrong because BigQuery is a serverless data warehouse optimized for complex analytical SQL queries on large datasets, not for single-digit millisecond point lookups at millions of reads per second; its latency is typically in the hundreds of milliseconds to seconds. Option B is wrong because Firestore is a mobile/document database designed for real-time sync and moderate throughput, not for petabyte-scale time-series data with millions of reads per second; it has throughput limits and higher latency for such workloads. Option D is wrong because Cloud Spanner is a globally distributed relational database with strong consistency and SQL support, but it is overkill for simple key-value time-series data and incurs higher latency and cost compared to Bigtable for this specific use case.

17

Multi-Select · medium

A company is designing a data lake on Cloud Storage with BigLake tables for unified governance. Which TWO statements about BigLake are correct? (Choose 2.)

Select 2 answers

A. BigLake requires data to be loaded into BigQuery storage.

B. BigLake tables allow querying data in Cloud Storage using BigQuery without loading into BigQuery storage.

C. BigLake only supports structured data in BigQuery storage.

D. BigLake supports row-level and column-level security.

E. BigLake automatically converts CSV files to Parquet.

Answers: B, D

BigLake tables are external tables that can query data directly from GCS.

Why this answer

BigLake tables allow you to query data directly in Cloud Storage using BigQuery without the need to first load the data into BigQuery storage. This enables a data lake architecture where the data remains in an open format in Cloud Storage, while BigLake provides unified governance and fine-grained access control.

Exam trap

The exam often tests the misconception that BigLake requires data to be loaded into BigQuery storage, when in fact it is designed to query external data in Cloud Storage directly.

18

MCQ · easy

A mobile app needs an offline-first NoSQL database that syncs data across devices when connectivity is available. Which Google Cloud database meets these requirements?

A. Memorystore

B. Cloud SQL

C. Bigtable

D. Firestore

Answer: D

Firestore provides offline data persistence and automatic sync, perfect for mobile apps.

Why this answer

Firestore is a NoSQL, serverless, offline-first database that automatically syncs data across devices when connectivity is restored. It provides built-in offline persistence and real-time synchronization, making it ideal for mobile apps that need to work offline and sync later.

Exam trap

The trap here is that candidates may confuse Bigtable's NoSQL label with mobile-friendly NoSQL, overlooking that Bigtable is designed for high-throughput analytical workloads, not for offline-first mobile sync with real-time listeners.

How to eliminate wrong answers

Option A is wrong because Memorystore is a fully managed in-memory cache (Redis/Memcached) designed for caching and session storage, not a persistent NoSQL database with offline sync capabilities. Option B is wrong because Cloud SQL is a relational (SQL) database, not a NoSQL database, and it does not offer offline-first or cross-device sync features. Option C is wrong because Bigtable is a wide-column NoSQL database optimized for large analytical workloads, not for mobile app offline sync with real-time updates.

19

Multi-Select · medium

A company needs to store transactional data for a global customer base with strong consistency and 99.999% availability SLA. They anticipate millions of transactions per day across multiple regions. Which storage option meets these requirements? (Choose 1)

Select 1 answer

A. Cloud Bigtable

B. AlloyDB

C. Firestore (in Datastore mode)

D. Cloud SQL (with HA)

E. Cloud Spanner

Answer: E

Spanner provides globally distributed ACID transactions, strong consistency, and 99.999% availability SLA.

Why this answer

Cloud Spanner is the only GCP service that offers globally distributed ACID transactions with strong consistency and 99.999% SLA. Cloud Bigtable is highly available but does not support multi-row ACID transactions, and its replication between clusters is eventually consistent. Cloud SQL and Firestore do not provide 99.999% SLA globally.

AlloyDB is regional, not global.

20

MCQ · easy

An organization needs to store transactional data for a global e-commerce platform with strong consistency across regions and an SLA of 99.999% availability. The application requires SQL semantics with horizontal scaling. Which Google Cloud database should they choose?

A. Firestore

B. Cloud SQL

C. Cloud Spanner

D. Cloud Bigtable

Answer: C

Spanner is globally distributed, provides strong consistency, and offers 99.999% availability SLA, matching the requirements.

Why this answer

Cloud Spanner is the correct choice because it provides globally distributed, strongly consistent SQL semantics with horizontal scaling and a 99.999% availability SLA. It uses synchronous replication and the TrueTime API to ensure external consistency across regions, meeting the strict consistency and uptime requirements of a global e-commerce platform.

Exam trap

The trap here is that candidates often confuse Cloud Spanner with Cloud SQL, assuming any SQL database can scale horizontally, but Cloud SQL is a single-region, vertically scaled service, while Spanner is the only Google Cloud database that combines SQL, horizontal scaling, and global strong consistency with a 99.999% SLA.

How to eliminate wrong answers

Option A is wrong because Firestore is a NoSQL document database that does not support SQL semantics; it offers strong consistency only within a single region and lacks the global consistency and 99.999% SLA required. Option B is wrong because Cloud SQL is a traditional relational database that supports SQL but cannot horizontally scale across regions; it is limited to a single region and provides up to 99.95% availability, not 99.999%. Option D is wrong because Cloud Bigtable is a NoSQL wide-column database that does not support SQL semantics and provides only eventual consistency, not the strong consistency required for transactional data.

21

Multi-Selectmedium

A company is designing a data lake on Cloud Storage with different zones. They need to enforce data retention so that objects in the 'raw' zone are automatically deleted after 1 year. Which TWO actions should they take? (Choose 2 correct options)

Select 2 answers

A.Use a bucket retention policy with a retention period of 1 year

B.Configure a Cloud Storage object lifecycle rule with a Delete action

C.Set an IAM policy to prevent deletion of objects

D.Create a Cloud Function to check object age and delete them

E.Apply a lifecycle rule that deletes objects with the prefix 'raw/'

AnswersB, E

Lifecycle rules can delete objects automatically based on age.

Why this answer

Option B is correct because Cloud Storage object lifecycle management allows you to set rules that automatically delete objects after a specified age. By configuring a lifecycle rule with a Delete action and setting the condition to 'Age: 365 days', objects in the 'raw' zone will be automatically removed after 1 year, meeting the retention requirement without manual intervention. Option E is correct because the rule's condition can also include a prefix match, which limits the deletion to objects under 'raw/' and leaves the other zones untouched.
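
As a minimal sketch of such a rule, the Python snippet below builds a lifecycle configuration document that deletes objects under a prefix once they reach a given age. The helper name `build_raw_zone_lifecycle` and the `raw/` prefix are illustrative assumptions; the `rule`, `action`, `age`, and `matchesPrefix` keys follow the Cloud Storage lifecycle JSON format.

```python
import json


def build_raw_zone_lifecycle(age_days: int = 365, prefix: str = "raw/") -> dict:
    """Return a lifecycle configuration that deletes objects under `prefix`
    once they are at least `age_days` old (hypothetical helper)."""
    return {
        "rule": [
            {
                # Delete is the lifecycle action that removes the object.
                "action": {"type": "Delete"},
                # Both conditions must hold for the action to fire.
                "condition": {"age": age_days, "matchesPrefix": [prefix]},
            }
        ]
    }


config = build_raw_zone_lifecycle()
lifecycle_json = json.dumps(config, indent=2)
print(lifecycle_json)
```

The printed JSON could be saved to a file and applied to a bucket, for example with `gcloud storage buckets update gs://BUCKET --lifecycle-file=FILE`, where the bucket and file names are placeholders.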

Exam trap

The trap here is confusing retention policies (which prevent deletion) with lifecycle rules (which trigger deletion), leading candidates to incorrectly select Option A thinking it enforces deletion rather than preventing it.

22

MCQeasy

A marketing team needs to run ad-hoc SQL queries on terabytes of clickstream data stored in Parquet files in Cloud Storage. They want a serverless solution with no cluster management and the ability to query external data without loading. Which service should they use?

A.Cloud SQL

B.AlloyDB

C.Dataproc with Spark SQL

D.BigQuery with external tables

AnswerD

BigQuery external tables let you query Parquet files in GCS without loading, serverless.

Why this answer

BigQuery with external tables allows querying data stored in Cloud Storage (including Parquet files) without loading it into BigQuery storage, providing a serverless, fully managed solution with no cluster management. This matches the requirement for ad-hoc SQL queries on terabytes of clickstream data in Parquet format, as BigQuery automatically scales compute and storage.

Exam trap

The trap here is that candidates may choose Dataproc with Spark SQL (Option C) because it can query Parquet files, but they overlook the 'serverless' and 'no cluster management' requirement, which BigQuery satisfies natively without any cluster provisioning.

How to eliminate wrong answers

Option A is wrong because Cloud SQL is a fully managed relational database for OLTP workloads, not designed for terabyte-scale analytical queries on external Parquet files, and requires data to be loaded into its storage. Option B is wrong because AlloyDB is a PostgreSQL-compatible database optimized for transactional and hybrid workloads, not a serverless query engine for external data in Cloud Storage, and it requires data to be imported. Option C is wrong because Dataproc with Spark SQL requires cluster management (even if ephemeral) and is not serverless; it also involves provisioning and scaling clusters, contradicting the 'no cluster management' requirement.

23

Multi-Selectmedium

A company needs a fully managed, PostgreSQL-compatible database that supports both transactional (OLTP) and analytical (OLAP) workloads with low latency. They want to minimize operational overhead. Which two Google Cloud services should they consider? (Choose two.)

Select 2 answers

A.Cloud SQL for PostgreSQL

B.Cloud Spanner

C.AlloyDB with BigQuery as a federated source

D.BigQuery

E.AlloyDB

AnswersC, E

AlloyDB handles the OLTP workload with PostgreSQL compatibility, and BigQuery can run analytics against it through federated queries. AlloyDB alone may suffice, but pairing it with BigQuery adds analytical capacity.

Why this answer

AlloyDB is a fully managed PostgreSQL-compatible database service designed for both transactional (OLTP) and analytical (OLAP) workloads with low latency. By using BigQuery as a federated source, you can run analytical queries directly against AlloyDB data without moving it, combining operational and analytical capabilities while minimizing operational overhead.

Exam trap

The exam often tests the misconception that a single database service must be either purely transactional or purely analytical, but the correct answer here leverages a combination of AlloyDB for OLTP and BigQuery federation for OLAP to meet both requirements with low operational overhead.

24

MCQhard

A company is using BigQuery for analytics and needs to ensure that certain columns containing PII are encrypted with a customer-managed key (CMEK). Which approach should they take?

A.Use Cloud Data Loss Prevention (DLP) to mask the columns during query.

B.Use BigQuery column-level encryption with AEAD functions and a Cloud KMS key.

C.Apply CMEK at the dataset level; all tables inherit the encryption.

D.Store the data encrypted in Cloud Storage and use external tables with a CMEK.

AnswerB

Correct: AEAD functions enable column-level encryption with CMEK.

Why this answer

Option B is correct because BigQuery column-level encryption using AEAD (Authenticated Encryption with Associated Data) functions allows you to encrypt specific columns containing PII with a keyset that is protected by a customer-managed key in Cloud KMS. This approach provides granular, field-level encryption that meets compliance requirements without affecting the rest of the table or dataset. Encryption and decryption are not automatic: queries call functions such as AEAD.ENCRYPT and AEAD.DECRYPT_STRING explicitly, so only callers with access to the key can read the plaintext.
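
To make the decryption step concrete, here is a hedged sketch in Python that only assembles the SQL text for such a query; it does not call BigQuery. The helper `build_decrypt_query`, the table and column names, the `@wrapped_keyset` query parameter, and the KMS key path are all hypothetical. `AEAD.DECRYPT_STRING` and `KEYS.KEYSET_CHAIN` are the BigQuery functions assumed here for decrypting with a Cloud KMS-wrapped keyset.

```python
def build_decrypt_query(table: str, column: str, kms_key_uri: str,
                        additional_data_expr: str,
                        wrapped_keyset_param: str = "@wrapped_keyset") -> str:
    """Build SQL that decrypts one AEAD-encrypted column (hypothetical helper).

    additional_data_expr must evaluate to the same value that was supplied
    when the column was encrypted, otherwise decryption fails.
    """
    return (
        "SELECT\n"
        "  AEAD.DECRYPT_STRING(\n"
        f"    KEYS.KEYSET_CHAIN('{kms_key_uri}', {wrapped_keyset_param}),\n"
        f"    {column},\n"
        f"    {additional_data_expr}\n"
        "  ) AS decrypted_value\n"
        f"FROM `{table}`"
    )


# All identifiers below are placeholders, not real resources.
query = build_decrypt_query(
    table="my_project.my_dataset.customers",
    column="ssn_ciphertext",
    kms_key_uri=("gcp-kms://projects/my-project/locations/us/"
                 "keyRings/my-ring/cryptoKeys/my-key"),
    additional_data_expr="customer_id",
)
print(query)
```

Passing the wrapped keyset as a query parameter keeps the key material out of the SQL text, and a caller without permission to use the Cloud KMS key cannot unwrap the keyset.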

Exam trap

The exam often tests the distinction between dataset-level encryption (CMEK at the dataset or table level) and column-level encryption; the trap here is that candidates assume CMEK applies only at the dataset level, missing that BigQuery supports field-level encryption via AEAD functions with Cloud KMS keys for granular control.

How to eliminate wrong answers

Option A is wrong because Cloud DLP masking is a data loss prevention technique that obscures data at query time but does not encrypt the underlying stored data with a CMEK; it is a transformation applied on the fly, not persistent encryption. Option C is wrong because CMEK at the dataset level encrypts the entire dataset's underlying storage (e.g., table files), but it does not provide column-level granularity; all columns are encrypted uniformly, and you cannot selectively encrypt only PII columns. Option D is wrong because storing data encrypted in Cloud Storage and using external tables with a CMEK would require managing encryption outside BigQuery and does not leverage BigQuery's native column-level encryption capabilities; external tables also have performance and feature limitations compared to native BigQuery tables.

25

Multi-Selectmedium

A company is building a data lake on Cloud Storage. They need to organise data into zones for raw, curated, and processed layers. Which TWO practices should they follow? (Choose 2.)

Select 2 answers

A.Use the same storage class for all zones to simplify management.

B.Use separate Cloud Storage buckets for each zone (raw, curated, processed).

C.Set retention policies on the raw zone to make data immutable.

D.Enable object versioning on all buckets to prevent data loss.

E.Use a single bucket with different prefixes (e.g., /raw, /curated, /processed).

AnswersB, E

Separate buckets provide clear isolation and independent lifecycle management.

Why this answer

A data lake typically uses separate buckets or prefixes for raw (immutable), curated (cleaned/transformed), and processed (aggregated/reporting) data. Using prefixes within a single bucket is common, but separate buckets provide better isolation. Lifecycle rules can be applied per prefix or bucket.

26

MCQhard

An organization wants to enforce that data in a Cloud Storage bucket cannot be deleted or overwritten for 7 years due to regulatory compliance. Which Cloud Storage feature should they use?

A.Retention Policy with Bucket Lock

B.IAM conditions

C.Object Lifecycle Management

D.Object holds

AnswerA

Retention Policy prevents deletion/overwrites; Bucket Lock makes it immutable.

Why this answer

Retention Policy with a retention period ensures objects cannot be deleted or overwritten during that period. Bucket Lock makes the policy permanent. Object holds are per-object.

Lifecycle management automates transitions/deletions, opposite of retention.

27

MCQeasy

A data engineer needs to store transactional data for an e-commerce application that requires ACID compliance, automatic failover, and point-in-time recovery. The expected throughput is a few thousand transactions per second. Which Google Cloud storage option should they choose?

A.Firestore

B.Cloud Bigtable

C.Cloud SQL

D.Cloud Spanner

AnswerC

Cloud SQL provides ACID compliance, high availability, and PITR for moderate OLTP workloads.

Why this answer

Cloud SQL is the correct choice because it provides full ACID compliance, automated failover with high availability configurations, and point-in-time recovery via binary logs (MySQL) or write-ahead logs (PostgreSQL). It supports up to several thousand transactions per second with appropriate machine sizing, making it suitable for this e-commerce workload.

Exam trap

The trap here is that candidates often choose Cloud Spanner for any ACID requirement, overlooking that Cloud SQL is the cost-effective and simpler choice for single-region transactional workloads with moderate throughput.

How to eliminate wrong answers

Option A is wrong because Firestore is a NoSQL document database; it supports transactions, but it does not offer the relational model and SQL access that this kind of OLTP workload typically relies on. Option B is wrong because Cloud Bigtable is a wide-column NoSQL database optimized for very high throughput (millions of ops/sec) that supports only single-row transactions, so it cannot provide ACID guarantees across multiple rows. Option D is wrong because Cloud Spanner is a globally distributed relational database that provides ACID compliance and strong consistency, but it is overkill for a few thousand transactions per second and introduces unnecessary complexity and cost compared to Cloud SQL for this scale.

28

MCQeasy

A company wants to run complex analytical queries on structured data without managing infrastructure. The data volume is terabytes and queries can take seconds to minutes. Which service is appropriate?

A.Firestore

B.Cloud Bigtable

C.Cloud SQL

D.BigQuery

AnswerD

Serverless data warehouse for analytics.

Why this answer

BigQuery is correct because it is a serverless, highly scalable data warehouse designed for running complex analytical queries on terabytes of data with fast query performance (seconds to minutes) without any infrastructure management. It uses a columnar storage format and a distributed query engine to handle large-scale structured data efficiently, making it ideal for this use case.

Exam trap

The exam often tests the distinction between operational databases (Cloud SQL, Firestore, Bigtable) and OLAP/data warehouse services (BigQuery), where candidates mistakenly choose Cloud SQL for analytical workloads due to its SQL familiarity, ignoring its scalability and performance limitations for large-scale analytics.

How to eliminate wrong answers

Option A is wrong because Firestore is a NoSQL document database optimized for real-time mobile and web app data with low-latency reads/writes, not for complex analytical queries on terabytes of data. Option B is wrong because Cloud Bigtable is a wide-column NoSQL database designed for high-throughput, low-latency operational workloads (e.g., time-series, IoT), not for complex analytical queries that require SQL-like joins and aggregations. Option C is wrong because Cloud SQL is a managed relational database for OLTP workloads that scales vertically and has a per-instance storage cap, so running complex analytical queries on terabytes of data would cause performance bottlenecks and require manual sharding or read replicas.

29

MCQmedium

A data engineer wants to create a data lake on Google Cloud for storing raw streaming data, then transform it into curated and processed zones for analytics. The data is in Avro format and will be queried by BigQuery. Which two services are MOST suitable as the primary storage and query interface?

A.Cloud Storage and BigQuery

B.Cloud Storage and Dataproc

C.Cloud Storage and Cloud SQL

D.Cloud Storage and Firestore

AnswerA

Cloud Storage stores the Avro files in zones, and BigQuery queries them via external tables or loaded tables.

Why this answer

Cloud Storage is the most suitable primary storage for a data lake because it provides scalable, durable, and cost-effective object storage for raw Avro data. BigQuery is the ideal query interface because it can directly query Avro files stored in Cloud Storage using external tables, and it supports serverless analytics without needing to manage infrastructure.

Exam trap

The exam often tests the misconception that a processing engine like Dataproc is needed to query Avro data, when in fact BigQuery can natively query Avro files stored in Cloud Storage without any intermediate processing.

How to eliminate wrong answers

Option B is wrong because Dataproc is a managed Spark/Hadoop service for batch processing, not a primary query interface for ad-hoc analytics; it would add unnecessary complexity and latency compared to BigQuery's direct Avro querying. Option C is wrong because Cloud SQL is a relational database for transactional workloads, not designed for large-scale analytics on Avro data in a data lake, and it cannot directly query Avro files. Option D is wrong because Firestore is a NoSQL document database for real-time applications, not suitable for analytical queries on large volumes of streaming data in Avro format.

30

Multi-Selecthard

A company stores sensitive data in BigQuery and Cloud Storage. They must encrypt data with customer-managed keys and restrict access to the encryption keys to only a specific VPC network. Which TWO components should they configure? (Choose 2)

Select 2 answers

A.VPC Service Controls

B.Cloud HSM

C.Cloud KMS with a customer-managed key

D.Identity-Aware Proxy (IAP)

E.Cloud Armor

AnswersA, C

VPC-SC creates a perimeter that restricts access to specified services (including KMS) based on network context.

Why this answer

VPC Service Controls (A) is correct because it allows you to define a security perimeter around BigQuery, Cloud Storage, and Cloud KMS, restricting access to these services to a specific VPC network. This ensures that even if an attacker obtains valid credentials, they cannot reach the data or the keys from outside the authorized network. Cloud KMS with a customer-managed key (C) is correct because it supplies the customer-managed encryption keys that BigQuery and Cloud Storage use to encrypt the data.

Exam trap

The exam often tests the misconception that Cloud HSM alone provides network-level access control, when in fact it is a key management backend that must be combined with VPC Service Controls to restrict key access to a specific VPC network.

31

MCQhard

A data engineer wants to create a BigQuery external table that queries data stored in Parquet format in Cloud Storage without loading the data into BigQuery. Which approach is correct?

A.Use Cloud SQL federated query to read Parquet from GCS

B.Create an external table definition using a JSON schema file referencing Cloud Storage URIs

C.Use the bq load command with --source_format=PARQUET

D.Set up BigLake to create a BigQuery external table

AnswerB

External table definitions point to GCS, query data in place.

Why this answer

Option B is correct because creating an external table definition with a JSON schema file referencing Cloud Storage URIs allows BigQuery to query Parquet data directly from Cloud Storage without loading it. BigQuery natively supports Parquet as an external data source, and the schema can be auto-detected or explicitly defined via a JSON file. This approach avoids data ingestion costs and keeps the data in its original location.
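
A minimal sketch of such a definition file, written in Python, is shown below. The helper `build_parquet_table_definition` and the bucket path are hypothetical; the `sourceFormat` and `sourceUris` fields are assumed to follow BigQuery's external data configuration format.

```python
import json
from typing import List


def build_parquet_table_definition(uris: List[str]) -> dict:
    """Return an external table definition for Parquet files in Cloud Storage
    (hypothetical helper; validates only the URI scheme)."""
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"Expected a Cloud Storage URI, got: {uri}")
    return {
        "sourceFormat": "PARQUET",
        # A wildcard lets one definition cover many files.
        "sourceUris": list(uris),
    }


# The bucket and path are placeholders.
definition = build_parquet_table_definition(
    ["gs://example-bucket/clickstream/*.parquet"]
)
definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

The JSON could be written to a file and passed to `bq mk --external_table_definition=FILE DATASET.TABLE`; the data stays in Cloud Storage and is read at query time.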

Exam trap

The exam often tests the distinction between loading data (bq load) and creating external tables (bq mk --external_table_definition), where candidates mistakenly choose the load command because they think it can also create external references.

How to eliminate wrong answers

Option A is wrong because Cloud SQL federated query is designed for querying Cloud SQL databases, not for reading Parquet files from Cloud Storage; it does not support Parquet as a data source. Option C is wrong because the `bq load` command loads data into BigQuery tables, not creating external tables; it physically moves the data into BigQuery storage, which contradicts the requirement to query without loading. Option D is wrong because BigLake is a separate service for managing data lakes with fine-grained access control, but creating a BigQuery external table directly via the console or API (with a schema definition) is the standard method; BigLake is not required for this simple external table use case.

32

MCQmedium

A company has a Cloud SQL for PostgreSQL instance that must be highly available across zones with automatic failover. They also need a read replica for reporting workloads. Which configuration should they use?

A.Enable high availability (regional) and create a read replica

B.Deploy Cloud SQL in multi-region mode

C.Enable automatic backups and point-in-time recovery

D.Create a cross-region replica and use it for failover

AnswerA

HA provides automatic failover across zones; read replicas handle reporting.

Why this answer

Option A is correct because Cloud SQL for PostgreSQL supports regional high availability (HA) with synchronous replication across two zones, ensuring automatic failover without data loss. Additionally, you can create a read replica in a different zone or region to offload reporting workloads without impacting the primary instance's performance. This combination meets both the high availability and read replica requirements.

Exam trap

The trap here is confusing high availability (automatic zonal failover) with disaster recovery (manual cross-region promotion), and assuming that a read replica can serve as a failover target without understanding the synchronous vs. asynchronous replication difference.

How to eliminate wrong answers

Option B is wrong because Cloud SQL does not support a 'multi-region mode'; it offers regional HA (zonal failover) and cross-region replicas, but not a multi-region deployment like Spanner. Option C is wrong because automatic backups and point-in-time recovery provide data protection and restore capabilities, but they do not provide automatic failover or a read replica for reporting. Option D is wrong because a cross-region replica can be promoted for failover, but it is not designed for automatic failover; promoting a cross-region replica is a manual process and does not provide the same synchronous replication and automatic failover as regional HA.

33

MCQeasy

A team needs to store transactional data for an e-commerce application that requires ACID transactions, automatic backups, and point-in-time recovery. The expected workload is under 10,000 QPS. Which database should they choose?

A.Cloud Bigtable

B.Cloud SQL

C.Firestore

D.Cloud Spanner

AnswerB

Correct choice: ACID, backups, PITR, fits OLTP under 10k QPS.

Why this answer

Cloud SQL is the correct choice because it provides fully managed relational databases (MySQL, PostgreSQL, SQL Server) with built-in ACID transaction support, automated backups, and point-in-time recovery (PITR) via binary logs or write-ahead logs. The workload of under 10,000 QPS is well within Cloud SQL's performance envelope, making it a cost-effective and operationally simple solution for transactional e-commerce data.

Exam trap

The trap here is that candidates often choose Cloud Spanner for any workload requiring ACID transactions and high availability, overlooking that Cloud SQL is sufficient and more cost-effective for sub-10,000 QPS workloads, and that Cloud Spanner's global distribution and strong consistency come with a significant price premium.

How to eliminate wrong answers

Option A is wrong because Cloud Bigtable is a NoSQL, wide-column database designed for very high-throughput workloads (millions of QPS); it supports only single-row transactions and is not a relational database, making it unsuitable for transactional e-commerce data. Option C is wrong because Firestore is a NoSQL document database; it supports transactions, but it lacks the relational schema and SQL interface that transactional e-commerce data typically depends on. Option D is wrong because Cloud Spanner is a globally distributed, horizontally scalable relational database that supports ACID transactions and PITR, but it is overkill and significantly more expensive for a workload under 10,000 QPS, which can be handled more cost-effectively by Cloud SQL.

34

MCQmedium

A company wants to store backups of on-premises databases in Google Cloud for long-term retention. They need WORM (Write Once, Read Many) compliance and object-level retention policies. What should they use?

A.Firestore backups

B.Cloud Storage with Object Lock

C.BigQuery table snapshots

D.Cloud Storage with retention policies

AnswerB

Object Lock provides per-object WORM retention, meeting compliance.

Why this answer

Cloud Storage with Object Lock provides WORM (Write Once, Read Many) compliance by preventing objects from being deleted or overwritten until their retention time has passed. In Cloud Storage this capability is called Object Retention Lock: each object carries its own retention configuration, in contrast to a bucket retention policy, which applies one period to every object in the bucket. That per-object control is what the question asks for, which makes it the correct choice for long-term backup retention with regulatory compliance needs.

Exam trap

The exam often tests the distinction between bucket-level retention policies (Option D) and object-level retention policies (Object Lock), leading candidates to choose retention policies when the question explicitly requires object-level control.

How to eliminate wrong answers

Option A is wrong because Firestore backups are designed for Firestore databases and do not support WORM compliance or object-level retention policies; they are intended for point-in-time recovery of Firestore data, not for long-term archival with immutable storage. Option C is wrong because BigQuery table snapshots are used for preserving table data at a specific point in time for querying or recovery, but they do not provide WORM compliance or object-level retention policies; they are not a storage service for backup files. Option D is wrong because Cloud Storage with retention policies applies a uniform retention period to all objects in a bucket and cannot set a different retention time per object; Object Lock is required for granular, per-object retention settings.

35

MCQhard

A company needs to store logs in Cloud Storage for compliance, with a requirement that logs cannot be deleted or overwritten for a period of 7 years. Which Cloud Storage feature should they enable?

A.Bucket Lock with a retention policy of 7 years.

B.Requester Pays bucket setting.

C.Versioning enabled with Object Hold.

D.Object lifecycle management with a delete rule after 7 years.

AnswerA

Bucket Lock enforces a retention policy that prevents object deletion or overwrite for the specified period.

Why this answer

Bucket Lock with a retention policy of 7 years is the correct feature because it enforces a WORM (Write Once, Read Many) model on the bucket. Once a retention policy is locked, objects cannot be deleted or overwritten until the retention period expires, meeting the compliance requirement for immutable log storage.

Exam trap

The exam often tests the distinction between features that prevent deletion (Bucket Lock) versus features that only track versions or automate cleanup (Versioning, Lifecycle), leading candidates to choose Versioning or Lifecycle rules thinking they enforce immutability.

How to eliminate wrong answers

Option B is wrong because Requester Pays shifts data access and network charges to the requester but does not prevent deletion or overwriting of objects. Option C is wrong because object holds must be placed on individual objects and carry no fixed duration, so they do not enforce a bucket-wide 7-year period, and versioning on its own only preserves noncurrent versions without blocking their deletion. Option D is wrong because Object lifecycle management with a delete rule only automates deletion after a set time but does not prevent manual deletion or overwriting before that time, so it cannot enforce a non-deletion guarantee.

36

MCQeasy

Which Google Cloud service is a fully managed relational database for MySQL, PostgreSQL, and SQL Server, offering automatic replication and backups?

A.Cloud Spanner

B.AlloyDB

C.Bigtable

D.Cloud SQL

AnswerD

Correct: Cloud SQL supports MySQL, PostgreSQL, and SQL Server with automatic backups and replication.

Why this answer

Cloud SQL is the correct answer because it is Google Cloud's fully managed relational database service that supports MySQL, PostgreSQL, and SQL Server. It provides automatic replication across zones and automated backups, making it the ideal choice for traditional relational database workloads without the need for manual administration.

Exam trap

The trap here is that candidates often confuse Cloud SQL with Cloud Spanner because both are relational databases, but Cloud Spanner is designed for global scale and does not run the MySQL, PostgreSQL, or SQL Server engines.

How to eliminate wrong answers

Option A is wrong because Cloud Spanner is a globally distributed, horizontally scalable relational database service that supports strong consistency and SQL, but it is not a managed host for the MySQL, PostgreSQL, or SQL Server engines; it is its own engine, with GoogleSQL and PostgreSQL-interface dialects, designed for sharded, multi-region deployments. Option B is wrong because AlloyDB is a fully managed PostgreSQL-compatible database service optimized for high performance and transactional workloads, but it does not support MySQL or SQL Server. Option C is wrong because Bigtable is a fully managed, scalable NoSQL wide-column database service, not a relational database, and it does not support MySQL, PostgreSQL, or SQL Server.

37

MCQmedium

A company wants to build a data lake on Cloud Storage for raw, curated, and processed data zones. They need to enforce data governance including column-level security and row-level filtering for BigQuery queries. Which solution should they use?

A.BigLake tables over Cloud Storage

B.BigQuery external tables reading from GCS

C.Dataproc with Spark SQL

D.Cloud Storage with IAM and VPC Service Controls

AnswerA

BigLake provides fine-grained access control (column-level and row-level security) via BigQuery, along with a unified lakehouse.

Why this answer

BigLake tables provide a unified governance layer over Cloud Storage data, enabling fine-grained access control such as column-level security and row-level filtering directly on BigQuery queries. This is achieved by integrating BigQuery's access control policies with the external data stored in GCS, without needing to move data into BigQuery native storage. The other options either lack these granular security features or require complex workarounds.

Exam trap

The exam often tests the misconception that BigQuery external tables (Option B) can support the same fine-grained security as BigLake, but they cannot because external tables lack the integrated policy engine for column and row-level controls.

How to eliminate wrong answers

Option B is wrong because BigQuery external tables reading from GCS only support table-level IAM permissions and cannot enforce column-level security or row-level filtering; they treat the external data as a flat file without fine-grained access controls. Option C is wrong because Dataproc with Spark SQL does not natively provide column-level or row-level security on Cloud Storage data; it requires manual implementation via Spark's security APIs and does not integrate with BigQuery's governance model. Option D is wrong because Cloud Storage with IAM and VPC Service Controls only provides bucket- and object-level access control and network perimeter security, but cannot enforce column-level or row-level filtering on queries executed in BigQuery.

38

MCQeasy

A mobile app needs a NoSQL database that supports offline synchronization when the device goes offline and later reconnects. Which Google Cloud database should be used?

A.Firestore

B.Cloud Bigtable

C.Cloud Spanner

D.Cloud SQL

AnswerA

Document NoSQL with offline sync.

Why this answer

Firestore provides offline persistence for mobile and web apps. It caches data locally and syncs when online. Cloud SQL and Spanner are relational; Bigtable does not have offline sync.

39

MCQmedium

A company uses BigQuery for analytics and needs to ensure that certain columns containing PII are encrypted at query time so that only authorized users can decrypt. What should they use?

A.BigQuery AEAD encryption functions

B.VPC Service Controls

C.Customer-managed encryption keys (CMEK)

D.Fine-grained IAM roles

AnswerA

AEAD encrypts columns; access control via key access.

Why this answer

BigQuery AEAD encryption functions allow you to encrypt sensitive columns (e.g., PII) at query time using a user-managed key, so that only authorized users who possess the key can decrypt the data. This is the correct approach because it provides column-level, application-layer encryption that is transparent to the query engine and ensures that unauthorized users see only ciphertext.

Exam trap

The exam often tests the distinction between encryption at rest (CMEK) and encryption at query time (AEAD), so the trap here is assuming that CMEK provides column-level, user-specific decryption control, when in fact it only protects the underlying storage.

How to eliminate wrong answers

Option B is wrong because VPC Service Controls provide network-level security boundaries to prevent data exfiltration, not column-level encryption at query time. Option C is wrong because Customer-managed encryption keys (CMEK) encrypt data at rest (storage layer), not at query time, and do not control per-user decryption access. Option D is wrong because Fine-grained IAM roles control access to tables or rows via row-level security, but they do not encrypt the data itself; authorized users still see plaintext PII.

40

MCQeasy

A startup is building a mobile app that needs to sync user data across devices in real time. They expect millions of concurrent users and need a NoSQL database with offline support and automatic multi-region replication. Which Google Cloud service meets these requirements?

A.Cloud Bigtable

B.Cloud Spanner

C.Firestore

D.Cloud SQL

AnswerC

Firestore offers real-time listeners, offline persistence, and automatic multi-region replication, ideal for mobile sync.

Why this answer

Firestore is a NoSQL, serverless document database that provides real-time synchronization, offline support via local persistence, and automatic multi-region replication. It is designed for mobile and web apps with millions of concurrent users, making it the ideal choice for this use case.

Exam trap

The trap here is that candidates often confuse Cloud Spanner's global SQL capabilities with NoSQL requirements, or assume Cloud Bigtable's NoSQL label fits all NoSQL workloads, ignoring the specific need for real-time sync and offline support.

How to eliminate wrong answers

Option A is wrong because Cloud Bigtable is a wide-column NoSQL database optimized for high-throughput workloads (e.g., time-series, IoT), not for real-time sync or offline mobile app support, and its cross-region replication must be configured by adding clusters rather than being automatic. Option B is wrong because Cloud Spanner is a globally distributed, strongly consistent relational SQL database, not a NoSQL database, and while it supports multi-region replication, it does not provide offline support for mobile clients. Option D is wrong because Cloud SQL is a managed relational SQL database (MySQL, PostgreSQL, SQL Server) that is not NoSQL, does not support offline mobile sync, and requires manual configuration for multi-region replication.

41

Multi-Selectmedium

A team is designing a Spanner database for a global inventory system. They need to optimize query performance for frequently joined tables. Which THREE design decisions help achieve this? (Choose 3.)

Select 3 answers

A.Use Cloud SQL instead if joins are needed.

B.Design primary keys to distribute write load evenly across splits.

C.Use interleaved tables to co-locate related rows.

D.Store all data in a single table with JSON columns to avoid joins.

E.Create secondary indexes on columns used in WHERE clauses.

AnswersB, C, E

Good primary key design prevents hotspots and ensures scalability.

Why this answer

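Option B is correct because Spanner uses distributed splits for scalability, and designing primary keys to avoid hotspots (e.g., using hash prefixes or UUIDs instead of monotonically increasing keys) ensures write load is evenly distributed across splits, preventing performance bottlenecks. Options C and E are correct because interleaved tables co-locate child rows with their parent rows, and secondary indexes speed up lookups on the columns used in WHERE clauses.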
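The hash-prefix idea mentioned above can be illustrated with a minimal sketch. It is not Spanner client code: the shard count, the `shard_for` helper, and the `(shard, order_id)` key layout are assumptions chosen for illustration, and a real schema would pick its own key columns.

```python
import hashlib
from typing import Tuple

NUM_SHARDS = 16  # hypothetical shard count, chosen only for this example


def shard_for(order_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Derive a stable shard number from a sequential ID."""
    digest = hashlib.sha256(str(order_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def primary_key(order_id: int) -> Tuple[int, int]:
    """Compose (shard, order_id) so consecutive IDs do not sort next to each other."""
    return (shard_for(order_id), order_id)


# Consecutive IDs would all land at the end of the key space without the prefix.
sequential_ids = range(1000, 1032)
keys = [primary_key(i) for i in sequential_ids]
shards_used = {shard for shard, _ in keys}
print(f"{len(keys)} sequential IDs were spread over {len(shards_used)} shards")
```

The trade-off is that a range scan over `order_id` now has to visit every shard prefix, so this pattern suits write-heavy tables more than tables that are mostly read by key range.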

Exam trap

The exam often tests the misconception that avoiding joins entirely (Option D) is a better optimization than properly using Spanner's native features like interleaving and secondary indexes, which are designed to handle joins efficiently at scale.

42

MCQhard

A company stores highly sensitive financial data in BigQuery. They need to encrypt certain columns (e.g., credit card numbers) with customer-managed encryption keys (CMEK) at the column level. Which BigQuery feature should they use?

A.Customer-managed encryption keys (CMEK) on the dataset

B.VPC Service Controls

C.AEAD encryption functions with Cloud KMS

D.BigQuery Data Catalog with policy tags

AnswerC

AEAD functions allow column-level encryption using keys from Cloud KMS, including CMEK.

Why this answer

Option C is correct because BigQuery's AEAD (Authenticated Encryption with Associated Data) encryption functions, when used with Cloud KMS, allow you to encrypt individual columns (e.g., credit card numbers) using customer-managed encryption keys (CMEK) at the column level. This provides granular, application-layer encryption where the key is managed by the customer, not Google, and decryption is performed within SQL queries using the AEAD.DECRYPT_STRING function. Dataset-level CMEK (Option A) encrypts all data at rest in the dataset but cannot target specific columns.

Exam trap

The exam often tests the distinction between dataset-level CMEK (encrypts all data at rest) and column-level AEAD encryption (encrypts specific fields with customer keys), leading candidates to mistakenly choose dataset CMEK when the requirement is for column-level granularity.

How to eliminate wrong answers

Option A is wrong because CMEK on a dataset encrypts all data at rest in that dataset, not at the column level; it cannot selectively encrypt specific columns like credit card numbers. Option B is wrong because VPC Service Controls provide network security boundaries to prevent data exfiltration, not column-level encryption of data within BigQuery. Option D is wrong because policy tags are used for data classification and fine-grained access control (e.g., column-level access control and dynamic data masking), not for encrypting column data with customer-managed keys.

43

Multi-Selecthard

A company uses BigQuery for analytics on petabyte-scale data. They want to improve query performance by denormalizing schemas and reducing joins. Which BigQuery feature should they use?

Select 1 answer

A.Clustering on frequently filtered columns

B.Using subqueries instead of JOINs

C.External tables reading from Cloud Storage

D.Table partitioning by date

E.Nested and repeated fields (ARRAY<STRUCT<...>>)

AnswersE

These allow storing related data in a single row, reducing the need for joins.

Why this answer

Option E is correct because BigQuery natively supports nested and repeated fields (ARRAY<STRUCT<...>>) to represent one-to-many relationships within a single row, enabling denormalized schemas that eliminate the need for expensive JOIN operations. This reduces data shuffling and improves query performance on petabyte-scale data by allowing all related data to be stored and scanned together.

Exam trap

The exam often tests the misconception that clustering or partitioning alone can replace denormalization, but these are physical optimizations that do not change the schema structure or eliminate the need for joins.

44

MCQmedium

A company is designing a data lake on Cloud Storage with three zones: raw, curated, and processed. They need to enforce data governance by restricting access to each zone using IAM. Which approach should they take?

A.Create a single bucket with folders for each zone, and use IAM conditions to restrict access

B.Use Cloud Storage lifecycle rules to move objects between zones

C.Use a single bucket and rely on object ACLs

D.Create three separate buckets, one per zone, and assign IAM roles per bucket

AnswerD

Separate buckets provide clear isolation and simpler IAM management.

Why this answer

Option D is correct because Cloud Storage buckets are the fundamental access boundary for IAM policies. By creating three separate buckets (raw, curated, processed), you can assign distinct IAM roles (e.g., roles/storage.objectViewer, roles/storage.objectAdmin) per bucket, ensuring that users or service accounts only have access to the specific zone they are authorized for. This approach aligns with the principle of least privilege and avoids the complexity and limitations of IAM conditions or object-level ACLs.

Exam trap

The exam often tests the misconception that folders within a single bucket can serve as effective security boundaries, but in Cloud Storage, folders are just a naming convention (prefixes) and do not provide native access control isolation without complex IAM conditions or ACLs.

How to eliminate wrong answers

Option A is wrong because IAM conditions on a single bucket with folders can restrict access based on object name prefixes, but they are complex to manage, prone to misconfiguration, and do not provide the same clear security boundary as separate buckets; also, IAM conditions are not supported for all roles and can lead to unintended access if not carefully crafted. Option B is wrong because Cloud Storage lifecycle rules are used for automating object transitions (e.g., moving to Nearline or deleting) based on age or other conditions, not for enforcing access control or governance between zones. Option C is wrong because relying on object ACLs is a legacy approach that is harder to audit and maintain at scale; ACLs provide per-object permissions but do not offer the centralized, hierarchical control of IAM roles, and they are not recommended for data lake architectures where consistent governance is required.

45

MCQhard

A company uses Cloud Spanner for a global e-commerce platform. They have a table of orders and a table of order items. To optimize performance for queries that join these tables on order_id, which Spanner schema design feature should they use?

A.Use Cloud Bigtable instead

B.Create a secondary index on order_id

C.Denormalize the order items into the orders table using repeated fields

D.Use interleaved tables with order_items as a child table of orders

AnswerD

Interleaving co-locates rows, improving join performance.

Why this answer

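Interleaved tables store child rows physically with their parent rows, which reduces the latency of joins on the parent key. A secondary index on order_id (Option B) speeds up lookups and filters, but it does not co-locate order items with their order. Switching to Cloud Bigtable (Option A) gives up SQL joins altogether.

Denormalizing with repeated fields (Option C) could reduce joins, but interleaved tables are the feature Spanner provides for parent-child relationships.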
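Co-location can be pictured as a single key-ordered sequence in which each child key extends its parent's key. The sketch below is only a model of that ordering in plain Python, not Spanner client code; the table names and key values are made up. In Spanner DDL, the relationship itself is declared with an `INTERLEAVE IN PARENT` clause on the child table.

```python
# Rows are (table, primary_key) pairs; the values are invented for illustration.
orders = [("orders", (2,)), ("orders", (1,))]
order_items = [
    ("order_items", (2, 1)),
    ("order_items", (1, 2)),
    ("order_items", (1, 1)),
]

# A child key starts with its parent's key, so ordering all rows by key
# places every order item directly after the order it belongs to.
interleaved = sorted(orders + order_items, key=lambda row: row[1])

for table, key in interleaved:
    indent = "  " if table == "order_items" else ""
    print(f"{indent}{table}{key}")
```

Because an order and its items sit next to each other in this ordering, reading one order with all of its items touches one contiguous key range instead of two separate ones.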

46

MCQmedium

A healthcare company must encrypt data in BigQuery with customer-managed keys (CMEK). They want to control the key lifecycle independently. Which approach should they take?

A.Use BigQuery column-level encryption (AEAD) with a key from Cloud KMS

B.Use Cloud KMS to create a key, then set it as the default encryption key for the BigQuery dataset

C.Enable default encryption on the Cloud Storage bucket used for staging data

D.Encrypt the data before loading using a custom application and store the key in Secret Manager

AnswerB

BigQuery CMEK is configured at the dataset level via Cloud KMS.

Why this answer

BigQuery supports CMEK through Cloud KMS. You can create a key ring and key in Cloud KMS, then specify it when creating datasets or tables. The key is used to encrypt data at rest.

Column-level encryption is separate and uses AEAD functions with customer-managed keys at the application level.

47

MCQmedium

A company needs to store petabytes of time-series IoT sensor data and query it with single-digit millisecond latency at millions of reads per second. The data has a simple key-value structure with timestamps. Which Google Cloud database is MOST appropriate?

A.Firestore

B.Cloud Spanner

C.Cloud Bigtable

D.BigQuery

AnswerC

Bigtable is the correct choice: wide-column NoSQL, designed for time-series and IoT workloads, single-digit ms latency, and scales to millions of QPS with additional nodes.

Why this answer

Cloud Bigtable is a fully managed, scalable NoSQL database designed for large analytical and operational workloads, handling petabytes of data with consistent single-digit millisecond latency for high-throughput reads and writes. Its key-value model with timestamp-based versioning is ideal for time-series IoT sensor data, and it supports millions of reads per second via its HBase API and Bigtable's underlying tablet-based architecture.

Exam trap

The exam often tests the distinction between databases optimized for real-time key-value access (Bigtable) versus those for transactional consistency (Spanner) or analytical queries (BigQuery), and the trap here is assuming that a relational database like Spanner is needed for any structured data, ignoring the specific throughput and latency requirements of high-scale time-series workloads.

How to eliminate wrong answers

Option A is wrong because Firestore is a document-oriented NoSQL database optimized for mobile and web app real-time sync, not for petabyte-scale time-series workloads with millions of reads per second; it has throughput limits (e.g., 10,000 writes/second per database) and does not natively handle high-throughput time-series data. Option B is wrong because Cloud Spanner is a globally distributed relational database with strong consistency and SQL support, but it is designed for transactional workloads (OLTP) with moderate throughput, not for petabyte-scale key-value time-series data at millions of reads per second; its latency and cost profile are not optimal for this use case. Option D is wrong because BigQuery is a serverless data warehouse for analytical SQL queries on large datasets, but it is not designed for single-digit millisecond latency at millions of reads per second; it is optimized for batch and interactive analytics, not real-time key-value lookups.

48

MCQmedium

A data engineer needs to design a schema in BigQuery for a dataset that contains customer orders. Each order has a header and multiple line items. Queries frequently need to retrieve the entire order including line items. Which schema design is MOST performant and cost-effective?

A.Store all data in a flat table with repeated order info per line item

B.Use nested and repeated fields (orders table with line items as REPEATED RECORD)

C.Normalize into separate orders and line_items tables, join on order_id

D.Use a partitioned table on order date

AnswerB

Nested/repeated fields allow storing order with line items in one row, eliminating joins.

Why this answer

Option B is correct because BigQuery is optimized for denormalized schemas using nested and repeated fields (REPEATED RECORD). Storing line items as a repeated record within the orders table avoids expensive JOIN operations, reduces data shuffling, and allows BigQuery to scan only the necessary columns, making queries that retrieve entire orders with line items both faster and more cost-effective.
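To make the nested shape concrete, here is a minimal sketch in plain Python that folds normalized order and line-item records into one row per order. The field names and sample values are assumptions; the resulting `line_items` list corresponds to what would be a REPEATED RECORD column, and rows in this shape could, for example, be serialized as newline-delimited JSON before loading.

```python
from collections import defaultdict

# Hypothetical normalized inputs: one list per would-be table.
orders = [
    {"order_id": 1, "customer": "alice"},
    {"order_id": 2, "customer": "bob"},
]
line_items = [
    {"order_id": 1, "sku": "A-100", "qty": 2},
    {"order_id": 1, "sku": "B-200", "qty": 1},
    {"order_id": 2, "sku": "A-100", "qty": 5},
]


def nest(orders, line_items):
    """Attach each order's line items as a repeated record on the order row."""
    items_by_order = defaultdict(list)
    for item in line_items:
        items_by_order[item["order_id"]].append(
            {"sku": item["sku"], "qty": item["qty"]}
        )
    return [
        {**order, "line_items": items_by_order[order["order_id"]]}
        for order in orders
    ]


nested = nest(orders, line_items)
for row in nested:
    print(row)
```

The order header is stored once per order rather than once per line item, which is the difference from the flat layout in Option A.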

Exam trap

The exam often tests the misconception that normalization (Option C) is always the best practice for relational databases, but in BigQuery's distributed, columnar architecture, denormalization with nested and repeated fields is the recommended pattern for performance and cost efficiency.

How to eliminate wrong answers

Option A is wrong because storing all data in a flat table with repeated order info per line item leads to massive data duplication (each line item repeats all order header fields), increasing storage costs and query scan size without leveraging BigQuery's native nested structure. Option C is wrong because normalizing into separate orders and line_items tables and joining on order_id introduces expensive JOIN operations that require shuffling and sorting large datasets, which is inefficient in BigQuery's distributed architecture and incurs higher slot usage and cost. Option D is wrong because partitioning on order date alone does not address the structural inefficiency of storing line items separately; while partitioning can improve query performance for date-range filters, it does not eliminate the need for JOINs or duplication, and the question specifically asks about retrieving entire orders with line items.

49

MCQmedium

A data engineer is designing a BigQuery table for a clickstream dataset with frequent queries aggregating over user sessions. Each user session has multiple events, and the engineer wants to avoid joins for performance. Which schema design pattern should they use?

A.Use a normalized schema with separate tables for sessions and events, then join on session ID

B.Store each event as a separate row with session key and use clustering on session ID

C.Use partitioning on event timestamp and clustering on user ID

D.Use nested and repeated fields to store events within each session row

AnswerD

Nested repeated fields allow storing events inside the session row, avoiding joins.

Why this answer

BigQuery supports nested and repeated fields (e.g., STRUCT and REPEATED), allowing denormalization. This reduces the need for joins and improves query performance. Partitioning and clustering are for physical data organization, not schema design.

50

Multi-Selectmedium

A company uses BigQuery partitioned tables with daily partitions for log data. They want to automatically delete partitions older than 90 days and ensure that current month data is in a specific dataset with a set expiration. Which THREE actions should they take? (Choose 3)

Select 3 answers

A.Set partition expiration to 90 days on the table

B.Cluster the table on a timestamp column

C.Create the table as a partitioned table by ingestion time

D.Set the table's default partition expiration to 90 days

E.Set a lifecycle management rule on Cloud Storage to delete objects older than 90 days

AnswersA, C, D

Partition expiration automatically drops partitions older than 90 days.

Why this answer

Setting partition expiration on the table deletes old partitions. Setting a table-level default expiration ensures new partitions inherit the expiration. Using a partitioned table by date is necessary for partition expiration to work.

Option E (Cloud Storage lifecycle rule) applies to Cloud Storage objects, not to BigQuery partitions. Option B (clustering) helps query performance but does not automate deletion.
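The effect of a 90-day partition expiration can be sketched as simple date arithmetic. This is a day-granularity approximation written for illustration, not BigQuery's implementation; the function name and sample dates are hypothetical.

```python
from datetime import date, timedelta


def expired_partitions(partition_dates, today, expiration_days=90):
    """Return daily partitions whose date plus the expiration window has passed."""
    window = timedelta(days=expiration_days)
    return [d for d in partition_dates if d + window <= today]


partitions = [date(2024, 1, 1), date(2024, 3, 1), date(2024, 4, 9)]
today = date(2024, 4, 10)

dropped = expired_partitions(partitions, today)
kept = [d for d in partitions if d not in dropped]
print("dropped:", dropped)
print("kept:", kept)
```

Each partition ages out on its own schedule, so no scheduled cleanup job is needed once the expiration is set on the table.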

51

MCQeasy

An application needs to store user profile data in a document database with flexible schema. The data is accessed frequently from a mobile app. Which Google Cloud database is BEST suited?

A.Cloud Bigtable

B.BigQuery

C.Cloud Spanner

D.Cloud Firestore

AnswerD

Firestore stores JSON documents, ideal for flexible schema mobile apps.

Why this answer

Cloud Firestore is a NoSQL document database designed for mobile and web app development, offering flexible schema, real-time data synchronization, and automatic scaling. It directly supports frequent reads from mobile apps through its client SDKs and offline persistence, making it the best fit for storing user profile data with varying attributes.

Exam trap

The trap here is that candidates often confuse Cloud Firestore with Cloud Bigtable because both are NoSQL, but Bigtable lacks document flexibility, real-time sync, and mobile SDK support, which are essential for the described use case.

How to eliminate wrong answers

Option A is wrong because Cloud Bigtable is a wide-column NoSQL database optimized for high-throughput analytical and operational workloads (e.g., time-series, IoT), not for flexible document storage or mobile app real-time access. Option B is wrong because BigQuery is a serverless data warehouse for running SQL-based analytics on large datasets, not a transactional database for user profile reads/writes. Option C is wrong because Cloud Spanner is a globally distributed relational database with strong consistency and SQL support, but its rigid schema and higher latency for simple document operations make it overkill and less suitable for flexible schema mobile app data.

52

MCQmedium

A mobile app uses Firestore to store user profiles. The app allows offline data creation and syncing when connectivity resumes. Which Firestore feature should the developer enable?

A.Set up a Firestore trigger to cache data in Cloud Memorystore

B.Enable offline persistence in the Firestore client SDK

C.Use Cloud Storage signed URLs for offline access

D.Enable Firestore multi-region replication

AnswerB

Offline persistence allows local reads/writes and later syncs.

Why this answer

Firestore's offline persistence feature allows the client SDK to automatically cache data locally on the device. When the app creates or modifies data while offline, the SDK stores the changes in a local queue and syncs them with the Firestore backend once connectivity is restored. This is the correct and built-in mechanism for offline data creation and syncing.

Exam trap

The exam often tests the distinction between server-side caching/replication features and client-side offline capabilities, leading candidates to confuse Firestore's built-in offline persistence with unrelated services like Cloud Memorystore or Cloud Storage.

How to eliminate wrong answers

Option A is wrong because Cloud Memorystore is a managed Redis or Memcached service for caching in server-side applications, not a client-side offline cache; Firestore triggers are server-side functions that cannot cache data in Memorystore for offline client access. Option C is wrong because Cloud Storage signed URLs provide temporary, authenticated access to objects in Cloud Storage, not to Firestore documents, and they are used for online access, not offline data creation and syncing. Option D is wrong because multi-region replication improves availability and durability for Firestore databases but does not enable client-side offline caching or queuing of writes.

53

Multi-Selecthard

A multinational corporation needs a globally distributed database that supports strong consistency, SQL queries, and automatic failover across regions. They also want to optimize join performance for parent-child relationships. Which TWO features of Cloud Spanner should they use?

Select 2 answers

A.Strong consistency and automatic failover across regions

B.Secondary indexes

C.Bigtable as a caching layer

D.Read replicas for global distribution

E.Interleaved tables

AnswersA, E

Spanner's core features: globally consistent and automatic failover.

Why this answer

Cloud Spanner provides strong consistency and automatic failover across regions as core features. Strong consistency ensures that all reads return the most recent write, which is critical for globally distributed databases that require ACID transactions. Automatic failover across regions is built into Spanner's architecture using synchronous replication and Paxos-based consensus, enabling high availability without manual intervention. Interleaved tables (Option E) physically co-locate child rows with their parent rows, which is what optimizes joins on parent-child relationships.

Exam trap

The exam often tests the distinction between secondary indexes and interleaved tables, where candidates mistakenly believe secondary indexes optimize parent-child joins, but interleaved tables are the correct feature for physical co-location and join performance.

54

MCQhard

A data engineer is designing a Bigtable row key for a time-series dataset where each row represents a sensor reading. The team expects high write throughput and wants to avoid hotspots. Which row key design is BEST?

A.Sensor ID followed by reversed timestamp

B.Only sensor ID

C.Timestamp followed by sensor ID

D.Random salt appended to timestamp

AnswerA

Spreading writes across sensor IDs and using reversed timestamp avoids hotspots and enables efficient scans per sensor.

Why this answer

Option A is correct because leading with the sensor ID spreads writes across the key space, and therefore across nodes, so no single tablet receives every new reading. The reversed timestamp then sorts each sensor's newest readings first, so a query for a sensor's latest readings becomes a short scan from the start of that sensor's key range. This design uses Bigtable's lexicographic ordering to balance write load while keeping per-sensor reads efficient.
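The key layout in Option A can be sketched as a small helper. The `#` separator, the 13-digit zero padding, and the `MAX_TS_MILLIS` bound are assumptions made for this example, not Bigtable requirements; the only property that matters is that keys compare lexicographically.

```python
MAX_TS_MILLIS = 10**13 - 1  # hypothetical upper bound that fits in 13 digits


def row_key(sensor_id: str, ts_millis: int) -> str:
    """Build '<sensor_id>#<reversed timestamp>' so newer readings sort first."""
    reversed_ts = MAX_TS_MILLIS - ts_millis
    # Zero padding keeps string ordering consistent with numeric ordering.
    return f"{sensor_id}#{reversed_ts:013d}"


readings = [
    ("sensor-b", 1_700_000_000_000),
    ("sensor-a", 1_700_000_000_000),
    ("sensor-a", 1_700_000_060_000),  # one minute later
]
keys = sorted(row_key(sensor, ts) for sensor, ts in readings)
for key in keys:
    print(key)
```

Sorting groups the keys by sensor and lists the later `sensor-a` reading before the earlier one, which is the ordering a "latest readings per sensor" scan relies on.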

Exam trap

The exam often tests the misconception that a random salt alone can solve hotspots, but the trap here is that the salt must be placed at the beginning of the row key to actually distribute writes; appending it after a timestamp does not prevent the timestamp from creating a sequential write pattern.

How to eliminate wrong answers

Option B is wrong because using only the sensor ID puts every reading for a sensor into a single row: each write goes to that same row, which keeps growing and cannot be scanned by time range. Option C is wrong because a timestamp-first design means all new writes land at the end of the table (the 'hot tail' problem), overwhelming a single tablet server. Option D is wrong because appending a random salt to the timestamp still leaves the timestamp as the leading part of the key, so writes still concentrate at the current time range, and the random salt does not effectively distribute the load across the key space.

55

MCQmedium

You need to store and query a large dataset of customer profiles. The data is semi-structured and frequently updated. The application requires offline support for mobile users. Which database is MOST appropriate?

A.Firestore

B.BigQuery

C.Cloud Bigtable

D.Cloud SQL

AnswerA

Correct: Firestore provides offline support and works with semi-structured documents.

Why this answer

Firestore is the most appropriate choice because it is a NoSQL document database designed for semi-structured data, real-time synchronization, and offline support. It provides built-in offline persistence for mobile clients, allowing users to read and write data even without network connectivity, and automatically syncs changes when the connection is restored. This directly meets the requirements of semi-structured data, frequent updates, and offline mobile support.

Exam trap

The exam often tests the distinction between databases designed for transactional/operational workloads (like Firestore) versus analytical/warehouse databases (like BigQuery), and the trap here is assuming that any NoSQL database (like Bigtable) supports offline mobile sync, when in fact only Firestore provides native offline persistence and real-time synchronization for mobile clients.

How to eliminate wrong answers

Option B (BigQuery) is wrong because it is a serverless data warehouse optimized for analytical queries on large datasets, not for transactional or real-time updates, and it lacks native offline support for mobile applications. Option C (Cloud Bigtable) is wrong because it is a wide-column NoSQL database designed for high-throughput, low-latency workloads like time-series or IoT data, but it does not support offline mobile synchronization or semi-structured document models. Option D (Cloud SQL) is wrong because it is a relational database (MySQL, PostgreSQL, SQL Server) requiring a fixed schema, which is unsuitable for semi-structured data, and it does not provide built-in offline support for mobile clients.

56

Multi-Selecthard

A global fintech company needs a database that can serve transactional (OLTP) and analytical (OLAP) workloads with strong consistency. They require high availability and PostgreSQL compatibility. Which TWO Google Cloud databases meet these requirements? (Choose 2 correct options)

Select 2 answers

A.Cloud SQL

B.Cloud Bigtable

C.Cloud Spanner

D.BigQuery

E.AlloyDB

AnswersC, E

Strong consistency, globally distributed, OLTP+analytics via SQL, PostgreSQL interface available.

Why this answer

Cloud Spanner is correct because it is a globally distributed, strongly consistent relational database that offers a PostgreSQL interface and can serve both transactional and analytical SQL queries. It offers high availability through synchronous replication across zones and regions, and its TrueTime clock infrastructure ensures external consistency for transactions. AlloyDB is correct because it is a PostgreSQL-compatible database with a built-in columnar engine that accelerates analytical queries alongside transactional workloads, and it supports high-availability configurations.

Exam trap

The exam often tests the misconception that Cloud SQL or BigQuery can serve both OLTP and OLAP workloads with strong consistency, but Cloud SQL lacks global scaling and OLAP performance, and BigQuery is built for analytics rather than transactional workloads.

57

MCQeasy

A data engineer wants to store archived log files in Cloud Storage with a retention policy that prevents deletion for 5 years. Which feature should they use?

A.Object Lifecycle rule with Delete action after 5 years

B.Retention Policy on the bucket set to 5 years

C.Object Hold (temporary)

D.Versioning enabled

AnswerB

Retention policies ensure objects cannot be deleted or replaced until the retention period expires.

Why this answer

A retention policy on a Cloud Storage bucket enforces a minimum retention period for all objects in the bucket, preventing deletion or overwrite until the policy duration has elapsed. Setting it to 5 years ensures that archived log files cannot be deleted before that time, meeting the data engineer's requirement exactly. Unlike object-level holds or lifecycle rules, this is a bucket-level setting that applies to every object, and the policy can be locked so that it cannot be removed or shortened.

Exam trap

The exam often tests the distinction between lifecycle rules that delete objects and retention policies that prevent deletion, so the trap here is assuming that a lifecycle rule with a Delete action can enforce a retention period, when in fact it does the opposite.

How to eliminate wrong answers

Option A is wrong because an Object Lifecycle rule with a Delete action after 5 years would automatically delete objects after 5 years, which is the opposite of preventing deletion; it does not enforce a retention period. Option C is wrong because an object hold (a temporary hold) is placed on individual objects and blocks deletion only until someone releases it; it is not a bucket-wide policy with a 5-year duration, and it is typically used for legal or compliance holds, not long-term retention. Option D is wrong because enabling versioning preserves previous versions of objects but does not prevent deletion of the current version; it allows recovery after deletion but does not block deletion itself, so it does not enforce a retention policy.

58

MCQmedium

A team needs to run analytics on data stored in Cloud Storage (Parquet format) without moving it into BigQuery storage. They want to use SQL queries and BigQuery features like caching and partitioning. Which approach should they use?

A.Create a BigQuery external table pointing to the Cloud Storage location.

B.Use Cloud Dataproc to run Spark SQL on the data.

C.Load the data into BigQuery tables using a batch load job.

D.Use BigLake tables with Cloud Storage.

AnswerA

External tables allow querying data in GCS without loading. Note that BigQuery-native partitioning and clustering are not available for external tables; only a Hive-style partition layout in the source files can be used.

Why this answer

Option A is correct because an external table in BigQuery allows you to query data stored in Cloud Storage (Parquet format) using standard SQL without moving the data into BigQuery's managed storage. This approach supports BigQuery features like caching (results caching) and partitioning (by defining a Hive-partitioned external table), meeting all requirements.

Exam trap

The exam often tests the distinction between external tables (which query data in place) and BigLake tables (which add security and governance layers but are not required for basic SQL queries with caching and partitioning), leading candidates to overcomplicate the solution by choosing BigLake when a simple external table suffices.

How to eliminate wrong answers

Option B is wrong because Cloud Dataproc with Spark SQL requires provisioning a cluster and does not use BigQuery's SQL engine or its caching and partitioning features. Option C is wrong because loading data into BigQuery tables moves the data into BigQuery storage, which violates the requirement to not move the data. Option D is wrong because BigLake tables are a separate concept that unifies data lakes and warehouses; they require creating a connection and additional configuration, so the simpler external table approach is the better fit here.

59

Multi-Selecthard

A company uses Cloud Storage to store sensitive customer data. They need to restrict access to the data so that only requests from within a specific VPC network are allowed, and block all access from the public internet. Which TWO configurations should they implement? (Choose 2.)

Select 2 answers

A.Enable Private Google Access on the VPC subnet.

B.Use Cloud Storage signed URLs for all access.

C.Disable public internet access by turning off the default internet gateway.

D.Use IAM conditions to restrict access based on VPC network.

E.Use VPC Service Controls to create a service perimeter around Cloud Storage.

AnswersD, E

IAM conditions can restrict access to requests originating from a specific VPC network.

Why this answer

Option D is correct because IAM conditions can be used to restrict access to Cloud Storage based on the requester's VPC network, ensuring only requests originating from the specified VPC are allowed. Option E is correct because VPC Service Controls create a service perimeter that prevents data exfiltration and blocks access from outside the perimeter, effectively denying public internet requests.

Exam trap

The exam often tests the misconception that disabling the internet gateway or using Private Google Access alone can block public access to Cloud Storage, when in fact these controls affect connectivity from within the VPC, not inbound requests to Google-managed services.

60

MCQeasy

Which Google Cloud database offers global distribution, strong consistency, and a 99.999% SLA?

A.Cloud Spanner

B.Cloud Bigtable

C.Firestore

D.Cloud SQL

Answer: A

Correct: Spanner offers global distribution, strong consistency, and 99.999% SLA.

Why this answer

Cloud Spanner is the only Google Cloud database that provides global distribution (horizontally scaling across regions), strong consistency (external consistency with TrueTime), and a 99.999% SLA. It combines the benefits of relational database structure with non-relational horizontal scale, making it ideal for globally distributed, strongly consistent workloads.

Exam trap

The exam often tests whether candidates can match all three requirements at once. The trap here is assuming that Firestore's multi-region locations amount to global distribution, or that Cloud Bigtable's high throughput implies strong consistency across replicated clusters.

How to eliminate wrong answers

Option B (Cloud Bigtable) is wrong because replication between Bigtable clusters is eventually consistent by default, so it does not provide strong consistency across regions. Option C (Firestore) is wrong because, although it is strongly consistent, its multi-region locations replicate within one continent-scale area rather than distributing a database globally. Option D (Cloud SQL) is wrong because it is a regional relational database with no global distribution capability, and its SLA is below 99.999%.
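
To make the SLA figures concrete, the following minimal sketch converts an availability percentage into a yearly downtime budget. The function name and the sample percentages are illustrative, and it assumes a 365-day year; real SLAs are usually measured per billing month, so treat the numbers as rough intuition only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year, no leap days


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


# Compare a few sample availability levels.
for sla in (99.95, 99.99, 99.999):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.2f} minutes/year")
```

Each additional "nine" cuts the downtime budget by a factor of ten, which is why a 99.999% target is hard to meet without multi-region replication.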

61

MCQ · medium

A data engineer needs to store raw sensor data in Cloud Storage and automatically transition it to a lower-cost storage class after 30 days, then delete it after 365 days. What should they configure?

A.Use Cloud Pub/Sub notifications to trigger a Cloud Function that moves objects.

B.Use gsutil rewrite command in a cron job.

C.Configure a lifecycle rule with SetStorageClass to Nearline after 30 days and Delete after 365 days.

D.Set a bucket retention policy with a retention period of 365 days.

Answer: C

Lifecycle rules can automatically transition objects to a different storage class and then delete them based on age.

Why this answer

Option C is correct because Cloud Storage lifecycle management rules allow you to automatically transition objects to a lower-cost storage class (such as Nearline) after a specified number of days and then delete them after another period. This is the native, serverless way to manage object lifecycle without external scripts or compute resources.

Exam trap

The exam often tests the distinction between lifecycle management (which automates transitions and deletions) and retention policies (which only prevent deletion and overwrites), and candidates frequently confuse the two.

How to eliminate wrong answers

Option A is wrong because Cloud Pub/Sub notifications and Cloud Functions introduce unnecessary complexity and cost; lifecycle rules handle this natively without custom code. Option B is wrong because using gsutil rewrite in a cron job is a manual, error-prone approach that does not scale and incurs additional egress/operation costs; lifecycle rules are the intended automated solution. Option D is wrong because a bucket retention policy prevents deletion before the retention period ends, but it does not automatically transition objects to a lower-cost storage class; it only enforces immutability.
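
As a rough illustration of option C, the sketch below builds a lifecycle configuration as a Python dictionary and prints it as JSON. The rule layout follows the Cloud Storage lifecycle configuration format (a `rule` list of `action` and `condition` pairs); the function name is hypothetical, and applying the file to a bucket is left out.

```python
import json


def build_lifecycle_config(transition_days: int, delete_days: int,
                           storage_class: str = "NEARLINE") -> dict:
    """Build a lifecycle config: change storage class first, delete later."""
    return {
        "rule": [
            {
                # Move objects to the cheaper class once they reach this age.
                "action": {"type": "SetStorageClass", "storageClass": storage_class},
                "condition": {"age": transition_days},
            },
            {
                # Remove objects entirely once they reach the deletion age.
                "action": {"type": "Delete"},
                "condition": {"age": delete_days},
            },
        ]
    }


config = build_lifecycle_config(30, 365)
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and applied with the bucket tooling of your choice; ages are counted in days from object creation.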

62

MCQ · medium

A company needs to store petabytes of time-series IoT sensor data and query it with single-digit millisecond latency at millions of reads per second. The data has a simple key-value structure with timestamps. Which Google Cloud database is MOST appropriate?

A.Cloud Bigtable

B.BigQuery

C.Cloud Spanner

D.Firestore

Answer: A

Bigtable is the correct choice: wide-column NoSQL, designed for time-series and IoT workloads, single-digit ms latency, and scales to millions of QPS with additional nodes.

Why this answer

Cloud Bigtable is a fully managed, scalable NoSQL database designed for large analytical and operational workloads, handling petabytes of data with consistent sub-10ms latency at millions of reads per second. Its key-value storage model and automatic sharding make it ideal for time-series IoT sensor data with simple timestamp-based keys, supporting high-throughput, low-latency access without the overhead of relational features.

Exam trap

The exam often tests the distinction between operational databases (Bigtable) and analytical warehouses (BigQuery). The trap here is assuming that 'petabytes of data' automatically means BigQuery, ignoring the critical requirement for single-digit millisecond latency at millions of reads per second.

How to eliminate wrong answers

Option B (BigQuery) is wrong because it is a serverless data warehouse optimized for analytical SQL queries on large datasets, not for single-digit millisecond point reads at millions of operations per second; its latency is typically in the seconds range for interactive queries. Option C (Cloud Spanner) is wrong because it is a globally distributed relational database with strong consistency and ACID transactions, which introduces overhead unsuitable for the simple key-value time-series pattern and cannot match Bigtable's throughput for millions of reads per second. Option D (Firestore) is wrong because it is a mobile and web document database with limited throughput (up to 10,000 writes/second per database) and is not designed for petabyte-scale time-series data or single-digit millisecond latency at millions of reads per second.

63

MCQ · easy

Which Google Cloud service is a serverless, highly scalable data warehouse for analytical queries, supporting SQL and integration with BI tools?

A.Firestore

B.Cloud SQL

C.Cloud Spanner

D.BigQuery

Answer: D

Correct: BigQuery is the serverless analytics warehouse.

Why this answer

BigQuery is a serverless, highly scalable data warehouse designed for analytical queries over large datasets. It supports standard SQL and integrates seamlessly with BI tools like Looker and Tableau, making it the correct choice for this use case.

Exam trap

The trap here is that candidates may confuse Cloud Spanner's global scale and SQL support with data warehousing, but Spanner is optimized for transactional consistency, not analytical query performance or BI tool integration.

How to eliminate wrong answers

Option A is wrong because Firestore is a NoSQL document database for mobile and web app development, not a data warehouse for analytical SQL queries. Option B is wrong because Cloud SQL is a fully managed relational database for OLTP workloads, not a serverless data warehouse optimized for large-scale analytics. Option C is wrong because Cloud Spanner is a globally distributed, strongly consistent relational database service for transactional workloads, not a data warehouse designed for analytical queries and BI integration.

64

Multi-Select · easy

A company needs to store and analyze large amounts of unstructured data (images, videos) and structured data (CSV logs) in a cost-effective manner. The data should be accessible for analytics with BigQuery. Which two services should they use? (Choose TWO.)

Select 2 answers

A.Cloud SQL

B.Cloud Spanner

C.BigQuery

D.Cloud Storage

E.Firestore

Answers: C, D

BigQuery can query data stored in Cloud Storage via external tables, enabling analytics.

Why this answer

Cloud Storage is the best option for storing unstructured and structured files cost-effectively. BigQuery can analyze this data directly via external tables or after loading, making it a powerful analytics platform.

65

MCQ · hard

An IoT application writes sensor readings to Cloud Bigtable with a row key of 'deviceID#timestamp'. The team notices high write latency and hotspots on a few nodes. Which row key design change would most likely improve performance?

A.Add a random prefix to the row key (e.g., hash of deviceID modulo 1000)

B.Reverse the key to 'timestamp#deviceID'

C.Use a single column family with many columns

D.Store all data in one row per device

Answer: A

Hashing the device ID distributes writes across tablet servers, reducing hotspots.

Why this answer

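Hotspots occur when many writes land on adjacent row keys, because Bigtable stores rows in sorted key order and a contiguous key range is served by the same node. With 'deviceID#timestamp', sequential or similarly prefixed device IDs can cluster in the same key range. Prefixing the key with a salt derived from a hash of the device ID (reversing the device ID is a related technique) spreads devices across the key space and therefore across nodes. Because the salt is computed from the device ID rather than chosen at random, reads for a given device can recompute it. Option B makes the problem worse, since leading with the timestamp sends all current writes to the same key range. Options C and D change the column or row layout but not how writes are distributed.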
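
A minimal sketch of the salting idea in option A follows, using only the Python standard library. The function name, the key layout `salt#deviceID#timestamp`, and the choice of SHA-256 are assumptions made for illustration; this is not a Bigtable client API.

```python
import hashlib

NUM_BUCKETS = 1000  # mirrors the "modulo 1000" in option A


def salted_row_key(device_id: str, timestamp: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix the row key with a deterministic salt derived from the device ID."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    salt = int.from_bytes(digest[:4], "big") % buckets
    width = len(str(buckets - 1))  # zero-pad so keys sort consistently
    return f"{salt:0{width}d}#{device_id}#{timestamp}"


# Sequential device IDs no longer sort next to each other.
for device in ("sensor-0001", "sensor-0002", "sensor-0003"):
    print(salted_row_key(device, "2024-01-01T00:00:00Z"))
```

Because the salt depends only on the device ID, all readings for one device still share a prefix and can be fetched with a single prefix scan.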

66

MCQ · easy

A data engineer wants to automatically move objects from Standard storage class to Nearline after 30 days, and then to Archive after 365 days. Which Cloud Storage feature should they configure?

A.Object Versioning

B.Retention Policy

C.Bucket Lock

D.Object Lifecycle rule with SetStorageClass actions

Answer: D

Lifecycle rules can change storage class based on object age.

Why this answer

Option D is correct because Object Lifecycle rules in Google Cloud Storage allow you to automatically transition objects between storage classes (e.g., from Standard to Nearline after 30 days, then to Archive after 365 days) using the SetStorageClass action. This feature is specifically designed for automated lifecycle management, including deletion and class transitions, based on object age or other conditions.

Exam trap

The exam often tests the distinction between lifecycle management (which changes storage classes) and retention/versioning features (which protect data but do not automate class transitions), leading candidates to confuse Object Versioning or Retention Policy with lifecycle rules.

How to eliminate wrong answers

Option A is wrong because Object Versioning is a feature that preserves non-current object versions to protect against accidental deletion or overwriting; it does not automate storage class transitions. Option B is wrong because Retention Policy is used to enforce a minimum retention period on objects, preventing deletion or modification, but it cannot change storage classes over time. Option C is wrong because Bucket Lock is a mechanism to permanently lock a retention policy, making it immutable; it does not provide any lifecycle-based storage class transitions.

67

MCQ · medium

A company uses Cloud Storage as a data lake with raw, curated, and processed zones. Data in the raw zone should be automatically moved to a cheaper storage class after 30 days, and deleted after 1 year. What is the most efficient way to implement this?

A.Use Object Lifecycle Management with rules to transition to Coldline after 30 days and delete after 365 days.

B.Write a Cloud Function that runs daily, checks object ages, and moves/deletes them.

C.Use Cloud Scheduler to run a script that changes storage class and deletes objects.

D.Set a retention policy on the raw zone to prevent deletion and manually clean up.

Answer: A

Correct: Lifecycle rules automate this efficiently.

Why this answer

Object Lifecycle Management in Cloud Storage allows you to set rules based on object age. You can transition objects to a lower-cost storage class (e.g., Nearline or Coldline) after 30 days and delete after 365 days.
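
If the zones are stored as prefixes inside one bucket (an assumption; the question does not say), the rules can be limited to the raw zone with the `matchesPrefix` lifecycle condition. The sketch below builds such a configuration; the function name and the `raw/` prefix are hypothetical.

```python
import json
from typing import Optional


def zone_lifecycle_rules(prefix: Optional[str], transition_days: int,
                         delete_days: int, storage_class: str = "COLDLINE") -> dict:
    """Build lifecycle rules, optionally scoped to objects under one prefix."""

    def condition(age: int) -> dict:
        cond = {"age": age}
        if prefix:
            # Only objects whose names start with this prefix are affected.
            cond["matchesPrefix"] = [prefix]
        return cond

    return {
        "rule": [
            {"action": {"type": "SetStorageClass", "storageClass": storage_class},
             "condition": condition(transition_days)},
            {"action": {"type": "Delete"},
             "condition": condition(delete_days)},
        ]
    }


raw_rules = zone_lifecycle_rules("raw/", 30, 365)
print(json.dumps(raw_rules, indent=2))
```

If the raw zone has its own bucket instead, passing `None` as the prefix produces rules that apply to every object in that bucket.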

68

MCQ · medium

A data engineer is building a data lake on Google Cloud and needs to separate raw ingested data, curated/cleaned data, and processed/aggregated data. Which Cloud Storage bucket structure is recommended?

A.Create three separate folders in a single bucket: raw, curated, processed.

B.Store all data in one bucket and use object labels to distinguish raw, curated, and processed.

C.Store raw data in a different project for security isolation.

D.Use different storage classes for raw, curated, and processed data within the same bucket.

Answer: A

Using prefixes (folders) within a bucket is a standard pattern for organizing data lake zones, allowing different lifecycle rules per prefix.

Why this answer

A common best practice for data lakes on GCS is to use separate buckets or folders within a bucket (e.g., raw, curated, processed) to manage different stages of data refinement and apply appropriate lifecycle policies.

69

MCQ · hard

A team needs to run hybrid transactional/analytical workloads on PostgreSQL-compatible data with low latency. They require high performance on both OLTP and OLAP queries, leveraging a columnar engine. Which Google Cloud service is best suited?

A.AlloyDB

B.Cloud SQL for PostgreSQL

C.Cloud Spanner

D.BigQuery

Answer: A

AlloyDB combines PostgreSQL compatibility with a columnar engine for fast analytics.

Why this answer

AlloyDB is the correct choice because it is a fully managed PostgreSQL-compatible database service on Google Cloud that combines a columnar engine for fast analytical queries with high transactional performance. It uses a columnar query accelerator to offload analytical workloads from the transactional engine, enabling low-latency hybrid transactional/analytical processing (HTAP) without data movement.

Exam trap

The trap here is that candidates may confuse BigQuery's columnar storage with PostgreSQL compatibility, or assume Cloud SQL's PostgreSQL support is sufficient for HTAP workloads, overlooking the need for a dedicated columnar engine.

How to eliminate wrong answers

Option B (Cloud SQL for PostgreSQL) is wrong because it lacks a columnar engine and is optimized primarily for OLTP workloads, so analytical queries would suffer from high latency and poor performance. Option C (Cloud Spanner) is wrong because it is a globally distributed, strongly consistent relational database designed for horizontal scalability and high availability, not for columnar analytics or PostgreSQL compatibility. Option D (BigQuery) is wrong because it is a serverless data warehouse with a columnar storage engine but is not PostgreSQL-compatible and is designed for OLAP, not low-latency OLTP transactions.

70

MCQ · easy

A company wants to use BigQuery to query data stored in Cloud Storage as Parquet files without loading the data into BigQuery storage. Which feature should they use?

A.BigQuery ingestion from Cloud Storage using load jobs

B.Cloud Storage FUSE to mount bucket and query

C.BigQuery federated queries with Cloud Storage

D.BigQuery external tables

Answer: D

External tables enable querying data directly in Cloud Storage without loading.

Why this answer

Option D is correct because BigQuery external tables allow querying data stored in Cloud Storage (including Parquet files) directly without loading it into BigQuery storage. This is achieved by defining a table that references the external data source, so BigQuery reads the Parquet files in place at query time.

Exam trap

The trap here is that candidates confuse 'federated queries' (which send a query to an external database such as Cloud SQL or Spanner through the EXTERNAL_QUERY function) with the ability to query Cloud Storage files, which is implemented through external tables.

How to eliminate wrong answers

Option A is wrong because BigQuery ingestion using load jobs imports data into BigQuery's internal storage, which contradicts the requirement to query data without loading it. Option B is wrong because Cloud Storage FUSE mounts a bucket as a filesystem on a machine you manage, and BigQuery does not read from a mounted filesystem. Option C is wrong because the feature for querying files in Cloud Storage is the external table; 'federated queries' refers to querying external databases such as Cloud SQL, not Cloud Storage files.
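
For illustration, the sketch below assembles a `CREATE EXTERNAL TABLE` statement of the kind BigQuery accepts for Parquet files in Cloud Storage. The helper name, dataset, table, and bucket path are all hypothetical, and the code only builds the SQL string; it does not connect to BigQuery.

```python
from typing import List


def external_parquet_table_ddl(table: str, uris: List[str]) -> str:
    """Return DDL defining an external table over Parquet files in Cloud Storage."""
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        "OPTIONS (\n"
        "  format = 'PARQUET',\n"
        f"  uris = [{uri_list}]\n"
        ");"
    )


ddl = external_parquet_table_ddl(
    "my_dataset.sales_external",                # hypothetical dataset.table
    ["gs://example-bucket/sales/*.parquet"],    # hypothetical bucket path
)
print(ddl)
```

Because Parquet files carry their own schema, the statement needs no column list in this sketch; the data stays in the bucket and is read when a query runs.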

71

MCQ · medium

An organization needs to prevent data exfiltration from BigQuery by ensuring all traffic to BigQuery APIs goes through VPC boundaries and is restricted to a specific service perimeter. Which Google Cloud security control should they use?

A.Access Transparency

B.IAM conditions on BigQuery roles

C.Cloud Armor

D.VPC Service Controls

Answer: D

VPC Service Controls define a perimeter that restricts data movement to authorized networks and prevents exfiltration.

Why this answer

VPC Service Controls (D) is the correct answer because it allows you to define a service perimeter around BigQuery APIs, ensuring that all traffic to BigQuery must originate from within the defined VPC boundaries. This prevents data exfiltration by blocking unauthorized access from outside the perimeter, even if valid credentials are used. The perimeter is enforced by Google Cloud on API requests to the protected services, independently of IAM permissions.

Exam trap

The trap here is that candidates confuse IAM conditions (which control who can access data) with VPC Service Controls (which control where data can be accessed from), leading them to pick Option B instead of D.

How to eliminate wrong answers

Option A is wrong because Access Transparency provides logs of Google personnel access to your data, not network-level controls for data exfiltration. Option B is wrong because IAM conditions on BigQuery roles control authorization based on attributes like IP address or time, but they do not restrict traffic to VPC boundaries or create a service perimeter; they are identity-based, not network-based. Option C is wrong because Cloud Armor is a web application firewall (WAF) for HTTP(S) traffic to load balancers, not for BigQuery API traffic, and it cannot enforce VPC boundaries or service perimeters.

72

MCQ · medium

A company needs to store petabytes of time-series IoT sensor data and query it with single-digit millisecond latency at millions of reads per second. The data has a simple key-value structure with timestamps. Which Google Cloud database is MOST appropriate?

A.Cloud Bigtable

B.BigQuery

C.Cloud Spanner

D.Firestore

Answer: A

Bigtable is the correct choice: wide-column NoSQL, designed for time-series and IoT workloads, single-digit ms latency, and scales to millions of QPS with additional nodes.

Why this answer

Cloud Bigtable is a fully managed, scalable NoSQL database designed for large analytical and operational workloads, handling petabytes of data with consistent sub-10ms latency at millions of reads per second. Its key-value model with timestamps directly matches the time-series IoT sensor data structure, and it supports high-throughput, low-latency access via the HBase API or Bigtable client libraries.

Exam trap

The exam often tests the distinction between operational databases (Bigtable) and analytical warehouses (BigQuery). The trap here is assuming that 'petabytes of data' automatically requires a data warehouse like BigQuery, ignoring the real-time, low-latency key-value access pattern that Bigtable is purpose-built for.

How to eliminate wrong answers

Option B (BigQuery) is wrong because it is a serverless data warehouse optimized for analytical SQL queries on large datasets, not for single-digit millisecond point reads at millions of operations per second; it incurs higher latency (typically hundreds of milliseconds) and is not designed for real-time key-value lookups. Option C (Cloud Spanner) is wrong because it is a globally distributed relational database with strong consistency and SQL support, but its latency and throughput for simple key-value reads are higher than Bigtable, and it is overkill for time-series data that does not require relational joins or transactions. Option D (Firestore) is wrong because it is a mobile and web document database optimized for real-time updates and moderate throughput, not for petabyte-scale time-series data with millions of reads per second; it has throughput limits (e.g., 10,000 writes/second per database) and higher latency for such high-volume workloads.

73

MCQ · medium

A data team needs to run complex analytical queries on a dataset that is frequently updated with new rows. They want to minimize query costs and avoid scanning old data that is rarely queried. Which BigQuery feature should they use?

A.Partitioned tables with partition expiration

B.BigQuery materialized views

C.Clustered tables

D.BigQuery BI Engine

Answer: A

Partitioning allows querying only relevant partitions, and partition expiration can automatically delete old partitions.

Why this answer

Partitioned tables with partition expiration allow you to divide a table into segments based on a date/timestamp column, and automatically delete partitions that are older than a specified duration. This minimizes query costs by only scanning relevant partitions and eliminates storage costs for old, rarely queried data without manual intervention.

Exam trap

The exam often tests the distinction between performance optimization features (clustering, materialized views, BI Engine) and data lifecycle management features (partition expiration), leading candidates to choose a performance feature when the question explicitly asks about minimizing costs and avoiding scanning old data.

How to eliminate wrong answers

Option B is wrong because BigQuery materialized views precompute and cache query results for faster reads, but they do not automatically expire old data or reduce storage costs for rarely queried rows. Option C is wrong because clustered tables sort data within partitions to improve query performance and reduce bytes scanned, but they do not provide automatic deletion of old data or partition expiration. Option D is wrong because BigQuery BI Engine is an in-memory analysis service that accelerates interactive queries but does not manage data lifecycle or expiration of old rows.
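
The cost effect of partition pruning and expiration can be shown with a small simulation. This is a simplified model, not BigQuery itself: partitions are a dictionary from date to size in GB, the sizes and dates are made up, and expiration is applied at a whole-day cutoff.

```python
from datetime import date, timedelta


def expire_partitions(partitions: dict, today: date, expiration_days: int) -> dict:
    """Drop partitions older than the expiration window."""
    cutoff = today - timedelta(days=expiration_days)
    return {day: size for day, size in partitions.items() if day >= cutoff}


def gb_scanned(partitions: dict, start: date, end: date) -> int:
    """Sum only the partitions that fall inside the query's date filter."""
    return sum(size for day, size in partitions.items() if start <= day <= end)


today = date(2024, 6, 30)
# 120 daily partitions of 10 GB each (made-up numbers).
partitions = {today - timedelta(days=n): 10 for n in range(120)}

live = expire_partitions(partitions, today, expiration_days=90)
last_week = gb_scanned(live, today - timedelta(days=6), today)

print("partitions kept:", len(live))
print("GB scanned for last 7 days:", last_week)
print("GB scanned without partitioning:", sum(partitions.values()))
```

A query filtered on the partitioning column touches only the matching days, while an unpartitioned table of the same size would be read in full.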

74

MCQ · medium

A data engineer needs to store quarterly financial data that must remain immutable for 7 years to meet regulatory compliance. The data is accessed infrequently after the first year. Which Cloud Storage feature should be used to enforce immutability?

A.Object Lifecycle Rules

B.Object Lock with Retention Policy

C.Object Versioning

D.IAM Conditions

Answer: B

A locked retention policy on the bucket prevents objects from being deleted or overwritten for a specified duration (WORM).

Why this answer

Option B describes a retention policy that is locked so that it cannot be shortened or removed; in Cloud Storage this lock is called Bucket Lock. For regulatory compliance requiring 7-year immutable storage, a retention policy with a 7-year retention period ensures that objects cannot be overwritten or deleted until they reach that age, even by project owners. This directly meets the requirement for data that must remain unchanged for the full 7-year period.

Exam trap

The exam often tests the misconception that Object Versioning alone provides immutability. Versioning only protects against accidental deletion by preserving old versions; it does not prevent intentional deletion or overwriting of the current version, which is required for true immutability.

How to eliminate wrong answers

Option A is wrong because Object Lifecycle Rules manage transitions and deletions based on age or other criteria, but they do not enforce immutability—objects can still be modified or deleted by users with appropriate permissions. Option C is wrong because Object Versioning preserves previous versions of objects but does not prevent deletion or overwriting of the current version; it allows recovery but does not enforce a write-once, read-many (WORM) model. Option D is wrong because IAM Conditions control access based on attributes like IP address or time, but they do not prevent deletion or modification of objects; they only restrict who can perform actions, not enforce data immutability.
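
The effect of a retention period can be sketched as a simple date check. Cloud Storage retention periods are expressed as a duration; the code below assumes 365-day years when converting 7 years to seconds, and the function names are hypothetical rather than part of any client library.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86_400
# Assumption: a "year" is 365 days; check how your tooling defines it.
RETENTION_SECONDS = 7 * 365 * SECONDS_PER_DAY


def earliest_delete_time(created: datetime,
                         retention_seconds: int = RETENTION_SECONDS) -> datetime:
    """Return the first moment at which the object may be deleted or replaced."""
    return created + timedelta(seconds=retention_seconds)


def can_delete(created: datetime, now: datetime,
               retention_seconds: int = RETENTION_SECONDS) -> bool:
    """True only once the object has outlived the retention period."""
    return now >= earliest_delete_time(created, retention_seconds)


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(earliest_delete_time(created).date())
print(can_delete(created, datetime(2030, 1, 1, tzinfo=timezone.utc)))
print(can_delete(created, datetime(2031, 1, 1, tzinfo=timezone.utc)))
```

A lifecycle rule can still move such objects to a cheaper storage class during the retention period, which addresses the "accessed infrequently after the first year" part of the scenario.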

75

MCQ · medium

An e-commerce application uses Cloud SQL (MySQL) for transaction processing. To improve read performance for reporting queries, the team wants to offload read traffic to a separate database instance that stays in sync with the primary. Which Cloud SQL feature should they use?

A.Configure a high availability (HA) replica

B.Use Cloud SQL's failover replica

C.Create a cross-region read replica

D.Enable automatic backups and point-in-time recovery

Answer: C

Read replicas can be in the same or different region and serve read-only traffic.

Why this answer

Option C is correct because Cloud SQL read replicas are designed to offload read traffic from the primary instance while staying in sync using asynchronous replication. Cross-region read replicas specifically allow you to place a replica in a different region, improving read performance for geographically distributed reporting queries without affecting the primary's transaction processing.

Exam trap

The trap here is that candidates confuse high availability (HA) replicas with read replicas, assuming that an HA standby can also serve read traffic, but in Cloud SQL the HA standby is not directly accessible for reads and is only used for failover.

How to eliminate wrong answers

Option A is wrong because a high availability (HA) configuration adds a synchronous standby that provides automatic failover, not a separate instance for offloading read traffic; the standby does not serve read queries. Option B is wrong because failover is part of the HA configuration and exists for availability rather than for offloading reporting reads. Option D is wrong because automatic backups and point-in-time recovery are disaster recovery features that protect data, not mechanisms to offload read traffic or improve query performance.
