GCDLChapter 38 of 101Objective 2.3

Choosing the Right Database on GCP

This chapter covers how to choose the right managed database service on Google Cloud Platform for different application workloads. The GCDL exam (Objective 2.3) expects you to understand the key characteristics, use cases, and limitations of Cloud SQL, Cloud Spanner, Firestore, Bigtable, BigQuery, and Memorystore. Approximately 10-15% of exam questions relate to database selection, often in scenario-based formats where you must match workload requirements to the correct service. This chapter provides deep technical insight into each service's internal architecture, consistency models, scaling behavior, and cost implications.

25 min read
Intermediate
Updated May 31, 2026

Database Selection as Tool Choice

Choosing a database on GCP is like a master carpenter selecting the right tool for a specific job. Each tool is designed for a particular material and task: a hammer drives nails, a saw cuts wood, a screwdriver drives screws. You wouldn't use a hammer to cut a board, nor a saw to drive a nail. Similarly, each Google Cloud database service is optimized for a particular data model and workload. Cloud SQL is like a precision screwdriver for relational data with strict schemas and ACID transactions. Firestore is a Swiss Army knife for mobile apps needing real-time sync and flexible documents. Bigtable is a sledgehammer for high-throughput, low-latency time-series or IoT data. BigQuery is a power saw for analytical queries over petabytes of data. The wrong choice leads to inefficiency, high costs, or poor performance—just as using a saw as a hammer would damage both the tool and the workpiece. The GCDL exam tests your ability to match workload characteristics to the correct database service, not just memorizing features. Understanding the underlying mechanics of each service—how they store, index, and query data—is critical. For instance, Cloud Spanner provides global strong consistency and horizontal scaling by using a true distributed SQL engine with atomic clocks and the Percolator transaction protocol, which is fundamentally different from the eventual consistency model of Firestore. Recognizing these mechanical differences is key to selecting the right database.

How It Actually Works

Overview of GCP Managed Database Services

Google Cloud offers a spectrum of fully managed database services, each optimized for specific data models and access patterns. The decision tree starts with the data model: relational (structured, ACID) versus NoSQL (flexible schema, high scalability). Within relational, you choose between Cloud SQL (traditional OLTP, up to 30 TB) and Cloud Spanner (global scale, strong consistency, unlimited storage). Within NoSQL, you choose between Firestore (document store, real-time, mobile), Bigtable (wide-column, high throughput, low latency), and Memorystore (in-memory cache, sub-millisecond). BigQuery is a serverless data warehouse for analytical SQL queries over petabytes, not transactional. Understanding the internal architecture of each is critical for exam scenarios.

Cloud SQL: Traditional Relational OLTP

Cloud SQL provides managed MySQL, PostgreSQL, and SQL Server instances. It is designed for standard relational workloads with ACID transactions, joins, and complex queries. Internally, Cloud SQL runs a single database instance (or a primary + read replicas for read scaling). Storage is persistent disks (up to 30 TB) with automatic encryption at rest. Replication is asynchronous for cross-region read replicas, meaning eventual consistency for reads from replicas. Cloud SQL does not support auto-scaling compute; you must manually resize. For high availability, you configure a regional failover replica with synchronous replication to another zone. The failover is automatic but takes about 60 seconds. Exam trap: Cloud SQL is not suitable for global distribution or high write throughput (> 50K writes/sec). The GCDL exam tests that you know Cloud SQL is for traditional web applications, CMS, and ERP systems.

Cloud Spanner: Globally Distributed Strong Consistency

Cloud Spanner is Google's globally distributed, strongly consistent relational database service. It combines the benefits of SQL with horizontal scalability and synchronous replication across regions. Internally, Spanner uses a distributed consensus protocol (Paxos) across replicas. Each tablet (data partition) is replicated in multiple zones within a region, and writes require a quorum of replicas. Spanner uses TrueTime (atomic clocks and GPS) to assign globally monotonic timestamps, enabling external consistency (linearizability). This means reads and writes are globally consistent without sacrificing performance. Spanner supports standard SQL (with some limitations like no stored procedures) and transactions with snapshot isolation and serializable isolation. Key exam numbers: Spanner can scale to hundreds of nodes, each node provides 2 TB of storage and up to 10,000 reads/sec or 2,000 writes/sec (depending on data size). Costs are high: $0.90 per node per hour plus storage. Use cases: global financial systems, inventory, gaming leaderboards. Common wrong answer: candidates think Spanner is just a bigger Cloud SQL—it's not; it's a fundamentally different architecture.

Firestore: Document Store for Mobile and Web

Firestore is a flexible, scalable NoSQL document database for mobile, web, and server development. Data is stored as documents (key-value pairs) in collections. Each document has a maximum size of 1 MiB. Firestore offers two consistency modes: strong consistency for single-region (Datastore mode) and eventual consistency for multi-region (Native mode). In Native mode, Firestore uses Google's global infrastructure to replicate data across regions but with eventual consistency for queries across regions. Firestore supports real-time listeners that push updates to clients via WebSockets, making it ideal for chat apps, live dashboards, and user presence. For transactions, Firestore supports multi-document transactions with atomicity but only within a single database. Write throughput is limited: up to 10,000 writes per second per database (default) and 500 writes per second per collection group. Indexing is automatic but you can create composite indexes for complex queries. Exam trap: Firestore is not for high write throughput or complex analytical queries; use Bigtable or BigQuery instead.

Bigtable: High-Throughput Wide-Column NoSQL

Cloud Bigtable is Google's fully managed, scalable NoSQL database for large analytical and operational workloads. It is a wide-column store (like HBase) that supports high throughput (millions of ops/sec) and low latency (< 10 ms). Internally, Bigtable stores data in tablets (sorted by row key) on Colossus (Google's distributed file system). Data is replicated across zones within a region for availability. Bigtable does not support SQL; it uses the HBase API or the native Bigtable client library. It is not ACID; it supports single-row transactions only. Schema design is critical: row keys determine performance. Hotspotting occurs if row keys are monotonically increasing (e.g., timestamps). Best practice: use salting or field promotion to distribute writes. Bigtable is ideal for time-series data, IoT, ad tech, and recommendation engines. Exam numbers: minimum 1 node (cost ~$0.65/hour), each node provides 2 TB storage and up to 10,000 ops/sec. Bigtable does not support SQL queries, so you cannot run JOINs. Common wrong answer: candidates think Bigtable can replace Cloud SQL for transactional workloads—it cannot.

BigQuery: Serverless Data Warehouse

BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse. It uses a columnar storage format (Capacitor) and a distributed query engine (Dremel). BigQuery separates compute from storage; you pay for storage (compressed) and query processing (slots). It supports standard SQL with features like window functions, GIS functions, and machine learning (BigQuery ML). BigQuery is designed for analytical queries over petabytes of data, not transactional workloads. Data is immutable; you cannot update individual rows efficiently (use DML but it's slow). BigQuery automatically replicates data across regions for durability. Performance is measured in slots (virtual CPUs). You can use flat-rate pricing or on-demand (pay per TB processed). Exam numbers: first 10 GB storage per month free; first 1 TB query per month free. Common trap: candidates try to use BigQuery for real-time transactions—it's not designed for that.

Memorystore: In-Memory Cache

Memorystore provides managed Redis and Memcached instances for caching, session storage, and real-time data. Data is stored in memory, providing sub-millisecond latency. Memorystore supports replication for high availability and persistence (AOF or RDB). Standard tier (Redis) offers replication and automatic failover; Basic tier is a single node with no replication. Maximum cache size is 300 GB for Redis. Memorystore is not a primary database; it's used to accelerate applications by caching frequently accessed data. Exam trap: Memorystore is not a durable data store; data can be lost on failure if persistence is not enabled. Use cases: session caching, API rate limiting, real-time leaderboards.

Selection Criteria for the Exam

The GCDL exam tests your ability to match workload characteristics to the correct database. Key decision factors: (1) Data model: relational vs. NoSQL. (2) Consistency requirements: strong vs. eventual. (3) Transaction support: ACID vs. single-row. (4) Scale: throughput, storage, geographic distribution. (5) Query pattern: OLTP vs. OLAP. (6) Latency: sub-millisecond vs. seconds. (7) Cost: compute vs. storage vs. query. Example scenario: A global e-commerce platform needs to manage inventory with strong consistency across regions. The correct answer is Cloud Spanner, not Cloud SQL (cannot scale globally) or Firestore (eventual consistency across regions). Another scenario: A mobile game needs real-time leaderboards with low latency. The correct answer is Memorystore (Redis) for the leaderboard, combined with Firestore for user profiles. Common wrong answer: using Bigtable for leaderboards—Bigtable is high throughput but not sub-millisecond for simple reads.

Walk-Through

1

Identify Data Model Requirements

Determine whether the workload requires a relational model (tables with rows and columns, joins, ACID transactions) or a NoSQL model (flexible schema, high scalability, eventual consistency). For relational, consider Cloud SQL (single region, moderate scale) or Cloud Spanner (global, strong consistency). For NoSQL, consider Firestore (document, real-time), Bigtable (wide-column, high throughput), or Memorystore (in-memory cache). Exam scenarios often describe structured data with foreign keys—that points to relational. If the scenario mentions flexible schema or rapid iteration, NoSQL is likely.

2

Assess Consistency and Transaction Needs

Evaluate whether the application requires strong consistency (all reads see the latest write) or eventual consistency (reads may see stale data). For strong consistency across regions, Cloud Spanner is the only option. For single-region strong consistency, Cloud SQL or Firestore (single-region mode) can work. If the workload needs multi-document transactions (ACID), Cloud SQL or Spanner are required; Firestore supports multi-document transactions but only within a single database. Bigtable and Memorystore do not support multi-row transactions.

3

Analyze Throughput and Latency Requirements

Determine the required write/read throughput and latency. For high write throughput (millions of ops/sec) and low latency (<10 ms), Bigtable is the best choice. For sub-millisecond latency for simple key-value lookups, Memorystore (Redis) is ideal. For moderate throughput (up to 50K writes/sec) with SQL support, Cloud SQL works. For global scale with strong consistency, Spanner provides up to 10K writes/sec per node (scalable). Firestore has limits: 10K writes/sec per database. BigQuery is for analytical queries, not transactional throughput.

4

Consider Geographic Distribution

Decide if the application needs to serve users globally with low latency. Cloud Spanner is the only relational database that supports strongly consistent multi-region replication. Firestore offers multi-region support but with eventual consistency. Cloud SQL can have read replicas in other regions but writes are limited to the primary region. Bigtable supports replication across zones within a region; cross-region replication is manual. Memorystore is regional only. BigQuery is global but with eventual consistency for multi-region datasets.

5

Evaluate Cost and Operational Overhead

Compare pricing models. Cloud SQL charges per instance (vCPU, memory, storage) and is suitable for predictable workloads. Cloud Spanner is expensive ($0.90/node/hour) but eliminates sharding complexity. Firestore charges per read/write/delete and storage, with a free tier. Bigtable charges per node per hour ($0.65/node/hour) and storage. BigQuery charges for storage and query processing (per TB). Memorystore charges per GB per hour. Serverless options (Firestore, BigQuery) reduce operational overhead. The exam often tests that Spanner is expensive but necessary for global strong consistency.

What This Looks Like on the Job

Enterprise Scenario 1: Global Gaming Leaderboard A gaming company with millions of daily active users needs a real-time leaderboard that updates scores instantly and is globally consistent. They initially chose Cloud SQL with read replicas in multiple regions, but writes to the primary region caused high latency for non-primary users. They switched to Cloud Spanner. Spanner's global strong consistency ensures every player sees the same leaderboard regardless of region. They configured a multi-region instance (e.g., us-central1, us-east1, europe-west1) with 5 nodes to handle 10,000 writes/sec. The cost was high (~$4.50/hour for nodes + storage), but the consistency requirement justified it. Common misconfiguration: not using interleaved tables for player profiles and scores, leading to cross-table joins that slowed queries. The team added interleaved tables to colocate related data.

Enterprise Scenario 2: IoT Time-Series Analytics A manufacturing company collects sensor data from thousands of machines every second. They need to ingest 1 million data points per second and run analytical queries to detect anomalies. They chose Bigtable for ingestion because of its high write throughput and low latency. They designed row keys with a salted timestamp (e.g., [machine_id]#[timestamp]) to avoid hotspotting. They used a 10-node cluster for 100K writes/sec per node (total 1M writes/sec). For analysis, they used BigQuery to run SQL queries on Bigtable data via federated queries. They also set up a Dataflow pipeline to stream Bigtable data to BigQuery for historical analysis. Issues arose when they used monotonically increasing timestamps as row keys, causing hotspotting on one tablet server. They fixed it by salting with machine_id.

Enterprise Scenario 3: E-Commerce Product Catalog with Real-Time Updates An e-commerce platform needs a product catalog that supports real-time inventory updates and fast product searches. They chose Firestore for its real-time listeners and flexible schema. Each product is a document with fields like name, price, inventory count. They used Firestore's real-time listeners to update the frontend instantly when inventory changes. However, during flash sales, write throughput exceeded Firestore's limit of 10K writes/sec, causing throttling. They mitigated by using a write buffer (e.g., Pub/Sub + Cloud Functions) to batch updates. They also created composite indexes for complex queries like "find products under $50 in stock". Common mistake: not setting up proper security rules, leading to unauthorized access. They used Firebase Authentication and security rules to restrict access.

How GCDL Actually Tests This

The GCDL exam (Objective 2.3) tests your ability to select the appropriate managed database service based on workload characteristics. Questions are typically scenario-based: a description of an application's data model, consistency needs, throughput, latency, and geographic distribution. You must choose the single best service.

Common Wrong Answers and Why Candidates Choose Them: 1. Choosing Cloud SQL for global strong consistency: Candidates see "relational" and "ACID" in the scenario and immediately pick Cloud SQL, ignoring the requirement for multi-region writes. The correct answer is Cloud Spanner. 2. Choosing Bigtable for transactional workloads: Candidates see "high throughput" and assume Bigtable can handle transactions. Bigtable supports only single-row transactions and no SQL. The correct answer is Cloud Spanner or Cloud SQL. 3. Choosing Firestore for complex analytical queries: Candidates see "scalable" and "NoSQL" and pick Firestore. But Firestore lacks SQL aggregation and join capabilities. The correct answer is BigQuery. 4. Choosing Memorystore as a primary database: Candidates see "sub-millisecond latency" and think it can replace a database. Memorystore is a cache, not a durable data store. The correct answer is Firestore or Bigtable for durability.

Specific Numbers and Terms That Appear on the Exam:

Cloud Spanner: nodes, Paxos, TrueTime, external consistency, $0.90/node/hour, 2 TB storage per node.

Cloud SQL: 30 TB storage limit, read replicas (asynchronous), failover replica (synchronous).

Firestore: 1 MiB document limit, 10K writes/sec per database, 500 writes/sec per collection group, eventual consistency in multi-region.

Bigtable: 10 ms latency, millions of ops/sec, row key design, salting, $0.65/node/hour.

BigQuery: serverless, slot, columnar storage, DML (slow), first 1 TB free.

Memorystore: sub-millisecond, Redis/Memcached, persistence (AOF/RDB), up to 300 GB.

Edge Cases and Exceptions:

If the scenario requires SQL but also needs horizontal scaling and strong consistency, the answer is Cloud Spanner, not Cloud SQL.

If the scenario requires real-time updates to mobile clients, the answer is Firestore (real-time listeners).

If the scenario requires high-throughput time-series ingestion and analysis, the answer is Bigtable for ingestion and BigQuery for analysis.

If the scenario requires a cache to reduce database load, the answer is Memorystore.

How to Eliminate Wrong Answers: 1. If the scenario mentions ACID transactions across multiple rows, eliminate Bigtable, Firestore (unless single-region and within a single database), and Memorystore. 2. If the scenario mentions global users with strong consistency, eliminate Cloud SQL and Firestore multi-region. 3. If the scenario mentions complex SQL queries (JOINs, aggregations), eliminate Bigtable and Firestore. 4. If the scenario mentions sub-millisecond latency for simple lookups, eliminate BigQuery and Cloud SQL (unless cached).

Key Takeaways

Cloud SQL is for traditional relational workloads with moderate scale (up to 30 TB) and single-region strong consistency.

Cloud Spanner is the only globally distributed relational database with strong consistency and horizontal scaling.

Firestore is ideal for mobile/web apps requiring real-time updates and flexible schema, but has write throughput limits (10K writes/sec).

Bigtable is for high-throughput, low-latency operational and analytical workloads (e.g., time-series, IoT) with single-row transactions.

BigQuery is a serverless data warehouse for analytical SQL queries over petabytes, not for transactional writes.

Memorystore is an in-memory cache for sub-millisecond access, not a durable primary database.

Database selection on the exam depends on data model, consistency, throughput, latency, geographic distribution, and cost.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Cloud SQL

Single-instance or with read replicas

Up to 30 TB storage

Asynchronous cross-region replication

ACID transactions within a single region

Lower cost per hour (~$0.10/vCPU)

Cloud Spanner

Globally distributed with Paxos consensus

Unlimited storage (scales with nodes)

Strong consistency across regions via TrueTime

ACID transactions globally

Higher cost per hour ($0.90/node)

Firestore

Document store (JSON-like)

Real-time listeners for client updates

Multi-document transactions (within database)

10K writes/sec per database

Automatic indexing

Bigtable

Wide-column store (like HBase)

High throughput (millions of ops/sec)

Single-row transactions only

Scalable to hundreds of nodes

Manual row key design for performance

Watch Out for These

Mistake

Cloud Spanner is just a bigger version of Cloud SQL.

Correct

Cloud Spanner uses a fundamentally different distributed architecture (Paxos, TrueTime) to provide global strong consistency and horizontal scaling. Cloud SQL is a single-instance database with limited scaling (up to 30 TB) and asynchronous replication across regions.

Mistake

Firestore can handle high write throughput.

Correct

Firestore has a default limit of 10,000 writes per second per database. For higher throughput, use Bigtable (millions of ops/sec) or Cloud Spanner (scalable with nodes).

Mistake

Bigtable supports SQL queries.

Correct

Bigtable does not support SQL; it uses the HBase API or native client libraries. To run SQL on Bigtable data, use BigQuery federated queries or export data to BigQuery.

Mistake

BigQuery is suitable for transactional workloads.

Correct

BigQuery is a data warehouse for analytical queries. It does not support high-frequency writes or low-latency reads. Use Cloud SQL or Spanner for OLTP.

Mistake

Memorystore can be used as a primary database.

Correct

Memorystore is an in-memory cache, not a durable database. Data can be lost on failure unless persistence is enabled, but even then, it is not designed for ACID transactions or complex queries.

Do You Actually Know This?

Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.

Frequently Asked Questions

When should I use Cloud Spanner instead of Cloud SQL?

Use Cloud Spanner when you need strong consistency across multiple regions, horizontal scaling beyond 30 TB, or high write throughput that exceeds Cloud SQL's capabilities. Cloud SQL is suitable for single-region applications with moderate scale and standard ACID requirements. On the exam, if the scenario mentions global users and strong consistency, choose Spanner.

Can Firestore handle real-time updates?

Yes, Firestore provides real-time listeners that push document changes to subscribed clients via WebSockets, enabling live updates in mobile and web apps. This is a key feature that distinguishes it from other NoSQL databases like Bigtable. However, Firestore's write throughput is limited to 10K writes per second per database.

What is the difference between Firestore Native mode and Datastore mode?

Firestore Native mode is the default for new projects and provides real-time listeners, strong consistency for single-region, and eventual consistency for multi-region. Datastore mode is backward-compatible with Cloud Datastore and offers strong consistency for all reads but lacks real-time listeners. On the exam, Native mode is typically assumed unless specified otherwise.

How do I avoid hotspotting in Bigtable?

Hotspotting occurs when row keys are monotonically increasing (e.g., timestamps), causing all writes to hit a single tablet server. To avoid it, use salting (prefix with a random number or field) or field promotion (swap the order of key components). The exam expects you to recognize that row key design is critical for Bigtable performance.

Can I use BigQuery for real-time analytics?

BigQuery is not designed for real-time updates; it is optimized for batch analytical queries over large datasets. For real-time streaming, use Bigtable for ingestion and then analyze with BigQuery via federated queries or scheduled exports. BigQuery does support streaming inserts, but they are eventually consistent and have limitations.

What is the maximum size of a Firestore document?

The maximum size of a Firestore document is 1 MiB. This includes the document data and any subcollections. For larger data, consider using Cloud Storage for blobs or Bigtable for wide-column data.

Is Memorystore suitable for caching database queries?

Yes, Memorystore (Redis) is commonly used for caching frequently accessed database queries to reduce latency and load on the primary database. It supports key-value operations with sub-millisecond latency. However, it is not a replacement for a durable database; cache misses must be handled by querying the primary database.

Terms Worth Knowing

Ready to put this to the test?

You've just covered Choosing the Right Database on GCP — now see how well it sticks with free GCDL practice questions. Full explanations included, no account needed.

Done with this chapter?