This chapter dives into the fundamental differences between relational (SQL) and NoSQL databases on Google Cloud, a key topic for the Google Cloud Digital Leader (GCDL) exam. Understanding when to choose each type is critical for designing scalable, cost-effective cloud solutions. Approximately 10-15% of exam questions touch database selection, data modeling, and Google Cloud's managed database services. By mastering these concepts, you will be able to recommend the right database for various application requirements and avoid common pitfalls that lead to performance issues or cost overruns.
Jump to a section
Imagine two libraries. The first is a traditional library (Relational) where every book is cataloged with a strict schema: each book has a unique ISBN, title, author, publication year, and genre. Books are stored on shelves organized by genre, then by author's last name. To find a book, you query the master card catalog (index) which cross-references ISBN to shelf location. If you want all books by a specific author, you scan the catalog's author index. This system ensures no duplicate entries and enforces rules (e.g., a book cannot have two authors without a junction table). The second library is a NoSQL library where books are stored in bins labeled by genre. Each bin contains books as self-contained documents: a book might have fields like title, author, publication year, and even extra fields like 'ratings' or 'reviews' without needing a separate table. To find a book, you open the bin and scan each document. You can add a new field to one book without affecting others. However, if you want all books by an author, you must search every bin. The NoSQL library is faster for simple lookups by primary key (e.g., book ID) and flexible for varied data, but complex queries across multiple bins require careful indexing. In Google Cloud, relational databases (Cloud SQL, Spanner) enforce schema and support complex joins; NoSQL databases (Firestore, Bigtable) offer flexible schemas and horizontal scalability for high-throughput workloads.
What It Is and Why It Exists
Relational databases (RDBMS) and NoSQL databases represent two fundamentally different approaches to data storage and retrieval. The choice between them impacts application performance, scalability, development speed, and operational complexity. On Google Cloud, you have several managed services for each category:
Relational (SQL): Cloud SQL (for MySQL, PostgreSQL, SQL Server), Cloud Spanner (globally distributed, strongly consistent), and Bare Metal Solution (for Oracle workloads).
NoSQL: Firestore (document-oriented, real-time sync), Bigtable (wide-column, high-throughput), Memorystore (in-memory cache, Redis/Memcached), and Datastore (legacy, still supported but Firestore is preferred).
How They Work Internally
Relational Databases organize data into tables with predefined schemas. Each table has rows and columns, with relationships defined via foreign keys. The database engine uses ACID transactions (Atomicity, Consistency, Isolation, Durability) to ensure data integrity. Queries are written in SQL (Structured Query Language) and executed via a query planner that optimizes joins and index usage. Storage is typically on disk with buffer pools for caching. Scaling is primarily vertical (more CPU/RAM) or through read replicas; horizontal scaling (sharding) is complex and often application-managed.
NoSQL Databases generally sacrifice ACID for horizontal scalability and flexibility. They come in several types: - Document stores (Firestore): Store data as JSON-like documents. Each document has a unique ID and can have varying fields. Collections group related documents. Indexes are created manually for efficient queries. - Key-value stores (Memorystore, Bigtable as wide-column): Data is stored as key-value pairs. Lookups by key are extremely fast, but querying by value requires scanning or secondary indexes. - Wide-column stores (Bigtable): Data is stored in rows and column families, but columns can vary per row. Designed for high write/read throughput and time-series data. - Graph databases (not natively on GCP but can run on Compute Engine): Focus on relationships between entities.
Key Components, Values, Defaults, and Timers
Cloud SQL (Relational) - Default storage: 10 GB SSD (up to 30 TB). - Max connections: depends on tier (e.g., db-n1-standard-1: 250 connections). - Automated backups: enabled by default, retained for 7 days (configurable up to 365 days). - Point-in-time recovery (PITR): binary log retention 1-7 days. - Maintenance window: default is any window, can set preferred day/time.
Cloud Spanner (Relational + Horizontal Scale) - Strong consistency across regions. - Nodes: minimum 1, each node provides 2 TB storage and 2,000 read/write operations per second. - Schema: relational with interleaved tables for hierarchical data. - Automatic replication; read replicas can be added for read scalability.
Firestore (NoSQL Document) - Maximum document size: 1 MiB. - Maximum field index entries per document: 20,000. - Real-time listeners: enables client-side sync. - Queries: automatically indexed on single-field equality, but composite indexes must be created manually. - Scaling: automatic, no downtime.
Bigtable (NoSQL Wide-Column) - Minimum cluster size: 1 node (but production typically 3+ for availability). - Each node: up to 10,000 writes/second (depending on row key design). - Storage: SSD (default) or HDD (cheaper, slower). - Row key design is critical for performance; avoid hotspots by using a well-distributed key. - No ACID transactions across rows; only single-row atomicity.
Configuration and Verification Commands
Creating a Cloud SQL instance (gcloud command):
gcloud sql instances create my-instance \
--database-version=POSTGRES_13 \
--tier=db-custom-2-3840 \
--region=us-central1 \
--storage-type=SSD \
--storage-size=10GBVerifying Firestore database:
gcloud firestore databases listCreating a Bigtable instance:
gcloud bigtable instances create my-bigtable \
--display-name=my-bigtable \
--cluster-config=id=my-cluster,zone=us-central1-a,nodes=3 \
--instance-type=PRODUCTIONHow It Interacts with Related Technologies
App Engine and Cloud Functions can connect to both relational and NoSQL databases via client libraries.
Cloud Run and GKE applications use environment variables or Secrets Manager for database connection strings.
BigQuery can query Cloud SQL (federated queries) and Firestore (via external tables) for analytics.
Dataflow can stream data into Bigtable for real-time processing.
IAM controls access to database instances; Cloud SQL uses Cloud SQL Proxy for secure connections.
Choosing Between Relational and NoSQL
The GCDL exam expects you to evaluate based on: - Data structure: Highly structured, unchanging schema → relational. Flexible, evolving schema → NoSQL. - Consistency requirements: Strong consistency needed (e.g., financial transactions) → relational (or Spanner). Eventual consistency acceptable → NoSQL. - Scalability needs: Massive write throughput (millions of operations/sec) → Bigtable. Global distribution with strong consistency → Spanner. Moderate scale → Cloud SQL with read replicas. - Query complexity: Complex joins, aggregations → relational. Simple key-value lookups or filtered scans → NoSQL. - Development speed: Agile development with frequent schema changes → NoSQL (especially Firestore).
Assess Data Requirements
Begin by analyzing the application's data model. If the data is highly structured with clear relationships (e.g., customer orders with line items), relational is likely best. If the data is semi-structured or varies between records (e.g., user profiles with optional fields), NoSQL is more suitable. Also consider whether you need ACID transactions; if yes, lean toward relational or Spanner.
Evaluate Scalability Needs
Estimate read/write throughput and storage growth. For predictable, moderate scale (thousands of QPS), Cloud SQL works with vertical scaling and read replicas. For millions of operations per second or petabytes of data, consider Bigtable (high write throughput) or Spanner (global distribution). Firestore scales automatically but has limits on write rate per document (1 write per second per document).
Determine Consistency Requirements
If the application requires strong consistency across reads and writes (e.g., banking), choose relational (Cloud SQL) or Spanner. If eventual consistency is acceptable (e.g., social media feeds), Firestore or Bigtable are fine. Note that Firestore provides strong consistency for single-document reads but eventual consistency for queries unless using a specific mode.
Select Google Cloud Service
Map your requirements to a specific service: Cloud SQL for traditional relational, Spanner for globally distributed relational, Firestore for mobile/web apps with real-time sync, Bigtable for analytical or time-series workloads, Memorystore for caching. Consider cost: Cloud SQL and Spanner have higher per-GB costs; Bigtable is cost-effective for large volumes; Firestore charges per read/write/delete operations.
Design Data Model and Indexes
For relational databases, normalize tables to reduce redundancy but denormalize for read performance if needed. For NoSQL, design documents to store data that is accessed together. In Firestore, create composite indexes for queries that filter on multiple fields. In Bigtable, design row keys to avoid hotspots (e.g., use a hash prefix). Test with realistic data volumes.
Scenario 1: E-commerce Platform
A mid-sized e-commerce company uses Cloud SQL (PostgreSQL) for its transactional database: storing user accounts, product catalog, and order history. Orders involve multiple tables (users, products, order_items) that require joins and ACID compliance to ensure inventory accuracy. They set up a read replica for reporting and use Cloud SQL Proxy for secure access from App Engine. They chose Cloud SQL because their workload is moderate (few thousand transactions per second) and they need strong consistency. A common misconfiguration is not setting up connection pooling, leading to connection exhaustion under load. They also underestimated storage growth; they now monitor storage usage and enable automatic storage increase.
Scenario 2: Real-Time Chat Application
A startup builds a chat app using Firestore for storing messages and user profiles. Each chat room is a collection of documents, and messages are written with server timestamps. Firestore's real-time listeners allow clients to receive new messages instantly. They chose Firestore because of its flexible schema (each message can have different metadata) and automatic scaling. However, they initially forgot to create composite indexes for queries like 'messages in room X ordered by timestamp', causing queries to fail with a 'missing index' error. They also hit the 1 MiB document size limit when storing large attachments as base64; they now store attachments in Cloud Storage and reference the URL.
Scenario 3: IoT Sensor Data
A manufacturing company collects sensor readings from thousands of devices every second. They use Bigtable with a row key design of 'device_id#timestamp' to ensure even distribution. Each row stores metrics like temperature, pressure, and vibration in column families. Bigtable handles 1 million writes per second with low latency. They chose Bigtable because of its high throughput and ability to store petabytes of data. A common issue is poor row key design; they initially used 'timestamp#device_id' which caused hotspots on recent timestamps. After redesigning to prefix with device_id, performance improved dramatically. They also use Dataflow to stream data from Pub/Sub into Bigtable for real-time analytics.
The GCDL exam (Objective 2.3) tests your ability to differentiate between relational and NoSQL databases and map them to Google Cloud services. Key areas:
1. Exact Objective Codes: - 2.3: Determine which Google Cloud database or storage service is appropriate for a given use case. - Sub-objectives include evaluating data structure, consistency, scalability, and cost.
2. Common Wrong Answers and Why Candidates Choose Them: - Choosing Cloud Spanner for every relational need: Candidates often pick Spanner because it's 'globally distributed and strongly consistent,' but Spanner is overkill for single-region applications with moderate throughput. Cloud SQL is simpler and cheaper. - Choosing Firestore for high-write IoT workloads: Firestore has a write limit of 1 write per second per document (10,000 writes per second per database in certain modes). Bigtable is better for massive write throughput. - Selecting Bigtable for transactional applications: Bigtable does not support multi-row ACID transactions. Candidates may confuse its high throughput with transaction support. - Thinking Cloud SQL cannot scale horizontally: While Cloud SQL primarily scales vertically, it supports read replicas, and Spanner is the horizontal scaling option. Candidates may dismiss Cloud SQL for scalability concerns without considering replicas.
3. Specific Numbers and Terms: - Firestore document size limit: 1 MiB. - Bigtable node throughput: ~10,000 writes/second per node (varies with row key design). - Cloud Spanner node storage: 2 TB per node. - Cloud SQL backup retention: default 7 days. - Memorystore for Redis: up to 300 GB per instance.
4. Edge Cases and Exceptions: - Firestore in Datastore mode vs Native mode: Datastore mode uses a different API and does not support real-time listeners. The exam may ask which mode to use for legacy Datastore apps. - Cloud SQL for MySQL vs Cloud SQL for PostgreSQL: Feature parity is close, but PostgreSQL supports extensions like PostGIS. The exam may ask which database to use for geospatial queries. - Spanner vs Cloud SQL for global applications: Spanner provides strong consistency globally, but Cloud SQL can use cross-region replication with eventual consistency.
5. How to Eliminate Wrong Answers: - If the use case mentions 'real-time updates to mobile clients,' eliminate Bigtable and Cloud SQL (no built-in real-time sync) and choose Firestore. - If the use case says 'millions of writes per second,' eliminate Firestore and Cloud SQL; choose Bigtable. - If the use case requires 'complex joins and ACID,' eliminate NoSQL options; choose Cloud SQL or Spanner. - If the use case is 'global, strongly consistent, high availability,' choose Spanner. - If the use case is 'caching frequently accessed data,' choose Memorystore.
Relational databases (Cloud SQL, Spanner) enforce schema and provide ACID; NoSQL (Firestore, Bigtable) offer flexibility and horizontal scaling.
Choose Cloud SQL for moderate-scale, single-region applications with complex queries and strong consistency.
Choose Cloud Spanner for globally distributed, strongly consistent, high-availability workloads.
Choose Firestore for mobile/web apps needing real-time sync, flexible schema, and automatic scaling.
Choose Bigtable for high-throughput write workloads (IoT, time-series) and large analytical datasets.
Firestore document size limit is 1 MiB; Bigtable node throughput is ~10,000 writes/second per node.
Always consider cost: Cloud SQL charges for storage and compute; Firestore charges per operation; Bigtable charges per node hour.
These come up on the exam all the time. Here's how to tell them apart.
Cloud SQL (Relational)
Fixed schema with tables, rows, and columns.
Supports complex joins and subqueries.
ACID transactions across multiple rows/tables.
Scales vertically; read replicas for read scaling.
Best for structured data with relationships.
Firestore (NoSQL Document)
Flexible schema; each document can have different fields.
No joins; data is denormalized or referenced via document IDs.
Strong consistency for single-document operations; eventual for multi-document queries.
Automatic horizontal scaling; no downtime.
Best for semi-structured data, real-time apps, and rapid iteration.
Mistake
NoSQL databases are always faster than relational databases.
Correct
NoSQL databases excel at simple key-value lookups and high write throughput, but complex queries (e.g., multi-table joins) are typically slower or unsupported. Relational databases with proper indexing can be very fast for complex queries. Performance depends on the use case.
Mistake
Cloud Spanner is just a scaled-up Cloud SQL.
Correct
Spanner is architecturally different: it uses TrueTime for global consistency and automatic sharding. Cloud SQL is a traditional RDBMS with limited horizontal scaling. Spanner is designed for global distribution, while Cloud SQL is best for regional deployments.
Mistake
Firestore is a key-value store.
Correct
Firestore is a document-oriented NoSQL database. While documents are identified by keys, they contain structured data (fields) and support queries on fields. It is not a simple key-value store like Memorystore.
Mistake
Bigtable supports SQL queries.
Correct
Bigtable does not support SQL natively. It uses the HBase API for CRUD operations. You can use BigQuery to query Bigtable data via external tables, but that is not native Bigtable querying.
Mistake
You should always use Cloud SQL for transactional systems.
Correct
While Cloud SQL is great for many transactional systems, if you need global distribution and strong consistency, Spanner is the better choice. Also, for very high transaction rates, Bigtable may be more suitable if ACID is not required.
Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.
Cloud SQL is a managed relational database for MySQL, PostgreSQL, and SQL Server, designed for regional deployments with vertical scaling. Cloud Spanner is a globally distributed, horizontally scalable relational database with strong consistency and automatic replication. Use Cloud SQL for simpler, single-region applications; use Spanner for global, high-availability systems that need ACID across regions.
Use Firestore for mobile/web applications with real-time data sync, flexible document schemas, and moderate write throughput (up to 10,000 writes/second per database). Use Bigtable for high-throughput write workloads (millions of ops/sec), time-series data, and analytical applications where eventual consistency is acceptable. Bigtable does not support real-time listeners.
Bigtable does not support SQL natively. You can use BigQuery to query Bigtable data via external tables, but that is for analytical purposes, not interactive transactions. For SQL queries, consider Cloud SQL or Spanner.
Cloud SQL can have up to 30 TB of storage (with SSD or HDD). The actual limit depends on the machine type. You can enable automatic storage increase to avoid running out of space.
Firestore supports ACID transactions, but only for operations within a single database. Transactions can include multiple read and write operations on documents, and they are atomic. However, Firestore does not support joins or multi-table transactions like relational databases.
Firestore Native mode provides real-time listeners, mobile SDKs, and a new query engine. Datastore mode is backward-compatible with Google Cloud Datastore and uses a different API. Native mode is recommended for new applications; Datastore mode is for migrating existing Datastore apps.
Cloud SQL is a full relational database for persistent data. Memorystore is an in-memory cache (Redis or Memcached) for improving read performance by caching frequently accessed data. Use Memorystore for caching; do not use it as a primary database because data is not durable by default (Redis can be configured for persistence).
You've just covered NoSQL vs Relational Databases on Google Cloud — now see how well it sticks with free GCDL practice questions. Full explanations included, no account needed.
Done with this chapter?