Database engineering on Google Cloud is not just knowing which database to pick — it is understanding the guarantees, trade-offs, and operational patterns of each service. The PCDE exam tests your ability to migrate databases, design for high availability, optimise query performance, and ensure data integrity across GCP's diverse portfolio of managed database services.
Practice this topic
GCP database decision tree: relational and transactional with global scale? Spanner. Relational with regional scale and familiar SQL? Cloud SQL. NoSQL document with real-time sync? Firestore. NoSQL wide-column with high throughput? Bigtable. In-memory cache? Memorystore. Data warehouse and analytics? BigQuery. CAP theorem in practice: Spanner achieves global consistency using TrueTime (GPS + atomic clocks to bound clock skew) — it is CP. Firestore in Datastore mode is strongly consistent for single-entity operations but eventually consistent for queries across entities. Bigtable is strongly consistent for reads and writes to a single row. Multi-region versus single-region: Cloud SQL offers regional HA with synchronous standby (failover in 60s). Spanner multi-region adds cross-region replicas for disaster recovery and read performance at the cost of higher write latency.
Cloud SQL HA: Primary instance with a standby in the same region but different zone. Replication is synchronous (standby must acknowledge before write commits). Failover is automatic on primary failure; promoted standby becomes the new primary. Read replicas are asynchronous and can be in different regions. Cloud SQL for PostgreSQL/MySQL/SQL Server: version compatibility, flag configuration (shared_buffers, max_connections, innodb_buffer_pool_size), maintenance windows, automatic storage increases. Backup: automated backups (point-in-time recovery up to 7 days), on-demand backups retained until explicitly deleted. Connectivity: Cloud SQL Auth Proxy handles SSL/TLS and IAM-based authentication — the recommended connection method. Private IP (VPC peering) is the secure production approach; Public IP with authorised networks is an alternative but less secure.
Spanner splits data into row ranges called splits. Each split is served by a Paxos group with a leader and replicas. Writes go through the leader; reads can go to any replica (bounded staleness reads) or the leader (strong reads). Interleaving: child tables interleaved into parent tables are stored co-located physically — dramatically reduces cross-split reads for parent-child queries. Schema design: avoid hotspots by not using monotonically increasing primary keys (UUIDs or bit-reversal are common solutions). Composite keys distribute load. Secondary indexes: non-interleaved (global, covering indexes reduce lookups), interleaved (co-located with parent table). Query hints: @{FORCE_INDEX=index_name}, GROUPBY_SCAN_OPTIMIZATION=TRUE. Query Insights: visualise slow queries, lock wait times, transaction statistics. Spanner autoscaler adjusts compute capacity (processing units) based on CPU and storage utilisation.
Database Migration Service (DMS): managed migrations for MySQL, PostgreSQL, and SQL Server to Cloud SQL or AlloyDB. Supports homogeneous (same engine) and some heterogeneous migrations. Continuous replication using CDC (Change Data Capture) minimises downtime during cutover. Datastream: CDC service for streaming changes from Oracle, MySQL, PostgreSQL to BigQuery, Cloud Storage, or Spanner. Use for real-time analytics on operational data or ongoing synchronisation after initial migration. Bigtable design principles: row key is the only index — design keys to distribute reads and writes. Column families group related columns and have separate compaction/TTL settings. Hot rows (caused by monotonically increasing timestamps in keys) are the most common performance problem. Solution: salt keys with a hash prefix or reverse the timestamp.
BigQuery is a good transactional database for OLTP workloads
BigQuery is not a transactional database — it is an analytics engine; use it for OLAP, not OLTP
Cloud SQL read replicas provide high availability for the primary
Cloud SQL read replicas do not provide HA — they are for read scaling; HA requires the HA instance configuration
Cloud Spanner is always the best database choice on GCP
Spanner is not the right choice for every workload — at less than 1000 QPS or with simple regional requirements, Cloud SQL is more cost-effective
Try free Google PCDE practice questions with explanations, topic links and progress tracking.