A company uses Cloud SQL for PostgreSQL as its primary database. They want to query this data from BigQuery for analytics without moving the data. They also need to ensure that BigQuery queries see the most recent data (within seconds of changes). Which approach is most suitable?
Datastream provides near real-time CDC replication to BigQuery, ensuring data freshness within seconds.
Why this answer
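BigQuery federated queries can read Cloud SQL directly, without moving data, by calling the EXTERNAL_QUERY function over a Cloud SQL connection (they do not use external tables). However, each federated query is executed by the Cloud SQL instance itself, which adds per-query latency and puts analytical load on the primary database. For analytics that must reflect changes within seconds, the better approach is to use Datastream, which captures changes from Cloud SQL and replicates them to BigQuery in near real time. Federated queries alone are a poor fit for this workload.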
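For comparison, here is a minimal sketch of what the federated alternative looks like in practice. It only builds the SQL text for a BigQuery EXTERNAL_QUERY call. The helper name `build_federated_query`, the connection ID `my-project.us.my-pg-connection`, and the `orders` table are hypothetical and chosen for illustration.

```python
def build_federated_query(connection_id: str, postgres_sql: str) -> str:
    """Wrap a PostgreSQL statement in a BigQuery EXTERNAL_QUERY call.

    connection_id is assumed to have the form 'project.location.connection'.
    """
    # Keep the sketch simple: refuse input that would need escaping inside
    # a BigQuery triple-quoted string literal.
    if '"""' in postgres_sql or "\\" in postgres_sql:
        raise ValueError("postgres_sql must not contain triple quotes or backslashes")
    if '"' in connection_id or "\\" in connection_id:
        raise ValueError("connection_id must not contain quotes or backslashes")

    # The inner statement is run by Cloud SQL; BigQuery receives the result rows.
    return f'SELECT * FROM EXTERNAL_QUERY("{connection_id}", """{postgres_sql}""")'


# Hypothetical connection and table names, for illustration only.
federated_sql = build_federated_query(
    "my-project.us.my-pg-connection",
    "SELECT id, total FROM orders",
)
print(federated_sql)
```

The resulting string could be submitted like any other BigQuery SQL, for example through the `google-cloud-bigquery` client's `Client.query` method. Because the inner statement runs on Cloud SQL each time, this pattern suits occasional lookups more than continuous analytics, which is why the answer favors Datastream replication.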