A data engineer needs to load 10 TB of CSV files from Amazon S3 into Google BigQuery every day. Which Google Cloud service should they use to automate this transfer?
Trap 1: Dataproc
Dataproc runs Spark and Hadoop jobs on managed clusters; it is not designed for simple scheduled file transfers.
Trap 2: Cloud Data Fusion
Cloud Data Fusion is a managed ETL/ELT service that could build this pipeline, but it is heavier than needed and is not the simplest option for a straightforward S3-to-BigQuery transfer.
Trap 3: Storage Transfer Service
Storage Transfer Service moves data into Cloud Storage, not directly into BigQuery, so a separate load step would still be needed.
- A
Dataproc
Why wrong: Dataproc runs Spark and Hadoop jobs on managed clusters; it is not designed for simple scheduled file transfers.
- B
Cloud Data Fusion
Why wrong: Cloud Data Fusion is a managed ETL/ELT service that could build this pipeline, but it is heavier than needed and is not the simplest option for a straightforward S3-to-BigQuery transfer.
- C
BigQuery Data Transfer Service
BigQuery Data Transfer Service supports scheduled, recurring transfers from Amazon S3 directly into BigQuery, which covers the daily CSV load without a separately managed staging step.
- D
Storage Transfer Service
Why wrong: Storage Transfer Service moves data into Cloud Storage, not directly into BigQuery, so a separate load step would still be needed.
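
To make option C concrete, here is a minimal sketch of how a daily S3-to-BigQuery transfer might be configured with the `google-cloud-bigquery-datatransfer` Python client. The helper names (`build_s3_transfer_config`, `create_transfer`), the dataset, table, bucket path, and credentials are all hypothetical placeholders. The parameter keys follow the Amazon S3 data source as commonly documented and should be checked against the current reference before use.

```python
from typing import Any, Dict


def build_s3_transfer_config(
    dataset_id: str,
    table_name: str,
    s3_uri: str,
    access_key_id: str,
    secret_access_key: str,
) -> Dict[str, Any]:
    """Return TransferConfig fields for a daily S3-to-BigQuery CSV transfer.

    Hypothetical helper: it only assembles a plain dict and makes no API calls.
    """
    return {
        "destination_dataset_id": dataset_id,
        "display_name": f"Daily S3 load into {table_name}",
        "data_source_id": "amazon_s3",   # selects the Amazon S3 data source
        "schedule": "every 24 hours",    # matches the daily requirement
        "params": {
            "destination_table_name_template": table_name,
            "data_path": s3_uri,         # e.g. an s3:// URI with a wildcard
            "access_key_id": access_key_id,
            "secret_access_key": secret_access_key,
            "file_format": "CSV",
        },
    }


def create_transfer(project_id: str, config: Dict[str, Any]):
    """Submit the config to the Data Transfer Service.

    Assumes the google-cloud-bigquery-datatransfer package is installed and
    application default credentials are available.
    """
    from google.cloud import bigquery_datatransfer  # imported lazily

    client = bigquery_datatransfer.DataTransferServiceClient()
    return client.create_transfer_config(
        parent=client.common_project_path(project_id),
        transfer_config=bigquery_datatransfer.TransferConfig(**config),
    )
```

A call such as `create_transfer("my-project", build_s3_transfer_config("my_dataset", "events", "s3://example-bucket/exports/*.csv", key_id, secret))` would register the recurring transfer. In practice the AWS secret should come from a secret manager, not from source code.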