A data engineer uses Cloud Composer to orchestrate a daily batch pipeline. A downstream task should only start after an upstream BigQuery load job finishes successfully and a specific file appears in Cloud Storage. Which combination of operators should the engineer use in the Airflow DAG?
Trap 1: BigQueryInsertJobOperator with wait_for_downstream=True
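wait_for_downstream is a real BaseOperator argument, but it does something else: it makes a task instance wait until the tasks immediately downstream of its previous run have succeeded. It does not check for a file in Cloud Storage or gate a downstream task on the load job. A sensor and an explicit task dependency are still needed.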
Trap 2: DataflowPythonOperator and GCSObjectExistenceSensor
DataflowPythonOperator launches Dataflow pipelines; it does not run a BigQuery load job. The option also declares no dependency between the tasks.
Trap 3: BigQueryOperator and FileSensor with downstream dependency
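FileSensor watches a path on a filesystem, not an object in GCS, and BigQueryOperator is a legacy operator for running queries. The dependency direction is also wrong: the load and the sensor must be upstream of the gated task, not downstream of it.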
- A
BigQueryInsertJobOperator with wait_for_downstream=True
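Why wrong: wait_for_downstream is a real BaseOperator argument, but it only makes a task instance wait until the tasks immediately downstream of its previous run have succeeded. It does not check for a file in Cloud Storage or gate a downstream task on the load job. A sensor and an explicit task dependency are still needed.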
- B
BigQueryInsertJobOperator and GCSObjectExistenceSensor with upstream dependency
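Correct: BigQueryInsertJobOperator runs the load job, and GCSObjectExistenceSensor polls Cloud Storage until the file exists. Setting both as upstream tasks of the downstream task means that, under the default all_success trigger rule, it starts only after both have succeeded.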
- C
DataflowPythonOperator and GCSObjectExistenceSensor
Why wrong: DataflowPythonOperator launches Dataflow pipelines; it does not run a BigQuery load job. The option also declares no dependency between the tasks.
- D
BigQueryOperator and FileSensor with downstream dependency
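Why wrong: FileSensor watches a path on a filesystem, not an object in GCS, and BigQueryOperator is a legacy operator for running queries. The dependency direction is also wrong: the load and the sensor must be upstream of the gated task, not downstream of it.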