A data pipeline ingests streaming data from Pub/Sub into BigQuery via Dataflow. Recently, the pipeline has been failing with 'deadline exceeded' errors. What is the most likely cause?
Trap 1: The BigQuery streaming quota is exceeded.
Quota exceeded errors would be 'quota exceeded', not 'deadline exceeded'.
Trap 2: Dataflow workers are underutilized due to batch size settings.
Underutilization would not cause deadline exceeded errors; it might cause slow processing.
Trap 3: Dataflow autoscaling is disabled.
Autoscaling being disabled might cause performance issues but not specifically deadline exceeded errors.
- A
The BigQuery streaming quota is exceeded.
Why wrong: Quota exceeded errors would be 'quota exceeded', not 'deadline exceeded'.
- B
Dataflow workers are underutilized due to batch size settings.
Why wrong: Underutilization would not cause deadline exceeded errors; it might cause slow processing.
- C
Dataflow autoscaling is disabled.
Why wrong: Autoscaling being disabled might cause performance issues but not specifically deadline exceeded errors.
- D
The Pub/Sub subscription's acknowledgement deadline is too short for the processing time.
A short acknowledgment deadline causes messages to be redelivered, leading to repeated processing attempts and eventual deadline exceeded errors.