A company runs a mission-critical Azure Data Factory pipeline that ingests data every hour from Azure Blob Storage into an Azure Synapse dedicated SQL pool. Recently, the pipeline has been failing with timeout errors during the copy activity. The source blob files are around 500 MB each. Which configuration change would MOST effectively reduce the likelihood of timeout errors?
- A
Decrease the 'Batch size' for the copy activity.
Why wrong: Smaller batches increase the number of operations, which can raise rather than lower the risk of a timeout.
- B
Change the sink to use PolyBase with staging enabled.
Why wrong: PolyBase is a bulk-load mechanism for Synapse; it does not directly address timeout errors.
- C
Increase the Data Integration Units (DIU) setting to 8.
Why wrong: A higher DIU setting may not resolve the timeouts if the source is the bottleneck.
- D
Turn on 'Enable staging' and set 'Degree of copy parallelism' to a higher value.
Why correct: Higher parallelism shortens the copy duration, which reduces the likelihood of a timeout.
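
For illustration, the settings named in option D correspond to the `enableStaging`, `stagingSettings`, and `parallelCopies` properties in a Copy activity's JSON definition. The following is a minimal sketch, not a definitive configuration: the activity name, the linked service name `StagingBlobStorage`, the staging path, the `DelimitedTextSource` source type, and the parallelism value of 8 are all hypothetical placeholders chosen for the example.

```python
import copy
import json


def with_staged_parallel_copy(activity, parallel_copies, staging_linked_service, staging_path):
    """Return a copy of a Copy activity definition with staging enabled
    and an explicit degree of copy parallelism."""
    if parallel_copies < 1:
        raise ValueError("parallel_copies must be a positive integer")

    updated = copy.deepcopy(activity)  # leave the caller's definition untouched
    props = updated.setdefault("typeProperties", {})

    # 'Enable staging' in the UI
    props["enableStaging"] = True
    props["stagingSettings"] = {
        "linkedServiceName": {
            "referenceName": staging_linked_service,  # hypothetical linked service
            "type": "LinkedServiceReference",
        },
        "path": staging_path,  # hypothetical container/folder
    }

    # 'Degree of copy parallelism' in the UI
    props["parallelCopies"] = parallel_copies
    return updated


# Hypothetical starting point: the source file format is an assumption.
base_activity = {
    "name": "CopyBlobToSynapse",
    "type": "Copy",
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},
        "sink": {"type": "SqlDWSink"},
    },
}

tuned = with_staged_parallel_copy(
    base_activity,
    parallel_copies=8,
    staging_linked_service="StagingBlobStorage",
    staging_path="staging-container/hourly",
)

print(json.dumps(tuned["typeProperties"], indent=2))
```

The helper returns a modified copy rather than editing the input, so the original definition stays available for comparison. A suitable parallelism value depends on the workload and would need to be confirmed by testing.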