You are designing a data processing pipeline in Azure Synapse Analytics that ingests streaming data from Azure Event Hubs and stores it in a dedicated SQL pool. The data volume is approximately 500 GB per hour with peak spikes. The pipeline must minimize data loss during transient failures. Which feature should you implement?
Trap 1: Use PolyBase to load data directly from Event Hubs to the dedicated…
PolyBase is designed for batch loading from external data sources like Azure Blob Storage, not for streaming ingestion from Event Hubs.
Trap 2: Use COPY INTO statement to ingest data from Event Hubs into the…
COPY INTO batch-loads files from Azure Storage; it cannot read a stream from Event Hubs.
Trap 3: Enable Event Hubs Capture to write data to Azure Data Lake Storage…
This approach adds latency and does not provide real-time stream processing; it also does not directly address recovery from transient failures in the pipeline.
- A
Use Azure Synapse Pipeline with Auto-commit and checkpointing to process streaming data.
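Checkpointing records how far the pipeline has read, so after a transient failure it resumes from the last committed position instead of dropping events. This gives fault tolerance with at-least-once delivery; exactly-once results additionally require idempotent or transactional writes to the sink.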
- B
Use PolyBase to load data directly from Event Hubs to the dedicated SQL pool.
Why wrong: PolyBase is designed for batch loading from external data sources like Azure Blob Storage, not for streaming ingestion from Event Hubs.
- C
Use COPY INTO statement to ingest data from Event Hubs into the dedicated SQL pool.
Why wrong: COPY INTO batch-loads files from Azure Storage; it cannot read a stream from Event Hubs.
- D
Enable Event Hubs Capture to write data to Azure Data Lake Storage and then load using PolyBase.
Why wrong: This approach adds latency and does not provide real-time stream processing; it also does not directly address recovery from transient failures in the pipeline.
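
The checkpointing idea behind option A can be illustrated with a minimal, self-contained sketch. It is not Synapse or Event Hubs code: `CheckpointStore`, `IdempotentSink`, `process`, and `TransientError` are hypothetical names, the event stream is an in-memory list, and the checkpoint is an integer offset. The sketch assumes that the checkpoint is saved only after a batch is fully written, so a failure causes a replay rather than a gap, and that the sink is keyed by offset so replayed events overwrite instead of duplicating.

```python
from dataclasses import dataclass, field


class TransientError(Exception):
    """Stands in for a transient failure such as a dropped connection."""


@dataclass
class CheckpointStore:
    # Hypothetical durable store; here it is just an in-memory integer.
    offset: int = 0  # offset of the next event to process

    def load(self) -> int:
        return self.offset

    def save(self, offset: int) -> None:
        self.offset = offset


@dataclass
class IdempotentSink:
    # Rows are keyed by offset, so writing the same event twice is harmless.
    rows: dict = field(default_factory=dict)

    def write(self, offset: int, payload: str) -> None:
        self.rows[offset] = payload


def process(events, store, sink, batch_size=2, fail_at=None):
    """Process events from the last checkpoint, committing after each batch."""
    start = store.load()
    for batch_start in range(start, len(events), batch_size):
        batch_end = min(batch_start + batch_size, len(events))
        for offset in range(batch_start, batch_end):
            if offset == fail_at:
                # Simulated failure in the middle of a batch.
                raise TransientError(f"failure at offset {offset}")
            sink.write(offset, events[offset])
        # Commit only after the whole batch has been written.
        store.save(batch_end)


events = [f"event-{i}" for i in range(7)]
store = CheckpointStore()
sink = IdempotentSink()

# First attempt fails part-way through the second batch.
try:
    process(events, store, sink, fail_at=3)
except TransientError:
    pass

resumed_from = store.load()  # last committed position, not the failure point

# Retry: the uncommitted batch is replayed from the checkpoint.
process(events, store, sink)
```

In this sketch the retry re-reads the event at offset 2, which had already been written before the failure. Delivery is therefore at-least-once, and the offset-keyed sink is what keeps the final result free of duplicates.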