A retail chain captures real-time sales data from point-of-sale (POS) systems as a stream of events. The data is ingested into Azure Event Hubs. Additionally, the company receives daily inventory files in CSV format uploaded to Azure Data Lake Storage Gen2. The analytics team needs to combine the streaming sales data with the batch inventory data to generate near real-time dashboards and run historical reports. They want a single analytics platform that can handle both streaming and batch workloads, and allow querying data directly in the data lake using SQL. Which Azure service should they choose?
Correct. Azure Synapse Analytics combines stream processing (for example, Apache Spark pools reading from Event Hubs), batch processing (pipelines, Spark, and SQL pools), and a serverless SQL pool that queries data lake files directly, all in one workspace.
Why this answer
Azure Synapse Analytics is the correct choice because it provides a unified analytics workspace that can ingest streaming events from Azure Event Hubs and read batch files from Azure Data Lake Storage Gen2. Its serverless SQL pool queries files in the data lake in place using T-SQL, so near real-time dashboards and historical reports can run without first copying the data into a separate store. Because streaming and batch workloads share a single workspace, the service meets all the stated requirements.
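To make "querying data directly in the data lake" concrete, here is a minimal sketch of the kind of T-SQL a Synapse serverless SQL pool accepts, using OPENROWSET to read CSV files in place. The Python helper only assembles the query text; the function name, storage account, container, and path are hypothetical placeholders, not details from the scenario.

```python
def build_openrowset_query(account: str, container: str, path: str, top: int = 10) -> str:
    """Return T-SQL that reads CSV files in ADLS Gen2 via OPENROWSET.

    Illustration only: all arguments are hypothetical, and inputs are not
    sanitized, so do not pass untrusted values.
    """
    # ADLS Gen2 files are addressed through the account's dfs endpoint.
    url = f"https://{account}.dfs.core.windows.net/{container}/{path}"
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{url}',\n"
        "    FORMAT = 'CSV',\n"
        "    PARSER_VERSION = '2.0',\n"
        "    HEADER_ROW = TRUE\n"
        ") AS inventory_rows;"
    )


if __name__ == "__main__":
    # Hypothetical account and folder holding the daily inventory CSV files.
    print(build_openrowset_query("contosolake", "inventory", "daily/*.csv"))
```

The generated statement would be run against the workspace's serverless SQL endpoint; no table has to be loaded first, which is the "without data movement" point in the explanation.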
Exam trap
Candidates often choose Azure Stream Analytics because it handles streaming. That choice overlooks the requirement for a single platform that also processes batch data and lets analysts query the data lake directly with SQL for historical reports, which Stream Analytics is not designed to do.
How to eliminate wrong answers
Option B (Azure Stream Analytics) is wrong because it is a real-time stream processing service. It cannot run ad hoc SQL queries over batch files in Data Lake Storage Gen2 for historical reports, and it offers no unified SQL query layer over both streaming and batch sources.
Option C (Azure Data Lake Analytics) is wrong because it is a batch-only analytics service that processes data with U-SQL rather than standard SQL, and it does not support real-time streaming ingestion from Event Hubs.
Option D (Azure HDInsight) is wrong because it is a managed Hadoop/Spark cluster service that you must configure and manage yourself for both streaming and batch workloads. SQL access to data in the lake depends on additional engines such as Hive or Spark SQL, which makes it less integrated and more complex than Synapse Analytics.