You are the data engineer for a large retail company. The company has an existing on-premises SQL Server database with 10 years of transactional data. They want to move this data to Azure to enable advanced analytics using Azure Synapse Analytics. The data includes customer orders, product details, and inventory. The solution must minimize data movement and support both batch and real-time analytics. The company also wants to use Power BI for reporting. They have a limited budget and prefer a serverless option for compute. You are evaluating the following approaches:
A) Use Azure Data Factory to copy all data to Azure Data Lake Storage Gen2, then use Azure Synapse Serverless SQL pool to query the data, and finally connect Power BI to the serverless SQL endpoint.
B) Use Azure Database Migration Service to migrate the SQL Server database to Azure SQL Database, then use Azure Synapse Analytics with a dedicated SQL pool to perform analytics, and connect Power BI to the dedicated pool.
C) Use Azure Data Factory to copy all data to Azure Blob Storage, then use Azure Stream Analytics to perform real-time analytics, and connect Power BI directly to Stream Analytics output.
D) Use Azure Data Factory to copy historical data to Azure Data Lake Storage Gen2, use Azure Synapse Serverless SQL pool for batch analytics, and use Azure Event Hubs and Stream Analytics for real-time data, with Power BI connecting to both serverless SQL and Stream Analytics.
Which approach best meets the requirements?
Correct answer: D. It combines serverless batch analytics with real-time analytics, minimizes data movement, and connects Power BI to both.
Why this answer
Option D best meets the requirements. Azure Data Factory copies the historical data once to Azure Data Lake Storage Gen2, which provides cost-effective storage, and the Azure Synapse Serverless SQL pool runs batch analytics on it using serverless compute. Azure Event Hubs and Stream Analytics handle real-time ingestion and analytics, and Power BI connects to both the serverless SQL endpoint and the Stream Analytics output. Because the serverless SQL pool queries the files where they sit in the lake, the data does not have to be loaded into a separate database, which minimizes data movement. The design therefore supports both batch and real-time analytics and uses a serverless option that fits a limited budget.
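To make "querying the data in the lake" concrete, the following is a minimal sketch of the kind of T-SQL a serverless SQL pool could run against files in Data Lake Storage Gen2, built here as a string in Python. The storage account, container, and folder names are hypothetical, and Parquet as the file format is an assumption; the scenario does not say how Data Factory writes the files.

```python
def build_openrowset_query(account: str, container: str, folder: str, top: int = 10) -> str:
    """Return a T-SQL query that reads Parquet files in place from ADLS Gen2.

    All names passed in are illustrative placeholders, not values from the scenario.
    """
    # ADLS Gen2 paths use the dfs endpoint of the storage account.
    url = f"https://{account}.dfs.core.windows.net/{container}/{folder}/*.parquet"
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS orders;"
    )


# Hypothetical account, container, and folder, used only for illustration.
query = build_openrowset_query("contosoretail", "sales", "orders")
print(query)
```

The query reads the files directly from storage, so no load step into a warehouse table is needed before Power BI or an analyst can query the historical data.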
Exam trap
The trap here is that candidates often choose Option A because it uses serverless SQL and Power BI, but they overlook the explicit requirement for real-time analytics, which Option A does not address.
How to eliminate wrong answers
Option A is wrong because it supports only batch analytics through the serverless SQL pool and has no real-time component, so it fails the real-time analytics requirement.
Option B is wrong because the dedicated SQL pool is provisioned compute rather than a serverless option, which raises cost and goes against the stated preference; it also lands the data in a separate database before analytics can run, increasing data movement.
Option C is wrong because it copies the data to Azure Blob Storage (which lacks the hierarchical namespace and optimized analytics features of Data Lake Storage Gen2) and relies only on Stream Analytics for real-time analytics, so it misses the batch analytics requirement and has no serverless SQL pool for ad hoc querying.
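The elimination above can also be written as a simple checklist. The sketch below is one way to encode it in Python; the requirement labels and the marks given to each option are assumptions that mirror this explanation's reading of the scenario, not an official scoring scheme.

```python
# Requirements taken from the scenario; the label names are illustrative.
REQUIREMENTS = ("batch", "real_time", "serverless_query", "power_bi")

# What each option covers, following the explanation above.
OPTIONS = {
    "A": {"batch", "serverless_query", "power_bi"},               # no real-time path
    "B": {"batch", "power_bi"},                                   # dedicated pool, no streaming
    "C": {"real_time", "power_bi"},                               # no batch, no serverless SQL pool
    "D": {"batch", "real_time", "serverless_query", "power_bi"},
}


def missing(option: str) -> list:
    """List the requirements an option does not cover, in checklist order."""
    return [req for req in REQUIREMENTS if req not in OPTIONS[option]]


def fully_satisfying() -> list:
    """Return the options that cover every requirement."""
    return [name for name in OPTIONS if not missing(name)]


for name in OPTIONS:
    gaps = missing(name)
    print(name, "meets all requirements" if not gaps else "is missing: " + ", ".join(gaps))
```

Walking the checklist this way makes the trap visible: Option A misses exactly one requirement, which is easy to overlook when the other three are satisfied.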