A manufacturer collects sensor data from thousands of IoT devices every second. The data is ingested into Azure Event Hubs and then needs to be stored for historical analysis. The analytics team will run complex aggregations and time-series queries over petabytes of data, expecting fast results even with large scans. Which Azure service should be used as the analytical data store?
Trap 1: Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2 is a scalable storage service, but it has no query engine of its own. A separate compute service such as Azure Synapse or Azure Databricks must be added to query it, so on its own it is a storage layer rather than an analytical data store.
Trap 2: Azure SQL Database
Azure SQL Database is a transactional relational database. It is not designed for petabyte-scale analytical workloads and would struggle with performance and cost at such volumes.
Trap 3: Azure Cosmos DB
Azure Cosmos DB is a NoSQL operational database optimized for low-latency reads and writes. It is not built for large-scale analytical scans, and its operational store is row-oriented rather than columnar, so it would be inefficient for this use case.
- A
Azure Data Lake Storage Gen2
Why wrong: Azure Data Lake Storage Gen2 is a scalable storage service, but it has no query engine of its own. A separate compute service such as Azure Synapse or Azure Databricks must be added to query it, so on its own it is a storage layer rather than an analytical data store.
- B
Azure SQL Database
Why wrong: Azure SQL Database is a transactional relational database. It is not designed for petabyte-scale analytical workloads and would struggle with performance and cost at such volumes.
- C
Azure Synapse Analytics dedicated SQL pool
Why correct: Azure Synapse Analytics dedicated SQL pool uses massively parallel processing (MPP) and columnar storage to execute complex queries over huge datasets efficiently. It is purpose-built for large-scale data warehousing and analytical workloads.
- D
Azure Cosmos DB
Why wrong: Azure Cosmos DB is a NoSQL operational database optimized for low-latency reads and writes. It is not built for large-scale analytical scans, and its operational store is row-oriented rather than columnar, so it would be inefficient for this use case.
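
To make the MPP idea behind option C more concrete, the following is a minimal Python sketch of the general pattern: rows are hash-distributed across several partitions, each partition computes a partial aggregate independently, and the partial results are then merged. It is only a toy illustration, not how a dedicated SQL pool is implemented. The function names, the `(device_id, temperature)` row shape, and the partition count of 4 are all assumptions made for this example.

```python
import zlib
from collections import defaultdict

# Hypothetical partition count, chosen only for illustration.
NUM_DISTRIBUTIONS = 4


def distribute(readings, n=NUM_DISTRIBUTIONS):
    """Hash-distribute (device_id, temperature) rows across n partitions."""
    shards = [[] for _ in range(n)]
    for device_id, temperature in readings:
        # crc32 gives a stable hash, so a device always lands on the same shard.
        shard_index = zlib.crc32(device_id.encode("utf-8")) % n
        shards[shard_index].append((device_id, temperature))
    return shards


def partial_aggregate(shard):
    """Compute a per-device [sum, count] pair for a single partition."""
    totals = defaultdict(lambda: [0.0, 0])
    for device_id, temperature in shard:
        totals[device_id][0] += temperature
        totals[device_id][1] += 1
    return totals


def combine(partials):
    """Merge partial [sum, count] pairs and return the average per device."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for device_id, (total, count) in partial.items():
            merged[device_id][0] += total
            merged[device_id][1] += count
    return {device: total / count for device, (total, count) in merged.items()}


def average_temperature(readings, n=NUM_DISTRIBUTIONS):
    """Run the distribute, aggregate, combine pipeline end to end."""
    shards = distribute(readings, n)
    # In a real MPP engine each shard is processed on a separate compute
    # node in parallel; here the shards are processed one after another.
    return combine(partial_aggregate(shard) for shard in shards)


sample_readings = [
    ("dev-a", 20.0),
    ("dev-b", 30.0),
    ("dev-a", 22.0),
    ("dev-c", 10.0),
]
averages = average_temperature(sample_readings)
```

Because each partition only needs to return a small sum-and-count summary, the final merge step stays cheap even when the individual partitions scan very large amounts of data, which is the property that makes this pattern suit large aggregations.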