A company is designing a data lake solution on Azure Data Lake Storage Gen2. Data will be ingested from IoT devices at high frequency (every 5 seconds). Each device sends a JSON payload of 2 KB. The data must be stored in a hierarchical namespace and partitioned by date and device ID to optimize query performance. Which partition strategy should be used?
- A
Use Azure SQL Database with clustered columnstore index on date and device ID.
Why wrong: Azure SQL Database is a relational database, not a data lake. It provides no hierarchical namespace, so it cannot store the raw JSON payloads in a folder hierarchy partitioned by date and device ID.
- B
Organize folders as /YYYY/MM/DD/DeviceID/ in ADLS Gen2 and use file names that include a timestamp.
This folder hierarchy lets query engines prune partitions by date and device ID, reading only the directories that match a query's filters.
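To illustrate option B, here is a minimal sketch of how an ingestion job might build the /YYYY/MM/DD/DeviceID/ path with a timestamped file name, and how date/device pruning reduces to a directory-prefix match. The helper names `blob_path` and `prune`, the device IDs, and the `YYYYMMDDTHHMMSSZ.json` file-name pattern are hypothetical choices for illustration; no Azure SDK is called here.

```python
from datetime import date, datetime, timezone


def blob_path(device_id: str, ts: datetime) -> str:
    """Build a /YYYY/MM/DD/DeviceID/<timestamp>.json path (hypothetical naming scheme)."""
    ts = ts.astimezone(timezone.utc)  # normalize so folder dates are in UTC
    return f"/{ts:%Y}/{ts:%m}/{ts:%d}/{device_id}/{ts:%Y%m%dT%H%M%SZ}.json"


def prune(paths: list[str], day: date, device_id: str) -> list[str]:
    """Keep only files under one date/device folder, mimicking partition pruning."""
    # The trailing slash stops "sensor-4" from also matching "sensor-42".
    prefix = f"/{day:%Y}/{day:%m}/{day:%d}/{device_id}/"
    return [p for p in paths if p.startswith(prefix)]


paths = [
    blob_path("sensor-42", datetime(2024, 3, 7, 14, 5, 10, tzinfo=timezone.utc)),
    blob_path("sensor-42", datetime(2024, 3, 8, 9, 0, 0, tzinfo=timezone.utc)),
    blob_path("sensor-7", datetime(2024, 3, 7, 14, 5, 15, tzinfo=timezone.utc)),
]
print(prune(paths, date(2024, 3, 7), "sensor-42"))
# ['/2024/03/07/sensor-42/20240307T140510Z.json']
```

A real query engine lists only the matching directories instead of filtering a full list of paths. The string-prefix filter above just shows why putting date and device ID in the folder path lets a query skip every other partition.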
- C
Use Azure Table Storage with PartitionKey set to date and RowKey set to device ID.
Why wrong: Azure Table Storage is a NoSQL key-value store; it does not support a hierarchical namespace and is not a data lake solution.
- D
Use Azure Cosmos DB with partition key on (date, device ID) and TTL for data retention.
Why wrong: Azure Cosmos DB is a NoSQL database, not a data lake with a hierarchical namespace.