An application stores sensor readings in Azure Table Storage. Each sensor produces thousands of readings per hour. Queries always filter by sensor ID and time range. A developer needs to choose the partition key and row key. Which design best balances query performance and write throughput?
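Co-locating readings by sensor ID lets the storage engine scan only that sensor's partition for a time-range query. Table Storage sorts row keys as strings within a partition, so timestamp row keys written in a fixed-width, sortable format stay in chronological order, and range queries resolve efficiently without scanning unrelated partitions.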
Why this answer
Option A is correct because it uses the sensor ID as the partition key and the timestamp as the row key. All readings for a given sensor are stored in the same partition, so a query that filters by sensor ID and time window becomes a row-key range scan inside a single partition. Writes are spread across as many partitions as there are sensors, which avoids a single hot partition, and the sorted row key supports both point lookups and range scans within a time window. The result balances query performance and write throughput.
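The key layout described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the helper names `make_keys` and `range_filter` and the sensor ID `sensor-42` are hypothetical, the timestamp format is one sortable choice among several, and the sketch assumes timezone-aware datetimes, sensor IDs without single quotes, and at most one reading per sensor per microsecond (two readings with the same timestamp would collide on the row key).

```python
from datetime import datetime, timezone


def make_keys(sensor_id: str, ts: datetime) -> tuple[str, str]:
    """Return (PartitionKey, RowKey) for one reading (hypothetical helper)."""
    # Fixed-width UTC timestamp, so string order matches chronological order.
    row_key = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return sensor_id, row_key


def range_filter(sensor_id: str, start: datetime, end: datetime) -> str:
    """Build an OData filter for one sensor's readings in [start, end)."""
    pk, lo = make_keys(sensor_id, start)
    _, hi = make_keys(sensor_id, end)
    # Naming the partition key keeps the query inside one partition;
    # the RowKey bounds turn it into a range scan over sorted keys.
    return f"PartitionKey eq '{pk}' and RowKey ge '{lo}' and RowKey lt '{hi}'"


start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc)
print(make_keys("sensor-42", start))
print(range_filter("sensor-42", start, end))
```

The resulting string is the kind of filter a Table Storage client would send with the query. The sketch stops at building the keys and the filter so that it runs without a storage account.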
Exam trap
The trap here is that candidates often choose a partition key that groups data by time (Option C) to optimize time-range queries. They overlook that every sensor writing during the same time window then hits the same partition, which creates a hot partition and severely limits write throughput.
How to eliminate wrong answers
Option B is wrong because using a single constant partition key ('all-sensors') forces all writes and queries into one partition, creating a hot partition that throttles throughput and degrades performance.
Option C is wrong because using the timestamp rounded to the hour as the partition key puts every sensor's data for the same hour in the same partition. This causes write contention, and filtering by sensor ID performs poorly because the query must scan the whole partition.
Option D is wrong because using a random GUID as the partition key scatters the readings across partitions. Queries that filter by sensor ID and time range cannot target a partition and must scan all of them, which defeats the purpose of partition key design.
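To make the comparison concrete, the toy simulation below counts how many distinct partition key values each option produces and how many writes land on the busiest one. It is a sketch under stated assumptions: the workload (50 sensors, one reading per minute for one hour) is invented for illustration, the option labels follow the explanation above, and the model only counts logical partition keys, not how the service actually places partitions on servers.

```python
import uuid
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical workload: 50 sensors, one reading per minute for one hour.
SENSORS = [f"sensor-{i}" for i in range(50)]
START = datetime(2024, 1, 1, tzinfo=timezone.utc)
readings = [(s, START + timedelta(minutes=m)) for s in SENSORS for m in range(60)]

# Each strategy maps (sensor_id, timestamp) to a partition key.
strategies = {
    "A: sensor ID": lambda s, t: s,
    "B: constant": lambda s, t: "all-sensors",
    "C: hour bucket": lambda s, t: t.strftime("%Y-%m-%dT%H"),
    "D: random GUID": lambda s, t: str(uuid.uuid4()),
}


def summarize(key_fn):
    """Return (number of partitions, writes landing on the busiest partition)."""
    sizes = Counter(key_fn(s, t) for s, t in readings)
    return len(sizes), max(sizes.values())


for name, key_fn in strategies.items():
    partitions, busiest = summarize(key_fn)
    print(f"{name}: {partitions} partitions, busiest receives {busiest} writes")
```

In this model, Options B and C send every write in the hour to one partition key, while Option D spreads writes as widely as possible but leaves a sensor's readings with no shared partition key to query by. Option A sits between the two: writes are spread across sensors, and each sensor's readings stay together.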