You are tasked with designing a data storage solution for a social media analytics company. They need to store user profile data (JSON) and social media posts (text and images). The data is used for machine learning models that require fast random access to individual user profiles and the ability to run analytical queries over posts. The solution must provide low-latency reads for user profiles (milliseconds) and support for large-scale analytics on posts. Which combination of Azure data services should you recommend?
Cosmos DB gives low-latency reads; ADLS Gen2 supports analytics.
Why this answer
Azure Cosmos DB provides low-latency (millisecond) reads for user profiles via its indexing and partitioning capabilities, ideal for fast random access. Azure Data Lake Storage Gen2 (ADLS Gen2) combines a hierarchical namespace with Blob Storage, enabling large-scale analytical queries on posts using tools like Azure Synapse Analytics or Spark, while efficiently storing text and images.
Exam trap
The trap here is that candidates often choose Azure Cosmos DB for both workloads (Option B) because they assume its multi-model support handles analytics, but they overlook that Cosmos DB is a transactional database not designed for large-scale analytical queries, while ADLS Gen2 is purpose-built for data lakes and analytics.
How to eliminate wrong answers
Option B is wrong because using Azure Cosmos DB for both profiles and posts would be cost-prohibitive for large-scale analytics on posts, as Cosmos DB is optimized for transactional workloads, not petabyte-scale analytical queries, and lacks native support for hierarchical file storage. Option C is wrong because Azure SQL Database is a relational store that struggles with semi-structured JSON profiles and large binary images, and it cannot scale to handle massive analytical workloads on posts without significant performance degradation and cost. Option D is wrong because Azure Table Storage is a NoSQL key-value store that does not support complex queries or indexing for fast random access to JSON profiles, and Azure Blob Storage lacks a hierarchical namespace and native analytical integration, making large-scale analytics inefficient.