Microsoft Azure Data Fundamentals DP-900 | Questions 226–300 | Page 4/14

226

MCQhard

You are the data engineer for a large retail company. The company has an existing on-premises SQL Server database with 10 years of transactional data. They want to move this data to Azure to enable advanced analytics using Azure Synapse Analytics. The data includes customer orders, product details, and inventory. The solution must minimize data movement and support both batch and real-time analytics. The company also wants to use Power BI for reporting. They have a limited budget and prefer a serverless option for compute. You are evaluating the following approaches:

A) Use Azure Data Factory to copy all data to Azure Data Lake Storage Gen2, then use Azure Synapse Serverless SQL pool to query the data, and finally connect Power BI to the serverless SQL endpoint.

B) Use Azure Database Migration Service to migrate the SQL Server database to Azure SQL Database, then use Azure Synapse Analytics with a dedicated SQL pool to perform analytics, and connect Power BI to the dedicated pool.

C) Use Azure Data Factory to copy all data to Azure Blob Storage, then use Azure Stream Analytics to perform real-time analytics, and connect Power BI directly to Stream Analytics output.

D) Use Azure Data Factory to copy historical data to Azure Data Lake Storage Gen2, use Azure Synapse Serverless SQL pool for batch analytics, and use Azure Event Hubs and Stream Analytics for real-time data, with Power BI connecting to both serverless SQL and Stream Analytics.

Which approach best meets the requirements?

A.Option A

B.Option C

C.Option B

D.Option D

AnswerD

Combines serverless batch and real-time, minimizes data movement, and uses Power BI.

Why this answer

Option D best meets the requirements because it uses Azure Data Factory to copy historical data to Azure Data Lake Storage Gen2, enabling cost-effective storage and batch analytics via Azure Synapse Serverless SQL pool (serverless compute). It also incorporates Azure Event Hubs and Stream Analytics for real-time data ingestion and analytics, with Power BI connecting to both the serverless SQL endpoint and Stream Analytics output. This minimizes data movement by keeping data in the lake, supports both batch and real-time analytics, and uses a serverless option to stay within a limited budget.

Exam trap

The trap here is that candidates often choose Option A because it uses serverless SQL and Power BI, but they overlook the explicit requirement for real-time analytics, which Option A does not address.

How to eliminate wrong answers

Option A is wrong because it only supports batch analytics via the serverless SQL pool and lacks a real-time analytics component, failing the requirement for real-time analytics. Option B is wrong because it uses Azure SQL Database and a dedicated SQL pool, which are provisioned (not serverless) compute options, increasing costs and violating the preference for a serverless option; it also moves data to a separate database, increasing data movement. Option C is wrong because it copies data to Azure Blob Storage (which lacks the hierarchical namespace and optimized analytics features of Data Lake Storage Gen2) and uses only Stream Analytics for real-time analytics, missing the batch analytics requirement and the serverless SQL pool for ad-hoc querying.

227

MCQmedium

A company plans to migrate an on-premises SQL Server database to Azure. The database uses SQL Server Agent for nightly maintenance jobs, Service Broker for asynchronous messaging, and requires cross-database queries within the same instance. The company wants a fully managed Platform as a Service (PaaS) solution that minimizes application code changes. Which Azure SQL deployment option should they choose?

A.Azure SQL Database (single database)

B.Azure SQL Managed Instance

C.Azure SQL Database elastic pool

D.Azure SQL Database Hyperscale

AnswerB

Azure SQL Managed Instance offers high compatibility with on-premises SQL Server, including SQL Agent, Service Broker, and cross-database queries. It is a PaaS solution that requires minimal to no application code changes, making it the best fit for this scenario.

Why this answer

Azure SQL Managed Instance is the correct choice because it provides near 100% compatibility with on-premises SQL Server, including SQL Server Agent, Service Broker, and cross-database queries within the same instance, while being a fully managed PaaS offering. This minimizes application code changes because the migration can be performed with minimal schema or code modifications, unlike Azure SQL Database which lacks these features.

Exam trap

The trap here is that candidates often confuse Azure SQL Database elastic pool with a managed instance, assuming it provides instance-level features, when in fact it is merely a cost-saving container for multiple single databases that still lack SQL Server Agent, Service Broker, and cross-database query support.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database (single database) does not support SQL Server Agent, Service Broker, or cross-database queries within the same instance, requiring significant application redesign. Option C is wrong because Azure SQL Database elastic pool is a resource-sharing model for multiple single databases and inherits the same limitations as single databases, lacking instance-level features like SQL Server Agent and Service Broker. Option D is wrong because Azure SQL Database Hyperscale is a scaling architecture for single databases and does not provide instance-scoped features such as SQL Server Agent or cross-database query support.

228

MCQhard

A company is building a data lake and collects data from three sources: (1) a relational database exporting CSV files with fixed columns for customer records, (2) API responses stored as JSON files with varying fields for product reviews, and (3) scanned handwritten notes stored as TIFF images. Which statement correctly categorizes these data by structure type?

A.1: structured, 2: semi-structured, 3: unstructured

B.1: semi-structured, 2: structured, 3: unstructured

C.1: structured, 2: unstructured, 3: semi-structured

D.1: unstructured, 2: semi-structured, 3: structured

AnswerA

Correct. CSV with fixed columns is structured; JSON with varying fields is semi-structured; images are unstructured.

Why this answer

Option A is correct because CSV files from a relational database have a fixed schema (rows and columns), making them structured data. JSON files from API responses with varying fields are semi-structured, as they use tags/keys to organize data without a rigid schema. TIFF images of handwritten notes are unstructured, lacking a predefined data model or organization.
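As a rough illustration of the three categories, the sketch below parses a fixed-column CSV extract, two JSON documents with different fields, and a raw byte string. The sample values and field names are hypothetical, and the byte string only stands in for the contents of a TIFF scan.

```python
import csv
import io
import json

# Structured: every row has the same fixed columns (hypothetical sample data).
csv_text = "CustomerID,Name,Email\n1,Ana,ana@example.com\n2,Ben,ben@example.com\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
column_sets = {tuple(row.keys()) for row in rows}

# Semi-structured: keys label the data, but the fields vary per document.
reviews = [
    json.loads('{"product": "P1", "rating": 5}'),
    json.loads('{"product": "P2", "rating": 3, "comments": "ok", "tags": ["late"]}'),
]
key_sets = {tuple(sorted(review.keys())) for review in reviews}

# Unstructured: opaque bytes with no fields to query (stand-in for a TIFF scan).
image_bytes = bytes([0x49, 0x49, 0x2A, 0x00]) + b"\x00" * 16

print(len(column_sets))            # number of distinct column layouts in the CSV
print(len(key_sets))               # number of distinct key layouts in the JSON
print(type(image_bytes).__name__)  # the image is just a byte sequence
```

The CSV rows all share one column layout, the JSON documents do not, and the image offers no named fields at all, which is the distinction the question tests.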

Exam trap

The trap here is confusing semi-structured data (like JSON with varying fields) with unstructured data, or assuming that any file format (like CSV) is always structured regardless of content consistency.

How to eliminate wrong answers

Option B is wrong because it incorrectly labels CSV files as semi-structured (they are structured with fixed columns) and API JSON responses as structured (they are semi-structured due to varying fields). Option C is wrong because it misclassifies API JSON responses as unstructured (they have key-value pairs, making them semi-structured) and TIFF images as semi-structured (they are unstructured binary data). Option D is wrong because it calls CSV files unstructured (they have a fixed schema) and TIFF images structured (they have no predefined data model).

229

MCQmedium

A database designer is creating a relational database for a library system. Each book can have multiple authors, and each author may have written many books. To avoid data redundancy, the designer creates a separate Authors table and a BookAuthors junction table. This process of organizing data to reduce redundancy and improve integrity is called:

A.Denormalization

B.Indexing

C.Normalization

D.Partitioning

AnswerC

Correct. Normalization organizes data into separate entities to eliminate redundancy; the junction table is a standard technique for many-to-many relationships.

Why this answer

Normalization is the process of organizing data in a relational database to reduce redundancy and improve data integrity by dividing large tables into smaller, related tables and defining relationships between them. In this scenario, creating a separate Authors table and a BookAuthors junction table means each author's information is stored once instead of being repeated for every book, which is a classic example of normalization. The junction table is also the standard way to model a many-to-many relationship. This directly supports the relational database goal of minimizing duplicate data and ensuring consistency.
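A minimal sketch of the library design described above, using Python's built-in sqlite3 module as a stand-in for any relational engine. The Books table, the key column names, and the sample rows are assumptions added for illustration; only Authors and BookAuthors are named in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Books   (BookID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- Junction table: one row per (book, author) pair.
CREATE TABLE BookAuthors (
    BookID   INTEGER NOT NULL REFERENCES Books(BookID),
    AuthorID INTEGER NOT NULL REFERENCES Authors(AuthorID),
    PRIMARY KEY (BookID, AuthorID)
);
INSERT INTO Books VALUES (1, 'Book One'), (2, 'Book Two');
INSERT INTO Authors VALUES (10, 'Author X'), (11, 'Author Y');
INSERT INTO BookAuthors VALUES (1, 10), (1, 11), (2, 10);
""")

# Each author name is stored once, however many books they wrote.
author_rows = conn.execute("SELECT COUNT(*) FROM Authors").fetchone()[0]

# Joining through the junction table rebuilds the many-to-many view.
pairs = conn.execute("""
    SELECT b.Title, a.Name
    FROM BookAuthors ba
    JOIN Books b   ON b.BookID = ba.BookID
    JOIN Authors a ON a.AuthorID = ba.AuthorID
    ORDER BY b.Title, a.Name
""").fetchall()
print(author_rows, pairs)
```

Renaming an author now means updating a single row in Authors, which is the integrity benefit the explanation refers to.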

Exam trap

The trap here is that candidates often confuse normalization with denormalization, mistakenly thinking that splitting tables to reduce redundancy is a form of denormalization, when in fact it is the core definition of normalization.

How to eliminate wrong answers

Option A is wrong because denormalization is the opposite process—it intentionally introduces redundancy (e.g., combining tables) to improve read performance, often at the cost of data integrity, and is not used to reduce redundancy. Option B is wrong because indexing is a performance optimization technique that creates data structures (e.g., B-trees) to speed up query execution, not a method for organizing data to eliminate redundancy. Option D is wrong because partitioning splits a table into smaller physical segments (e.g., horizontal or vertical partitioning) for manageability or performance, but does not inherently reduce data redundancy or improve integrity.

230

MCQmedium

A company uses Azure SQL Database for an e-commerce application. The Orders table has millions of rows. Queries that filter on CustomerID and order by OrderDate are slow. The table currently has a clustered index on OrderID (the primary key). Which index strategy will best improve these queries?

A.Create a nonclustered index on OrderDate only.

B.Create a filtered index on CustomerID where Status = 'Active'.

C.Create a nonclustered index on (CustomerID, OrderDate).

D.Create a nonclustered index on OrderID and OrderDate.

AnswerC

This composite index allows efficient seek on CustomerID and then an ordered scan of OrderDate, covering both the filter and the sort without additional operations.

Why this answer

Option C is correct because a nonclustered index on (CustomerID, OrderDate) directly supports the query's filter (WHERE CustomerID = ?) and sort (ORDER BY OrderDate). SQL Server can seek on CustomerID and then read the matching rows already in OrderDate order, so it needs neither a full clustered index scan nor a separate sort. Because a nonclustered index also carries the clustered key (OrderID), it covers queries that return only these columns; queries that select other columns still need key lookups, but only for the matching rows. Either way, I/O and CPU overhead drop significantly.
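The effect of the composite index can be observed in a small experiment. The sketch below uses SQLite (through Python's sqlite3 module) purely as a stand-in, since its planner also reports whether an index seek and a separate sort are used; Azure SQL Database will show different plan text, and the index name and table definition here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Orders (
        OrderID     INTEGER PRIMARY KEY,
        CustomerID  INTEGER NOT NULL,
        OrderDate   TEXT NOT NULL,
        TotalAmount REAL
    )
""")

QUERY = "SELECT OrderID, OrderDate FROM Orders WHERE CustomerID = ? ORDER BY OrderDate"


def plan(connection):
    """Return the planner's steps for QUERY as one lowercase string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    return " | ".join(row[-1] for row in rows).lower()


# Without a supporting index the planner scans the table and sorts the result.
before = plan(conn)

# Composite index: filter column first, sort column second.
conn.execute("CREATE INDEX IX_Orders_Customer_Date ON Orders (CustomerID, OrderDate)")
after = plan(conn)

print("before:", before)
print("after: ", after)
```

Comparing the two plans shows the separate sort step disappearing once the index leads with the filter column and continues with the sort column.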

Exam trap

The trap here is that candidates often think a single-column index on the filter column (CustomerID) or the sort column (OrderDate) is sufficient, but they miss that a composite index covering both in the correct order eliminates the need for a separate sort and key lookups, which is critical for large tables in Azure SQL Database.

How to eliminate wrong answers

Option A is wrong because an index on OrderDate only would require a full scan to find rows matching a specific CustomerID, as it does not include the filter column; the query would still need to perform a key lookup or scan to apply the CustomerID predicate. Option B is wrong because a filtered index on CustomerID WHERE Status = 'Active' is too narrow—it only helps queries that include the Status filter, and the original query does not filter on Status, so it would be ignored by the optimizer for this workload. Option D is wrong because an index on (OrderID, OrderDate) does not include CustomerID as the leading key; the query filter on CustomerID cannot use this index efficiently, and it would still require a scan or lookup to satisfy the CustomerID condition.

231

MCQmedium

A company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. Every day, they need to incrementally load 100 GB of new sales data from CSV files stored in Azure Data Lake Storage Gen2 (ADLS Gen2). The load should use PolyBase for efficient parallel data transfer and must be orchestrated on a recurring schedule. Which Azure service should they use to create and manage this pipeline?

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Databricks

D.Azure Logic Apps

AnswerA

Azure Data Factory supports scheduled pipelines, PolyBase integration for high-speed loading into Synapse, and incremental copy patterns, making it the ideal orchestration tool.

Why this answer

Azure Data Factory (ADF) is the correct choice because it provides native orchestration and scheduling capabilities for data pipelines. It supports PolyBase as a sink to load data into Azure Synapse dedicated SQL pool in parallel, and it can directly read CSV files from ADLS Gen2. ADF's built-in triggers allow you to schedule the daily incremental load without additional coding.

Exam trap

The trap here is that candidates may pick Azure Databricks because it can also transform data and load it into Synapse, but the question asks for the service used to create, schedule, and manage the pipeline, which is Azure Data Factory's role, including its built-in PolyBase support for loading a dedicated SQL pool.

How to eliminate wrong answers

Option B (Azure Stream Analytics) is wrong because it is designed for real-time stream processing (e.g., from Event Hubs or IoT Hub), not for scheduled batch loading of CSV files from ADLS Gen2. Option C (Azure Databricks) is wrong because, while it can process data and load it into Synapse, it is a Spark-based analytics platform aimed at data engineering and data science workloads; it is not the dedicated pipeline orchestration service, and using it here would mean writing Spark code for a task that ADF handles with a built-in copy activity and trigger. Option D (Azure Logic Apps) is wrong because it is a low-code workflow service for integrating SaaS applications and APIs, not designed for high-throughput data movement or PolyBase-based parallel loading into a dedicated SQL pool.

232

MCQeasy

You need to migrate an on-premises SQL Server database to Azure. The database uses many stored procedures and CLR assemblies. Which Azure service is most compatible without requiring major application changes?

A.Azure Database for MySQL

B.Azure Virtual Machines with SQL Server

C.Azure SQL Database

D.Azure SQL Managed Instance

AnswerD

Azure SQL Managed Instance supports CLR assemblies and stored procedures.

Why this answer

Option D is correct because Azure SQL Managed Instance provides near 100% compatibility with SQL Server, including stored procedures and CLR assemblies. Option C is wrong because Azure SQL Database does not support CLR assemblies. Option A is wrong because Azure Database for MySQL is a different database engine.

Option B is wrong because Azure Virtual Machines with SQL Server requires patching and management overhead.

233

MCQeasy

A company needs to store JSON documents that require flexible schema and low-latency access globally. Which Azure data service should they use?

A.Azure Table Storage

B.Azure SQL Database

C.Azure Blob Storage

D.Azure Cosmos DB

AnswerD

Cosmos DB supports flexible schema and global distribution.

Why this answer

Azure Cosmos DB is the correct choice because it is a globally distributed, multi-model database service that natively supports JSON documents with flexible schema. It offers turnkey global distribution, single-digit-millisecond latency at the 99th percentile, and multiple consistency models, making it ideal for low-latency access worldwide.

Exam trap

The trap here is that candidates often confuse Azure Blob Storage's ability to store JSON files as blobs with the need for a database that can query and index JSON documents with low-latency global access, leading them to incorrectly choose Blob Storage instead of Cosmos DB.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a NoSQL key-value store; its entities are flat sets of properties rather than nested JSON documents, and it lacks turnkey global distribution with low-latency guarantees. Option B is wrong because Azure SQL Database is a relational database that requires a predefined schema and is not designed as a flexible-schema document store; it also lacks built-in global distribution for low-latency access. Option C is wrong because Azure Blob Storage is an object storage service for unstructured binary data and does not provide native JSON document querying, indexing, or global distribution with low-latency access.

234

MCQeasy

A company stores customer transaction data in Azure Blob Storage. They need to query the data using SQL-based tools without moving the data. Which Azure service should they use?

A.Azure SQL Database

B.Azure Analysis Services

C.Azure Cosmos DB

D.Azure Synapse Serverless SQL pool

AnswerD

Allows querying data in Blob Storage using T-SQL without moving it.

Why this answer

Azure Synapse Serverless SQL pool allows you to query data directly from Azure Blob Storage using T-SQL without moving or copying the data. It uses a pay-per-query model and supports reading common file formats like Parquet, CSV, and JSON, making it ideal for ad-hoc querying over data lakes.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL pool with Azure SQL Database, assuming any 'SQL' service can query external storage, but only Synapse Serverless SQL pool provides native external data querying over Blob Storage without data movement.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a fully managed relational database service that requires data to be imported and stored within its own storage engine, not queried in place from Blob Storage. Option B is wrong because Azure Analysis Services is a semantic modeling and analytics engine that requires data to be loaded into an in-memory tabular model, not queried directly from Blob Storage. Option C is wrong because Azure Cosmos DB is a NoSQL database service with its own storage and query APIs (SQL, MongoDB, Cassandra, etc.), and it cannot query external data in Blob Storage without first ingesting it.

235

MCQeasy

You are analyzing the results of a KQL query in Azure Data Explorer. What does this query return?

A.Total damage per event type

B.All states with flood events sorted by damage

C.Top 5 states with highest total property damage from floods

D.Top 5 flood events with highest damage

AnswerC

The query filters for flood events, sums property damage per state, and returns the top 5 states.

Why this answer

The KQL query uses the 'summarize' operator to aggregate total property damage by state, then applies 'top 5 by' to return the five states with the highest total property damage from flood events. The 'where' clause filters for flood events, and the 'project' operator selects only the state and damage columns, confirming that the result is the top 5 states by total property damage.
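The query itself is not reproduced in this question, so the sketch below only mirrors the operator sequence the explanation describes (filter, aggregate per state, keep the top five) in plain Python. The event records, field names, and damage values are all hypothetical and are not taken from any real table.

```python
from collections import defaultdict

# Hypothetical event records; field names are assumptions, not a real schema.
events = [
    {"state": "TX", "event_type": "Flood", "damage": 500},
    {"state": "TX", "event_type": "Flood", "damage": 300},
    {"state": "FL", "event_type": "Flood", "damage": 700},
    {"state": "FL", "event_type": "Tornado", "damage": 900},
    {"state": "CA", "event_type": "Flood", "damage": 100},
    {"state": "NY", "event_type": "Flood", "damage": 250},
    {"state": "OH", "event_type": "Flood", "damage": 50},
    {"state": "WA", "event_type": "Flood", "damage": 400},
    {"state": "WA", "event_type": "Flood", "damage": 10},
    {"state": "IA", "event_type": "Flood", "damage": 20},
]

# Step 1 (like 'where'): keep only flood events.
floods = [e for e in events if e["event_type"] == "Flood"]

# Step 2 (like 'summarize ... by state'): total damage per state.
totals = defaultdict(int)
for e in floods:
    totals[e["state"]] += e["damage"]

# Step 3 (like 'top 5 by'): sort descending and keep five rows only.
top5 = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:5]
print(top5)
```

Seven states have flood events in this sample, but only five rows come back, and each row is a state total rather than a single event, which rules out options A, B, and D.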

Exam trap

The trap here is that candidates often confuse grouping by state versus grouping by event type, or they misinterpret 'top 5 by' as returning all rows sorted rather than only the top 5 rows.

How to eliminate wrong answers

Option A is wrong because the query groups by state, not by event type, so it returns damage per state, not per event type. Option B is wrong because the query uses 'top 5 by' to return only the highest damage states, not all states, and it sorts by damage descending, not alphabetically. Option D is wrong because the query groups by state, not by individual flood events, so it returns aggregated damage per state, not per event.

236

Multi-Selecthard

Which THREE are valid use cases for Azure Cosmos DB?

Select 3 answers

A.Storing IoT telemetry data with low latency

B.Relational OLTP with complex joins

C.Session state management for web applications

D.Personalization and recommendation engines

E.Storing large files like images and videos

AnswersA, C, D

Cosmos DB provides low-latency reads and writes.

Why this answer

Azure Cosmos DB is a globally distributed, multi-model NoSQL database service designed for low-latency, high-throughput workloads. Storing IoT telemetry data requires fast ingestion and real-time querying, which Cosmos DB supports with single-digit millisecond read/write latencies at the 99th percentile, making it ideal for this use case. Session state management and personalization engines fit the same profile: frequent, key-based reads and writes of small items with a flexible schema.

Exam trap

The trap here is that candidates confuse Azure Cosmos DB's multi-model support (e.g., Table API, Cassandra API) with relational database capabilities, leading them to incorrectly select Option B for OLTP with complex joins, or they assume Cosmos DB can handle large binary files like Blob Storage, missing the 2 MB document size limit.

237

MCQhard

A retail company uses Azure Cosmos DB to store product catalog data. They experience high request unit (RU) consumption during peak hours, leading to throttling. Which action should they take to reduce RU consumption without changing the application code?

A.Switch to the Cassandra API

B.Create a composite index on frequently queried fields

C.Enable the Azure Cosmos DB integrated cache

D.Increase the provisioned RU/s

AnswerC

The integrated cache serves repeated reads and queries from cache, so cache hits consume no RUs.

Why this answer

Option C is correct because enabling the Azure Cosmos DB integrated cache reduces RU consumption by serving repeated queries from cache. Option D is wrong because increasing the provisioned RU/s would raise costs and not reduce consumption. Option B is wrong because creating a composite index would increase RU consumption for writes.

Option A is wrong because switching to the Cassandra API may not reduce RU consumption.
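The reasoning behind the cache can be shown with a toy model. The sketch below is not the Cosmos DB SDK: ToyStore, CachedReader, the flat charge of 1 RU per read, and the product keys are all hypothetical, chosen only to show how repeated reads stop consuming request units once they are served from a cache.

```python
class ToyStore:
    """Stand-in for a database that charges request units (RU) per read."""

    def __init__(self, items, ru_per_read=1.0):
        self.items = items
        self.ru_per_read = ru_per_read
        self.ru_consumed = 0.0

    def read(self, key):
        self.ru_consumed += self.ru_per_read
        return self.items[key]


class CachedReader:
    """Serves repeated reads from memory so only cache misses reach the store."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def read(self, key):
        if key not in self.cache:      # cache miss: pay the RU charge once
            self.cache[key] = self.store.read(key)
        return self.cache[key]         # cache hit: no RU charged


store = ToyStore({"sku-1": "kettle", "sku-2": "toaster"})
reader = CachedReader(store)
for _ in range(100):                   # peak-hour traffic repeating the same reads
    reader.read("sku-1")
    reader.read("sku-2")
print(store.ru_consumed)
```

Two hundred reads cost the same as two in this model. A real cache also has to bound how stale an entry may become; this toy version never expires anything.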

238

MCQmedium

Your company has a Power BI dashboard that uses a data model with a single large fact table and several dimension tables. The dashboard loads slowly when users filter by multiple dimensions. Which design change would MOST improve performance?

A.Use page-level filters instead of report-level filters.

B.Create a calculated table that aggregates the fact table at a higher granularity.

C.Ensure the fact table and dimension tables follow a star schema design with proper relationships.

D.Convert the data model to a composite model using DirectQuery for some tables.

AnswerC

Star schema is optimized for analytical queries and filtering.

Why this answer

Option C is correct because a star schema design with proper relationships between the fact table and dimension tables is the foundational best practice for optimizing Power BI data models. This design minimizes the cardinality of relationships, reduces the size of the data model, and enables efficient query folding and storage engine compression, which directly improves filter performance across multiple dimensions.

Exam trap

The trap here is that candidates often confuse UI-level filter scoping (page-level vs. report-level) with actual query performance optimization, or they mistakenly believe that aggregating data or switching to DirectQuery will always improve speed, ignoring the fundamental importance of star schema design for in-memory analytics.

How to eliminate wrong answers

Option A is wrong because page-level filters do not improve query performance; they only change the scope of filter application in the UI, and the underlying query still hits the same large fact table. Option B is wrong because creating a calculated table that aggregates the fact table at a higher granularity would lose detail data and prevent users from drilling down, which is not a performance optimization but a data reduction that changes the analytical capability. Option D is wrong because converting to a composite model with DirectQuery for some tables often introduces latency from the source system and can degrade performance due to the lack of in-memory compression and the need for cross-engine joins, making it a poor choice for improving dashboard responsiveness.

239

Multi-Selectmedium

Which TWO Azure services can be used to store relational data?

Select 2 answers

A.Azure SQL Database

B.Azure Cosmos DB

C.Azure Database for PostgreSQL

D.Azure Blob Storage

E.Azure Table Storage

AnswersA, C

Azure SQL Database and Azure Database for PostgreSQL are both relational database services.

Why this answer

Option A (Azure SQL Database) and Option C (Azure Database for PostgreSQL) are relational database services. Option B is wrong because Azure Cosmos DB is NoSQL. Option D is wrong because Azure Blob Storage is object storage.

Option E is wrong because Azure Table Storage is NoSQL.

240

MCQmedium

A company stores customer data in a relational database with columns like CustomerID, Name, and Email. They also store product images as JPEG files in Azure Blob Storage, and customer feedback as JSON documents that contain varying fields such as rating, comments, and optional metadata. Which of the following correctly orders these data types from most structured to least structured?

A.Relational data, images, JSON

B.Images, JSON, relational data

C.Relational data, JSON, images

D.JSON, relational data, images

AnswerC

Correct order: structured (relational), semi-structured (JSON), unstructured (images).

Why this answer

Relational data (CustomerID, Name, Email) is the most structured because it enforces a fixed schema with defined data types and constraints. JSON documents (customer feedback) are semi-structured: they have a flexible schema with optional fields like metadata, but still use key-value pairs. Images (JPEG files) are unstructured binary data with no inherent schema.

Option C correctly orders them from most structured (relational) to least structured (images).

Exam trap

The trap here is that candidates confuse 'semi-structured' with 'unstructured' or assume images have more structure than JSON because they are stored in a named file, but the key distinction is schema rigidity: relational > JSON > binary blobs.

How to eliminate wrong answers

Option A is wrong because it places images (unstructured binary) before JSON (semi-structured), incorrectly suggesting that binary files have more structure than key-value documents. Option B is wrong because it reverses the entire order, claiming images are most structured and relational data is least structured, which contradicts the fundamental definition of structured vs. unstructured data. Option D is wrong because it ranks JSON as more structured than relational data, but relational databases enforce a rigid schema with primary keys and data types, making them more structured than JSON's flexible schema.

241

MCQmedium

An e-commerce company uses Azure SQL Database for order processing. The Orders table has columns: OrderID (unique, clustered index), CustomerID, OrderDate, Status, TotalAmount. A common query filters on CustomerID and OrderDate, and sorts by OrderDate descending. The query also returns TotalAmount. Which indexing strategy will produce the best query performance?

A.Create a nonclustered index on (CustomerID, OrderDate DESC) INCLUDE (TotalAmount)

B.Create a nonclustered index on (OrderDate) INCLUDE (CustomerID, TotalAmount)

C.Create a nonclustered index on (OrderDate DESC) INCLUDE (CustomerID, TotalAmount)

D.Create a nonclustered index on (CustomerID) INCLUDE (OrderDate, TotalAmount)

AnswerA

This composite index supports the exact filter (CustomerID and OrderDate), the sort order (OrderDate DESC is included in the key), and the included TotalAmount column eliminates key lookups, making it a covering index for the query.

Why this answer

Option A is correct because it creates a covering index that supports both the equality filter on CustomerID and the range/sort on OrderDate DESC. By including TotalAmount as an included column, the query can be satisfied entirely from the nonclustered index without key lookups to the clustered index, minimizing I/O and improving performance.

Exam trap

The trap here is that candidates often focus on including all columns in the INCLUDE clause but fail to order the key columns correctly to support both the equality filter and the sort order, leading them to pick options that start with the sort column instead of the equality column.

How to eliminate wrong answers

Option B is wrong because the index key starts with OrderDate, which does not support the equality filter on CustomerID efficiently; the database would need to scan all rows matching the OrderDate range and then filter by CustomerID. Option C is wrong for the same reason—leading with OrderDate DESC fails to support the equality predicate on CustomerID, leading to unnecessary scans. Option D is wrong because the index key is only CustomerID, so the sort by OrderDate DESC cannot be satisfied from the index order, requiring an explicit sort operation that degrades performance.

242

MCQeasy

A retail company uses Azure SQL Database for its inventory system. The database stores sensitive customer information that must be encrypted at rest to comply with data protection regulations. Which feature should they enable?

A.Dynamic Data Masking

B.Always Encrypted

C.Azure Information Protection

D.Transparent Data Encryption (TDE)

AnswerD

TDE encrypts the entire database at rest.

Why this answer

Option D is correct because Transparent Data Encryption (TDE) encrypts data at rest in Azure SQL Database. Option B (Always Encrypted) protects sensitive data in transit and at rest, but it is a column-level feature rather than whole-database encryption at rest. Option A (Dynamic Data Masking) obfuscates data from non-privileged users but does not encrypt it.

Option C (Azure Information Protection) is for file-level protection.

243

MCQhard

A global social media application allows users to post updates and 'like' posts. The application is designed to prioritize availability and partition tolerance over strong consistency. As a result, when a user likes a post, the like count may not be immediately visible to all users, but it will eventually become consistent across all regions. Which consistency model does this application follow?

A.Strong consistency

B.Eventual consistency

C.Consistent prefix

D.Bounded staleness

AnswerB

Eventual consistency guarantees that if no new updates are made, all replicas will eventually return the same value. This matches the scenario where updates are not immediately visible but become consistent over time, supporting high availability and partition tolerance.

Why this answer

The application prioritizes availability and partition tolerance, which aligns with the eventual consistency model. In this model, updates (like a 'like' count) are propagated asynchronously across replicas, and while reads may return stale data temporarily, all replicas will converge to the same value over time. This is typical of NoSQL systems like Apache Cassandra or Amazon DynamoDB when configured with eventual consistency.
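A small simulation can make the convergence behavior concrete. The sketch below is a toy model, not how any particular database replicates data: the Replica class, the region names, and the inbox queue are assumptions used only to show stale reads followed by convergence.

```python
from collections import deque


class Replica:
    """One regional copy of the like counter."""

    def __init__(self, name):
        self.name = name
        self.likes = 0
        self.inbox = deque()   # updates received but not yet applied

    def apply_pending(self):
        while self.inbox:
            self.likes += self.inbox.popleft()


replicas = [Replica("us"), Replica("eu"), Replica("asia")]


def like(local):
    """Apply the like locally at once; queue it for the other replicas."""
    local.likes += 1
    for replica in replicas:
        if replica is not local:
            replica.inbox.append(1)


like(replicas[0])                                # a user in the US likes the post
like(replicas[1])                                # a user in the EU likes the post
stale_view = [r.likes for r in replicas]         # regions disagree for a while

for replica in replicas:                         # asynchronous propagation catches up
    replica.apply_pending()
converged_view = [r.likes for r in replicas]
print(stale_view, converged_view)
```

Every write is accepted immediately in its own region, which is the availability side of the trade-off, and no region is promised the latest count until propagation finishes.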

Exam trap

The trap here is that candidates often confuse 'eventual consistency' with 'bounded staleness' because both allow stale reads, but eventual consistency has no guaranteed time or version bound, whereas bounded staleness imposes a strict limit—a distinction Microsoft explicitly tests in DP-900.

How to eliminate wrong answers

Option A is wrong because strong consistency requires all reads to return the most recent write immediately, which would sacrifice availability and partition tolerance and so contradicts the application's design priorities. Option C is wrong because consistent prefix adds an ordering guarantee (reads never see writes out of order) that the scenario does not mention; the scenario describes only that replicas converge over time. Option D is wrong because bounded staleness guarantees that reads are at most a fixed number of versions or time interval behind the latest write, which imposes a strict upper bound on staleness, not the 'eventually consistent' behavior described, where no time bound is guaranteed.

244

MCQhard

A manufacturing company ingests a continuous stream of sensor data from thousands of IoT devices into Azure Event Hubs. The company also stores historical equipment maintenance records in Azure SQL Database. The operations team needs to join the streaming sensor data with the historical maintenance records in near real-time to detect anomalies, and data scientists need to run ad-hoc T-SQL queries on the combined dataset for analysis. Which Azure service should they use as the primary analytics platform to meet both requirements?

A.Azure Stream Analytics

B.Azure Databricks

C.Azure Synapse Analytics

D.Azure Analysis Services

AnswerC

Azure Synapse Analytics provides both real-time ingestion and a T-SQL-based query engine (SQL pool or serverless) for ad-hoc analysis, meeting both requirements.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified analytics platform that can ingest streaming data from Azure Event Hubs via its built-in Spark pools or pipelines, and simultaneously query historical data in Azure SQL Database using T-SQL. This enables near real-time anomaly detection through streaming joins and ad-hoc T-SQL queries for data scientists, all within a single service without needing separate tools.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics as sufficient for both requirements, overlooking its lack of ad-hoc T-SQL query support, and mistakenly think Azure Databricks supports T-SQL natively when it actually uses Spark SQL or Python.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing engine that can join streaming data with reference data (e.g., from SQL Database), but it does not support ad-hoc T-SQL queries for data scientists; it uses a SQL-like query language for continuous queries, not interactive T-SQL. Option B is wrong because Azure Databricks is an Apache Spark-based analytics platform that can handle streaming and batch workloads, but it does not natively support T-SQL queries; it uses Spark SQL or Python, not the T-SQL dialect required for ad-hoc queries by data scientists. Option D is wrong because Azure Analysis Services is an OLAP engine for semantic modeling and business intelligence, not designed for real-time stream processing or direct T-SQL queries on raw combined datasets; it focuses on pre-aggregated data and MDX/DAX queries.

245

Multi-Selecthard

A company uses Azure Cosmos DB with the SQL API. They need to implement a data partitioning strategy to optimize query performance and avoid hot partitions. Which THREE practices should they follow?

Select 3 answers

A.Use the same partition key for all items

B.Use a synthetic partition key if natural keys are not suitable

C.Avoid monotonically increasing partition key values

D.Keep partition key values as small as possible

E.Choose a partition key with high cardinality

AnswersB, C, E

Synthetic keys can improve distribution when natural keys have low cardinality.

Why this answer

Choosing a high-cardinality partition key (option E) distributes data evenly. Using a synthetic partition key (option B) can combine multiple attributes for better distribution when no natural key is suitable. Avoiding monotonically increasing keys (option C) prevents hot partitions.

Option A (using the same partition key for all items) is incorrect because it places every item in a single logical partition. Option D (keeping partition key values as small as possible) is not a primary consideration.
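
To make the synthetic-key idea concrete, here is a minimal sketch that builds a partition key value from a natural key plus a hash-derived suffix. The function name, the `tenant_id` and `order_id` fields, and the bucket count of 10 are all assumptions for illustration; the code only computes strings and does not call any Cosmos DB SDK.

```python
import hashlib
from collections import Counter


def synthetic_partition_key(tenant_id, order_id, buckets=10):
    """Combine a natural key with a hash-derived suffix (hypothetical scheme)."""
    digest = hashlib.sha256(order_id.encode("utf-8")).hexdigest()
    suffix = int(digest, 16) % buckets
    return f"{tenant_id}-{suffix}"


# One very busy tenant: with tenant_id alone, every item shares one key value.
orders = [("contoso", f"order-{i}") for i in range(1000)]
natural_keys = Counter(tenant for tenant, _ in orders)
synthetic_keys = Counter(synthetic_partition_key(t, o) for t, o in orders)

print(len(natural_keys))    # number of distinct key values using tenant_id only
print(len(synthetic_keys))  # number of distinct key values with the suffix added
```

The trade-off in a scheme like this is that reading all items for one tenant has to fan out across every suffix, so the bucket count should stay small enough for those queries to remain practical.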

246

MCQmedium

A data engineer needs to load data from an on-premises SQL Server database to Azure Synapse Analytics. The data volume is approximately 2 TB and the network bandwidth is limited. Which approach minimizes data transfer time?

A.Use SQL Server Integration Services (SSIS) to transfer data over the internet.

B.Use Azure Data Box to physically ship the data.

C.Establish a site-to-site VPN and use Azure Data Factory.

D.Use Azure Data Factory with a self-hosted integration runtime over the internet.

AnswerB

Data Box bypasses network constraints for large datasets.

Why this answer

Azure Data Box is the correct approach because it physically ships the 2 TB of data on a secure storage device, bypassing the limited network bandwidth entirely. For large data volumes (multiple TB) with constrained connectivity, offline data transfer is significantly faster than any online method, as it avoids network latency and bandwidth bottlenecks.

Exam trap

The trap here is that candidates often assume online transfer tools like Azure Data Factory or SSIS are always optimal, but for large data volumes with limited bandwidth, offline shipping via Azure Data Box is the only practical solution to minimize transfer time.

How to eliminate wrong answers

Option A is wrong because SSIS over the internet would be severely throttled by the limited network bandwidth, making the transfer of 2 TB extremely slow and impractical. Option C is wrong because a site-to-site VPN still relies on the same limited internet bandwidth, so using Azure Data Factory over it would not reduce transfer time. Option D is wrong because Azure Data Factory with a self-hosted integration runtime over the internet still depends on the available network bandwidth, which is insufficient for a 2 TB transfer in a timely manner.
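
A quick back-of-the-envelope calculation shows why bandwidth dominates here. The question does not state the actual link speed, so the 10, 100, and 1000 Mbps figures below are assumed values, and the sketch ignores protocol overhead and Data Box shipping time.

```python
def transfer_days(size_tb, bandwidth_mbps, efficiency=1.0):
    """Days needed to move size_tb (decimal terabytes) over a link of bandwidth_mbps."""
    bits = size_tb * 1e12 * 8  # terabytes -> bits
    seconds = bits / (bandwidth_mbps * 1e6 * efficiency)
    return seconds / 86_400


# Assumed link speeds; the scenario only says bandwidth is "limited".
for mbps in (10, 100, 1000):
    print(f"{mbps:>5} Mbps: {transfer_days(2, mbps):.2f} days")
```

At a sustained 10 Mbps, 2 TB takes roughly 18.5 days, and that is the situation in which an offline transfer becomes attractive. On a fast link the same calculation would favor an online copy.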

247

MCQhard

A global e-commerce company uses Azure SQL Database for its product catalog. The application experiences high read traffic for product detail pages, often running the same queries for popular items. The database’s write workload is moderate. The company wants to improve read performance without increasing the cost of the primary database tier and without changing the application code. Which Azure SQL Database feature should they implement?

A.Read scale-out

B.Active geo-replication

C.Azure SQL Database elastic pool

D.Query Performance Insight

AnswerA

Correct. Read scale-out creates a read-only replica that can handle reporting and read workloads without impacting the primary's performance, and it can be used with a connection string that enables read-only routing.

Why this answer

Read scale-out (A) is correct because it offloads read-only queries to a secondary replica of the Azure SQL Database without changing the application code. By setting `ApplicationIntent=ReadOnly` in the connection string, the database routes read queries to a read-only replica, improving performance for high-read workloads like product detail pages while keeping the primary tier unchanged and avoiding additional cost for a higher tier.

Exam trap

The trap here is that candidates often confuse active geo-replication with read scale-out. A geo-replicated secondary is readable, but it is a separately billed database with its own server endpoint, so using it adds cost and means pointing the application at a different server. Read scale-out is selected with `ApplicationIntent=ReadOnly` on the existing connection string.

How to eliminate wrong answers

Option B (Active geo-replication) is wrong because it is designed for disaster recovery and regional failover, not for offloading read-only queries; it requires separate connection strings and does not automatically route read traffic to a secondary replica without application changes. Option C (Azure SQL Database elastic pool) is wrong because it is a cost-management feature for pooling multiple databases with varying usage patterns, not a performance optimization for read-heavy queries on a single database. Option D (Query Performance Insight) is wrong because it is a monitoring and diagnostic tool that identifies performance bottlenecks but does not directly improve read performance or offload read traffic.
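
The routing described above comes down to one connection string keyword. The sketch below only assembles ODBC-style connection strings; the server and database names are placeholders, authentication settings are omitted, and the helper function is hypothetical.

```python
def build_connection_string(server, database, read_only):
    """Assemble an ODBC-style connection string (authentication omitted)."""
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",
        "Server": f"tcp:{server},1433",
        "Database": database,
        "Encrypt": "yes",
    }
    if read_only:
        # Declares read-only intent so the session can be routed to a replica.
        parts["ApplicationIntent"] = "ReadOnly"
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Placeholder names, not real resources.
primary = build_connection_string("example-server.database.windows.net", "CatalogDb", read_only=False)
replica = build_connection_string("example-server.database.windows.net", "CatalogDb", read_only=True)
print(replica)
```

Because the keyword lives in configuration, the product-page queries themselves do not need to change.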

248

MCQeasy

A company is designing a relational database solution on Azure for an e-commerce platform. They need to ensure high availability and automatic failover in case of a regional outage. Which Azure service should they use?

A.Azure SQL Database with active geo-replication

B.Azure SQL Managed Instance with local redundancy

C.Azure Database for PostgreSQL with read replicas

D.Azure SQL Database (single database)

AnswerA

Active geo-replication maintains a readable secondary in another region; pairing it with a failover group makes the cross-region failover automatic.

Why this answer

Option A is correct because Azure SQL Database with active geo-replication replicates the database to a secondary region that can take over after a regional outage. Option D is wrong because a single Azure SQL Database does not automatically fail over across regions. Option C is wrong because Azure Database for PostgreSQL with read replicas does not provide automatic failover.

Option B is wrong because Azure SQL Managed Instance with local redundancy does not provide regional failover.

249

MCQeasy

A company operates an online store that processes customer orders. When a customer places an order, the system must immediately reduce the inventory count for the purchased items and record the order details. At the end of each month, the company runs reports that aggregate sales data over the past month to analyze trends. Which type of data processing workload best describes the order placement activity?

A.Transactional processing

B.Analytical processing

C.Batch processing

D.Stream processing

AnswerA

Order placement involves immediate, real-time updates to inventory and order records, requiring transactional consistency and ACID properties. This is a classic example of an Online Transaction Processing (OLTP) workload.

Why this answer

Order placement requires immediate inventory reduction and order recording, which demands ACID (Atomicity, Consistency, Isolation, Durability) guarantees. This is a classic transactional processing workload, typically handled by OLTP (Online Transaction Processing) systems like SQL Server or Azure SQL Database, ensuring data integrity even under concurrent access.

Exam trap

The trap here is confusing the immediate, atomic nature of order placement with batch or stream processing, when the key differentiator is the need for ACID compliance in a single, discrete operation.

How to eliminate wrong answers

Option B (Analytical processing) is wrong because it focuses on querying and aggregating historical data for reporting and trend analysis, not on real-time, atomic updates of operational data. Option C (Batch processing) is wrong because it processes data in scheduled, bulk intervals (e.g., nightly runs), whereas order placement must happen immediately upon customer action. Option D (Stream processing) is wrong because it handles continuous, unbounded data flows (e.g., sensor data or clickstreams) with low latency, but it does not inherently enforce transactional consistency for individual record updates like inventory deduction.
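
The atomicity that defines the order-placement workload can be sketched with Python's built-in `sqlite3` module. SQLite stands in for an OLTP database here purely so the example is self-contained; the table layout and the `place_order` helper are assumptions, not part of the scenario.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Inventory (ProductID INTEGER PRIMARY KEY,
                            Stock INTEGER NOT NULL CHECK (Stock >= 0));
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY AUTOINCREMENT,
                         ProductID INTEGER NOT NULL,
                         Quantity INTEGER NOT NULL);
    INSERT INTO Inventory VALUES (1, 5);
    """
)


def place_order(product_id, quantity):
    """Record the order and reduce stock as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO Orders (ProductID, Quantity) VALUES (?, ?)",
                (product_id, quantity),
            )
            conn.execute(
                "UPDATE Inventory SET Stock = Stock - ? WHERE ProductID = ?",
                (quantity, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint failed, so the order row is rolled back too


first = place_order(1, 3)   # succeeds: stock goes from 5 to 2
second = place_order(1, 3)  # would make stock negative, so nothing is written

stock = conn.execute("SELECT Stock FROM Inventory WHERE ProductID = 1").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(first, second, stock, order_count)  # True False 2 1
```

The second call shows the all-or-nothing property: the failed stock update also undoes the order row inserted just before it.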

250

MCQeasy

A consulting firm collects client information in two forms: a spreadsheet with columns for Name, Address, and Phone Number, and audio recordings of client meetings. Which of the following statements correctly categorizes these data types?

A.Both the spreadsheet data and the audio recordings are examples of structured data.

B.The spreadsheet data is structured, and the audio recordings are semi-structured.

C.The spreadsheet data is structured, and the audio recordings are unstructured.

D.The spreadsheet data is semi-structured, and the audio recordings are unstructured.

AnswerC

Correct. The spreadsheet has a fixed schema (columns) making it structured; audio recordings have no defined schema, making them unstructured.

Why this answer

The spreadsheet data with columns for Name, Address, and Phone Number has a predefined schema (rows and columns), making it structured data. Audio recordings are binary files with no inherent schema or organization, fitting the definition of unstructured data. Option C correctly pairs these classifications.

Exam trap

The trap here is confusing semi-structured data (e.g., JSON, XML with tags) with unstructured data (e.g., audio, video, images), leading candidates to incorrectly classify audio recordings as semi-structured because they contain metadata, but the content itself is unstructured.

How to eliminate wrong answers

Option A is wrong because audio recordings are not structured; they lack a fixed schema and cannot be easily queried with SQL. Option B is wrong because audio recordings are unstructured, not semi-structured (semi-structured data has tags or markers, like JSON or XML). Option D is wrong because the spreadsheet data is structured, not semi-structured; it has a rigid schema with defined columns and data types.

251

MCQmedium

A financial company is migrating a 2-TB on-premises SQL Server database to Azure. The database uses SQL Server Agent jobs for data validation and cleanup, and it performs cross-database queries using three-part names (e.g., DB1.schema.table). The company requires a fully managed PaaS service that supports these features with minimal application changes. Which Azure SQL service should they choose?

A.Azure SQL Database (single database)

B.Azure SQL Managed Instance

C.Azure SQL Database elastic pool

D.SQL Server on Azure Virtual Machine

AnswerB

Correct. SQL Managed Instance offers SQL Agent, cross-database queries, and is fully managed, minimizing code changes.

Why this answer

Azure SQL Managed Instance is the correct choice because it provides near-100% compatibility with on-premises SQL Server, including support for SQL Server Agent jobs and cross-database queries using three-part names (e.g., DB1.schema.table). As a fully managed PaaS service, it minimizes application changes while offloading infrastructure management, making it ideal for migrating a 2-TB database with these specific requirements.

Exam trap

The trap here is that candidates often confuse Azure SQL Database elastic pool with Managed Instance, assuming elastic pools support all SQL Server features, when in fact they only scale resources across single databases and lack instance-scoped features like Agent jobs and cross-database three-part name queries.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database (single database) does not support SQL Server Agent jobs or cross-database queries using three-part names; it only supports elastic query for cross-database access, which requires significant application changes. Option C is wrong because Azure SQL Database elastic pool is a resource-sharing model for multiple single databases, inheriting the same limitations as single databases (no Agent jobs, no three-part name cross-database queries). Option D is wrong because SQL Server on Azure Virtual Machine is an IaaS solution, not a fully managed PaaS service; it requires manual patching, backups, and high-availability setup, contradicting the requirement for minimal management overhead.

252

Multi-Selecteasy

A company is choosing a non-relational data store for a new application that requires flexible schema, high availability, and low latency across multiple geographic regions. Which TWO Azure services meet these requirements?

Select 2 answers

A.Azure Files

B.Azure SQL Database

C.Azure Cache for Redis

D.Azure Cosmos DB

E.Azure Table Storage

AnswersD, E

Azure Cosmos DB supports multi-region writes, flexible schema, and low latency.

Why this answer

Azure Cosmos DB (option D) offers multi-region replication, flexible schema, and low latency. Azure Table Storage (option E) is a NoSQL key-value store with global replication (read-access geo-redundant storage) and low latency. Option B (Azure SQL Database) is relational.

Option A (Azure Files) is file storage. Option C (Azure Cache for Redis) is an in-memory cache, not a primary data store.

253

Matchingmedium

Match each data processing term to its definition.

Concepts and matches

ETL: Extract, Transform, Load

ELT: Extract, Load, Transform

Batch processing: Processing large volumes of data at scheduled intervals

Stream processing: Processing data in real-time as it arrives

OLTP: Online Transaction Processing

Why these pairings

These are fundamental data processing concepts in Azure.

254

MCQhard

A company uses Azure Synapse Analytics dedicated SQL pool to store a large fact table containing 5 TB of sales transactions. New data arrives continuously and is loaded daily. The company needs to load 500 GB of new data each day while allowing concurrent read queries on the most recent data without performance degradation. Which loading strategy optimizes both load speed and query performance?

A.Use INSERT statements to add rows incrementally

B.Use CREATE TABLE AS SELECT (CTAS) to build a new table and rename it

C.Load data into a staging table, then use partition switching to swap the latest partition

D.Use PolyBase to load data directly into the fact table

AnswerC

Partition switching is a fast, metadata-only operation that adds new data without blocking concurrent reads, ideal for large daily loads.

Why this answer

Option C is correct because partition switching lets you load new data into a staging table and then swap the staged partition into the target table with ALTER TABLE ... SWITCH PARTITION. The switch is a metadata-only operation, so no data is moved at switch time, the load completes quickly, and the fact table stays online for concurrent read queries without blocking or performance degradation.

Exam trap

The trap here is that candidates often choose PolyBase (Option D) because it is associated with fast data loading, but they overlook that direct loading into a large fact table causes fragmentation and locking, whereas partition switching provides both speed and query isolation.

How to eliminate wrong answers

Option A is wrong because INSERT statements for 500 GB of data would generate excessive transaction log overhead, cause locking and blocking, and degrade concurrent read query performance on the dedicated SQL pool. Option B is wrong because CTAS creates a full copy of the entire 5 TB table plus the new data, which is resource-intensive, slow, and unnecessary for daily incremental loads; it also requires renaming and dropping the old table, causing downtime. Option D is wrong because PolyBase loads data directly into the fact table, which can cause fragmentation, locking, and poor query performance during the load, and it does not isolate the new data for efficient partition management.

255

MCQhard

A healthcare organization stores patient records in Azure Blob Storage and must comply with data retention policies that require deleting records after 7 years. They also need to prevent any modification or deletion of records before the retention period ends. Which Azure feature should they use?

A.Immutable storage with time-based retention policy

B.Azure Backup for Blob Storage

C.Soft delete for Blob Storage

D.Azure Blob Storage lifecycle management

AnswerA

Immutable storage ensures blobs cannot be modified or deleted until the retention period ends.

Why this answer

Immutable storage with a time-based retention policy (WORM – Write Once, Read Many) ensures that blobs cannot be modified or deleted until the retention period expires. This directly meets the dual requirement of preventing premature deletion while enforcing a 7-year retention, as the policy locks the data for the specified duration.

Exam trap

The trap here is that candidates confuse soft delete (which only protects against accidental deletion) or lifecycle management (which automates tiering/expiry) with the strict WORM guarantee required for regulatory compliance, where no modification or deletion is allowed before the retention period ends.

How to eliminate wrong answers

Option B (Azure Backup for Blob Storage) is wrong because it provides point-in-time recovery and protection against accidental deletion, but it does not prevent intentional modification or deletion of the original blobs before the retention period ends. Option C (Soft delete for Blob Storage) is wrong because it only retains deleted blobs for a configurable period (e.g., 7 days) and allows recovery, but it does not block deletion or modification during the retention period. Option D (Azure Blob Storage lifecycle management) is wrong because it automates tiering or deletion based on age, but it cannot enforce a write-once, read-many lock to prevent modification or deletion before the retention period expires.

256

MCQhard

A manufacturing company collects sensor data from thousands of IoT devices. The data arrives as a stream of time-stamped readings with a fixed schema (DeviceID, Timestamp, Temperature, Pressure, Vibration). They need to store this data and support both real-time dashboards showing the last hour of data and complex analytical queries over years of historical data. The solution must minimize storage costs and provide sub-second response for real-time queries. Which Azure service is best suited for this workload?

A.Azure Cosmos DB with SQL API

B.Azure SQL Database

C.Azure Data Explorer

D.Azure Table Storage

AnswerC

Azure Data Explorer is specifically built for time-series and log analytics. It supports high-throughput ingestion, automatic indexing, caching for hot data (sub-second queries), and retention-based tiering to cold storage for historical analysis, minimizing costs.

Why this answer

Azure Data Explorer (ADX) is purpose-built for high-performance analysis of large volumes of streaming telemetry data. It supports ingestion from IoT hubs, automatic indexing for sub-second queries on recent data (e.g., last hour), and cost-effective long-term storage via hot/cold tiering for years of historical analytics. Its columnar storage and Kusto Query Language (KQL) are optimized for time-series and aggregation queries, making it ideal for this mixed real-time and historical workload.

Exam trap

Microsoft often tests the misconception that any database with low-latency reads (like Cosmos DB) can handle both real-time and historical analytics, but the trap is that Cosmos DB lacks the columnar storage and query engine optimized for time-series aggregations, making it cost-prohibitive and slow for complex analytical queries over years of data.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB with SQL API is a NoSQL document database optimized for transactional workloads with low-latency reads/writes, but it is not designed for complex analytical queries over years of historical data and its storage costs are significantly higher than ADX for large telemetry volumes. Option B is wrong because Azure SQL Database is a relational OLTP engine that provides strong consistency and indexing, but it struggles with sub-second response on streaming time-series data at scale and its storage costs are higher for high-ingestion-rate telemetry. Option D is wrong because Azure Table Storage is a simple key-value store with no native support for time-series analytics, complex aggregations, or sub-second query performance on streaming data, and it lacks indexing for efficient range queries over timestamps.

257

MCQmedium

A retail application uses Azure SQL Database. The Products table contains 200,000 rows with columns: ProductID (primary key, clustered), CategoryID, ProductName, Price, StockQuantity. Queries frequently filter on CategoryID and then sort results by Price in descending order. Which indexing strategy will most improve query performance for these operations?

A.Create a clustered index on CategoryID.

B.Create a nonclustered index on CategoryID that includes Price as an included column.

C.Create a nonclustered index on (CategoryID, Price) with Price in descending order.

D.Create a clustered columnstore index on the table.

AnswerC

Correct. This composite index supports both the filter on CategoryID and the descending sort on Price without additional steps.

Why this answer

Option C creates a composite nonclustered index on (CategoryID, Price DESC) that directly supports both the filter (CategoryID equality) and the sort (Price descending) in a single index seek and ordered scan, eliminating the need for a separate sort operation. This is the most efficient strategy because the index is ordered exactly as the query requires, allowing SQL Server to retrieve matching rows in the correct order without additional processing.

Exam trap

Microsoft often tests the misconception that including Price as an included column (Option B) is sufficient to optimize the sort, when in fact the index must be ordered by Price to avoid a separate sort operation.

How to eliminate wrong answers

Option A is wrong because changing the clustered index to CategoryID would reorganize the entire table by CategoryID, which may fragment the data and does not directly optimize the sort by Price descending; the clustered index should remain on the primary key for uniqueness and row lookup efficiency. Option B is wrong because while it includes Price as an included column, the index is ordered only by CategoryID, so SQL Server would still need to sort the matching rows by Price after the seek, adding a costly Sort operator to the execution plan. Option D is wrong because a clustered columnstore index is designed for large-scale analytical workloads with aggregations and scans, not for point lookups or ordered retrieval on a 200,000-row table; it would degrade performance for the described transactional queries.
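
The difference between options B and C can be observed in a query plan. The sketch below uses Python's built-in `sqlite3` module as a stand-in, which is an assumption made so the example runs anywhere: SQLite has no included columns, and its planner and plan output differ from SQL Server's, so treat it as an illustration of the idea, not of Azure SQL Database behavior. The index names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, CategoryID INTEGER, "
    "ProductName TEXT, Price REAL, StockQuantity INTEGER)"
)
conn.executemany(
    "INSERT INTO Products (CategoryID, ProductName, Price, StockQuantity) "
    "VALUES (?, ?, ?, ?)",
    [(i % 20, f"product-{i}", float(i % 97), 10) for i in range(2000)],
)

QUERY = "SELECT ProductName, Price FROM Products WHERE CategoryID = ? ORDER BY Price DESC"


def plan(sql):
    """Return SQLite's query plan steps for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (3,)).fetchall()
    return " | ".join(row[-1] for row in rows)


# Index on the filter column only: matching rows still have to be sorted by Price.
conn.execute("CREATE INDEX ix_category ON Products (CategoryID)")
plan_single = plan(QUERY)

# Composite index ordered by (CategoryID, Price DESC): rows come back already sorted.
conn.execute("DROP INDEX ix_category")
conn.execute("CREATE INDEX ix_category_price ON Products (CategoryID, Price DESC)")
plan_composite = plan(QUERY)

print(plan_single)
print(plan_composite)
```

With the single-column index the plan includes a temporary B-tree for the ORDER BY, which is SQLite's equivalent of a separate sort step. With the composite index that step disappears.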

258

MCQmedium

Refer to the exhibit. You are reviewing an ARM template that deploys a SQL database in Azure Synapse. The template sets the storageAccountType to GRS. What is a valid concern regarding cost and performance?

A.GRS will increase storage costs and may cause higher latency

B.The collation setting is not compatible with Azure Synapse

C.The database cannot be part of a failover group

D.The database will not support Transparent Data Encryption

AnswerA

GRS replicates data to a secondary region, increasing cost and potentially write latency.

Why this answer

Geo-redundant storage (GRS) replicates your data to a secondary region, which increases storage costs because you are paying for both the primary and secondary copies. Additionally, when using GRS with Azure Synapse SQL, read requests may experience higher latency if they are directed to the secondary region, especially during a failover scenario or when using read-access geo-redundant storage (RA-GRS). This makes cost and performance valid concerns when choosing GRS over locally redundant storage (LRS).

Exam trap

The trap here is that candidates often assume GRS only affects disaster recovery and ignore its impact on ongoing storage costs and read latency, leading them to dismiss cost and performance as valid concerns.

How to eliminate wrong answers

Option B is wrong because the collation setting is not inherently incompatible with Azure Synapse; Synapse SQL pools support a variety of collations, and the default SQL_Latin1_General_CP1_CI_AS is commonly used. Option C is wrong because Azure Synapse SQL databases can be part of a failover group when configured appropriately, though the failover group feature is more commonly associated with Azure SQL Database; the GRS setting does not prevent failover group membership. Option D is wrong because Transparent Data Encryption (TDE) is supported in Azure Synapse SQL pools regardless of the storage replication type (GRS, LRS, etc.), as TDE operates at the database level and is independent of storage redundancy.

259

Multi-Selectmedium

A company is designing a data solution for a retail application. The solution must support real-time analytics on streaming sales data, and also provide historical reports for business intelligence. Which TWO data processing models should be combined to meet these requirements?

Select 2 answers

A.Distributed processing

B.Batch processing

C.Data lake storage

D.Transactional database

E.Stream processing

AnswersB, E

Batch processing is used for periodic processing of large volumes of historical data, suitable for business intelligence reports.

Why this answer

Batch processing (B) is correct because it is used to process large volumes of historical sales data at scheduled intervals, enabling the generation of comprehensive business intelligence reports. Stream processing (E) is correct because it handles real-time data ingestion and analytics on streaming sales data, allowing the application to react instantly to sales events. Combining these two models (often called a Lambda architecture) meets both the real-time and historical reporting requirements.

Exam trap

The trap here is that candidates confuse 'distributed processing' (a general architecture) with a specific processing model, or they mistakenly think a transactional database can handle real-time analytics on streaming data, when in fact it is optimized for single-row transactions, not continuous data streams.
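
The two processing models in this question can be contrasted in a few lines. This is a minimal sketch over an in-memory list: the `Sale` tuple shape, the function names, and the sample data are assumptions, and a real solution would use a stream processor and a batch engine rather than plain Python.

```python
from collections import defaultdict

# Hypothetical event shape: (date, product, amount)
sales = [
    ("2024-05-01", "widget", 20.0),
    ("2024-05-01", "gadget", 35.0),
    ("2024-05-02", "widget", 20.0),
]


def batch_daily_totals(history):
    """Batch model: process the full stored history in one scheduled pass."""
    totals = defaultdict(float)
    for date, _product, amount in history:
        totals[date] += amount
    return dict(totals)


def stream_running_total(events):
    """Stream model: emit an updated result as each event arrives."""
    running = 0.0
    for _date, _product, amount in events:
        running += amount
        yield running


daily = batch_daily_totals(sales)
running = list(stream_running_total(sales))
print(daily)    # {'2024-05-01': 55.0, '2024-05-02': 20.0}
print(running)  # [20.0, 55.0, 75.0]
```

The batch function needs the whole dataset before it produces anything, which suits historical reports. The streaming generator produces a fresh value per event, which suits real-time dashboards.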

260

MCQhard

A financial services company has raw transaction data stored in Azure Data Lake Storage Gen2 (ADLS Gen2) as Parquet files, partitioned by date. The analytics team needs to run complex SQL queries that join multiple datasets, including reference data from an Azure SQL Database, to generate risk reports. They require enterprise-grade security features such as row-level security (RLS) and column-level security. They also want to use the same service for data transformation and loading (ETL) into a curated layer. Which Azure service should they choose?

A.Azure Synapse Analytics

B.Azure Data Factory

C.Azure Databricks

D.Azure Analysis Services

AnswerA

Correct. Azure Synapse Analytics offers a unified experience for data integration, enterprise data warehousing, and big data analytics, with built-in security features like RLS and column-level security. It can query ADLS Gen2 using serverless SQL pool and orchestrate ETL with pipelines.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified analytics platform that combines enterprise data warehousing with big data analytics. It directly supports complex SQL queries across multiple datasets (including Parquet files in ADLS Gen2 and Azure SQL Database), offers built-in row-level security (RLS) and column-level security for enterprise-grade access control, and includes a built-in pipeline orchestration engine (via Synapse Pipelines) for ETL/ELT transformations into a curated layer. This single service eliminates the need to stitch together separate tools for querying, security, and data transformation.

Exam trap

The trap here is that candidates often confuse Azure Data Factory as a complete analytics solution because of its ETL capabilities, overlooking that it lacks a native SQL query engine and built-in row/column-level security for direct data access.

How to eliminate wrong answers

Option B (Azure Data Factory) is wrong because it is primarily a cloud-based ETL and data integration service that orchestrates data movement and transformation, but it does not provide a native SQL query engine for complex analytical queries or built-in row-level/column-level security on the data itself. Option C (Azure Databricks) is wrong because while it excels at big data processing and machine learning using Apache Spark, it does not natively support enterprise-grade row-level security (RLS) and column-level security at the storage or query layer without additional configuration, and its primary interface is not SQL-first for complex joins across relational and file-based sources. Option D (Azure Analysis Services) is wrong because it is a semantic modeling and BI engine that provides tabular models with RLS, but it is not designed for direct ETL/ELT data transformation or loading into a curated layer, nor does it directly query raw Parquet files in ADLS Gen2 without additional data ingestion steps.

261

Multi-Selecteasy

Which TWO of the following are characteristics of structured data?

Select 2 answers

A.Data uses tags or markers to separate elements

B.Data is organized in rows and columns

C.Data is stored in Azure Cosmos DB

D.Data conforms to a fixed schema

E.Data has no predefined schema

AnswersB, D

Structured data fits neatly into tables.

Why this answer

Structured data is defined by its organization into rows and columns, typically within a relational database or spreadsheet, where each column represents a specific attribute and each row a record. This tabular format enables efficient querying, sorting, and aggregation using SQL. Option B correctly identifies this core characteristic. Option D is also correct: every row conforms to the same fixed schema, with column names and data types defined before any data is stored.

Exam trap

The trap here is that candidates confuse the storage location (Azure Cosmos DB) with data structure type, forgetting that Cosmos DB is designed for semi-structured data, not structured data, and that 'tags or markers' (Option A) describe semi-structured formats like JSON or XML, not structured data.

262

MCQhard

A hospital stores medical images in Azure Blob Storage. They must ensure that images are encrypted at rest using customer-managed keys (CMK) and that access to the keys is audited. What should you implement?

A.Use Azure Disk Encryption to encrypt the storage account.

B.Apply Azure Information Protection labels to the blobs.

C.Enable Azure Storage Service Encryption with a customer-managed key in Azure Key Vault.

D.Use Transparent Data Encryption (TDE) on the storage account.

AnswerC

This provides encryption at rest with CMK and allows auditing via Key Vault logs.

Why this answer

Azure Storage encryption with customer-managed keys stored in Azure Key Vault provides the required control, and Key Vault logging records access to the key for auditing. Option A is wrong because Azure Disk Encryption protects virtual machine disks, not Blob Storage. Option B is wrong because Azure Information Protection classifies and labels content; it does not provide storage encryption at rest.

Option D is wrong because Transparent Data Encryption (TDE) applies to SQL databases, not to storage accounts.

263

MCQmedium

A real-time leaderboard for an online game needs to store player scores and quickly retrieve the top 100 players. The data must update frequently as players achieve new scores, and the application requires sub-millisecond read and write latency. Which Azure data store is best suited for this requirement?

A.Azure Cosmos DB Core (SQL) API

B.Azure Table Storage

C.Azure Cache for Redis

D.Azure Blob Storage

AnswerC

Azure Cache for Redis provides in-memory data structures including sorted sets, enabling high-performance leaderboard operations with sub-millisecond latency. It is purpose-built for such real-time scenarios.

Why this answer

Azure Cache for Redis is an in-memory data store that provides sub-millisecond read and write latency, making it ideal for real-time leaderboards that require frequent updates and fast retrieval of top scores. Its sorted set data structure lets ZADD insert or update a player's score in O(log N) and ZREVRANGE return the top 100 players in O(log N + M), where M is the number of members returned, without disk I/O overhead.
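
To make the sorted-set idea concrete, here is a minimal pure-Python sketch of the two operations a leaderboard needs. It does not call Redis: the `Leaderboard` class and its `add_score` and `top` methods are hypothetical stand-ins for the behaviour of ZADD and a highest-first range read, and the heap-based selection used here is only an illustration, not how Redis stores a sorted set internally.

```python
import heapq


class Leaderboard:
    """Toy in-memory stand-in for one Redis sorted set (hypothetical helper)."""

    def __init__(self):
        self._scores = {}  # member -> score

    def add_score(self, player, score):
        # Like ZADD: insert the member, or overwrite its existing score.
        self._scores[player] = score

    def top(self, n):
        # Like a highest-first range read with scores: best players first.
        return heapq.nlargest(n, self._scores.items(), key=lambda item: item[1])


board = Leaderboard()
board.add_score("ana", 1200)
board.add_score("ben", 950)
board.add_score("cho", 1430)
board.add_score("ben", 1500)  # ben improves; the old score is replaced

top_two = board.top(2)
print(top_two)
```

Each player appears once in the structure, so a new score replaces the old one instead of adding a second entry, which is the property that keeps a leaderboard consistent under frequent updates.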

Exam trap

Microsoft often tests the misconception that any low-latency NoSQL store (like Cosmos DB) can match Redis for sub-millisecond, in-memory operations, but the key differentiators are Redis's native sorted set data structure and its dedicated in-memory architecture.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB Core (SQL) API, while fast, typically has single-digit millisecond latencies and is not optimized for the sub-millisecond, in-memory throughput required for real-time leaderboard updates. Option B is wrong because Azure Table Storage is a key-value store with higher latency (10-50 ms) and lacks native sorted set operations, making it unsuitable for frequent score updates and top-N queries. Option D is wrong because Azure Blob Storage is designed for large, unstructured data with high latency (100+ ms) and no support for real-time atomic score updates or ranking queries.

264

MCQeasy

A mobile gaming startup needs to store player profiles that can have varying attributes (e.g., some players have a 'nickname', others have 'avatar URL'). The application must read a player's profile by PlayerID with very low latency (under 10 ms) from any location worldwide. The data does not require complex queries or joins. Which Azure data store should they choose?

A.Azure SQL Database

B.Azure Cosmos DB

C.Azure Table Storage

D.Azure Blob Storage

AnswerB

Azure Cosmos DB is a globally distributed NoSQL database that supports flexible schemas and offers low-latency reads (under 10 ms) at any scale. It is the best fit for this scenario.

Why this answer

Azure Cosmos DB is the correct choice because it is a globally distributed, multi-model database service that guarantees single-digit-millisecond read latencies (under 10 ms) at any scale from any Azure region. Its schema-agnostic nature allows storing player profiles with varying attributes (e.g., nickname, avatar URL) without requiring a fixed schema, and it supports point reads by PlayerID with a consistency model that can be tuned for performance. This directly matches the requirements of low-latency global reads and flexible, non-relational data.

Exam trap

The trap here is that candidates often confuse Azure Table Storage with Cosmos DB Table API, but the question specifies 'Azure Table Storage' (the older, standalone service) which lacks the global distribution and low-latency guarantees of Cosmos DB, leading them to incorrectly choose Option C.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a relational database with a fixed schema, requiring predefined columns for attributes, which does not support varying attributes without complex schema changes or JSON columns that add overhead, and its global read latency is typically higher than 10 ms without additional geo-replication configurations. Option C is wrong because Azure Table Storage is a NoSQL key-value store that can handle varying attributes, but it does not guarantee single-digit-millisecond read latencies globally; its latency is higher (often 10-50 ms) and it lacks the built-in global distribution and low-latency SLAs of Cosmos DB. Option D is wrong because Azure Blob Storage is designed for unstructured binary or text data (e.g., files, images) and is not optimized for low-latency point reads of individual player profiles by ID; it typically has higher latency (tens to hundreds of milliseconds) and does not support querying by PlayerID natively without additional indexing or metadata layers.

265

MCQmedium

A retail company uses Power BI to create sales reports. The data source is an Azure SQL Database that updates every 15 minutes. The reports must reflect near real-time data without manual refresh. Which Power BI feature should the company use?

A.Use the on-premises data gateway to connect to Azure SQL Database.

B.Import data with scheduled refresh every 15 minutes.

C.Use DirectQuery mode to connect to the Azure SQL Database.

D.Create a Power BI dataflow to transform the data.

AnswerC

DirectQuery sends queries to the source database, providing near real-time data.

Why this answer

DirectQuery mode allows Power BI to query the Azure SQL Database directly without importing data, ensuring that reports reflect the current state of the database each time a report is viewed. Since the database updates every 15 minutes, DirectQuery provides near real-time data without requiring manual or scheduled refresh operations.

Exam trap

The trap here is that candidates often confuse DirectQuery with scheduled refresh, assuming that a 15-minute refresh schedule is sufficient for near real-time needs, but DirectQuery eliminates the refresh interval entirely by querying the source live.

How to eliminate wrong answers

Option A is wrong because the on-premises data gateway is used to connect on-premises data sources to Power BI, but Azure SQL Database is a cloud service that can be accessed directly without a gateway. Option B is wrong because scheduled refresh imports a copy of the data into the Power BI dataset, so reports are only as current as the last refresh; scheduled refresh is also capped at a fixed number of runs per day (8 on shared capacity, 48 on Premium capacity), so a 15-minute schedule cannot be configured. Option D is wrong because a Power BI dataflow is used for data preparation and transformation in the cloud, not for live querying; it still requires a separate import or DirectQuery connection to serve reports.

266

MCQhard

A data engineering team is building a batch analytics pipeline. Raw clickstream data is stored as Parquet files in Azure Data Lake Storage Gen2. The team needs to transform the data using Apache Spark (Python code) and then load the results into Azure Synapse Analytics for high-performance reporting. They want to use a serverless compute option for Spark to avoid managing clusters. Which combination of Azure services should they use for the transformation and loading?

A.Use Azure Databricks with a serverless cluster for transformations and load into Azure SQL Database.

B.Use Azure Synapse Analytics serverless Spark pools for transformations and load into the Synapse dedicated SQL pool.

C.Use Azure Data Factory with a Spark activity to run transformations and load into Azure Synapse Analytics.

D.Use Azure HDInsight with Apache Spark for transformations and load into Azure Blob Storage.

AnswerB

Synapse Analytics provides serverless Spark pools that automatically scale and can read from ADLS Gen2. The transformed data can be loaded into the dedicated SQL pool for high-performance queries, all within a single integrated service.

Why this answer

Option B is correct because Azure Synapse Analytics serverless Spark pools provide a serverless compute option for running Apache Spark transformations without managing clusters, and the transformed data can be directly loaded into the Synapse dedicated SQL pool for high-performance reporting. This combination meets all requirements: serverless Spark for transformations, and Synapse dedicated SQL pool for optimized analytics workloads.

Exam trap

The trap here is that candidates may confuse Azure Synapse Analytics serverless Spark pools (which are serverless) with Azure Data Factory's Spark activity (which requires a managed cluster), or assume that any Spark service (like HDInsight) can be serverless, when only Synapse serverless Spark pools and Databricks serverless clusters offer true serverless compute.

How to eliminate wrong answers

Option A is wrong because, although Azure Databricks with a serverless cluster is a valid serverless Spark option, it loads the results into Azure SQL Database rather than Azure Synapse Analytics, and Azure SQL Database does not provide the high-performance reporting capabilities of a dedicated SQL pool. Option C is wrong because Azure Data Factory with a Spark activity still requires a managed Spark cluster (e.g., HDInsight or Databricks) and does not offer a serverless Spark compute option; Data Factory orchestrates but does not run Spark natively in a serverless manner. Option D is wrong because Azure HDInsight requires explicit cluster management (not serverless) and loads into Azure Blob Storage, which is not a high-performance reporting target like Synapse dedicated SQL pool.

267

Multi-Selectmedium

Which TWO of the following are true about Azure Cosmos DB?

Select 2 answers

A.The default consistency level is Strong.

B.It uses DTUs to measure performance.

C.It guarantees single-digit millisecond latency for reads and writes at the 99th percentile.

D.It is a relational database management system.

E.It supports multiple data models including document, key-value, graph, and column-family.

AnswersC, E

Cosmos DB provides low-latency guarantees.

Why this answer

Azure Cosmos DB is a globally distributed, multi-model database. Option E is correct: it supports multiple data models through APIs including SQL, MongoDB, Cassandra, Gremlin, and Table. Option C is correct: it offers guaranteed single-digit millisecond latency at the 99th percentile.

Option D is wrong because Cosmos DB is not a relational database management system; it uses non-relational models. Option B is wrong because Cosmos DB measures provisioned throughput in request units per second (RU/s), not DTUs. Option A is wrong because the default consistency level is Session, not Strong.

268

MCQhard

A data analyst needs to create an interactive report that combines sales data from Azure SQL Database and Azure Cosmos DB. The report must refresh daily. Which tool should they use?

A.Azure Data Factory

B.Azure Synapse Studio

C.Azure Analysis Services

D.Power BI

AnswerD

Power BI can connect to multiple sources and create interactive dashboards with scheduled refresh.

Why this answer

Power BI is the correct tool because it is designed for creating interactive reports and dashboards, and it can directly connect to both Azure SQL Database and Azure Cosmos DB as data sources. Its scheduled refresh capability allows the report to refresh daily without manual intervention, meeting the requirement for an interactive, combined report.

Exam trap

The trap here is that candidates may confuse data integration tools (like Azure Data Factory) or data modeling services (like Azure Analysis Services) with the actual reporting and visualization tool, which is Power BI, the only option that directly creates interactive reports with scheduled refresh.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is an ETL and data integration service, not a reporting or visualization tool; it would be used to move or transform data before reporting, but not to create the interactive report itself. Option B is wrong because Azure Synapse Studio is an analytics workspace for big data and data warehousing, not a dedicated interactive reporting tool; while it can query data, it lacks the rich visualization and dashboard features of Power BI. Option C is wrong because Azure Analysis Services is a semantic modeling engine that provides analytical data models, but it does not create interactive reports; it would typically serve as a data source for Power BI, not replace it.

269

MCQmedium

A company uses Azure SQL Database to store a large table of sales transactions with columns: TransactionID (primary key), CustomerID, ProductID, SaleDate, Amount. Queries frequently filter by both CustomerID and SaleDate to retrieve sales for a specific customer over a date range. Which indexing strategy will most improve query performance?

A.Create a clustered index on SaleDate

B.Create a nonclustered index on CustomerID and include SaleDate

C.Create a nonclustered index on (CustomerID, SaleDate)

D.Create a nonclustered index on (SaleDate, CustomerID)

AnswerC

Correct. This composite index with CustomerID first and SaleDate second supports both equality on CustomerID and range filtering on SaleDate efficiently.

Why this answer

Option C is correct because a nonclustered composite index on (CustomerID, SaleDate) directly supports the query predicate that filters by both columns. The index is ordered by CustomerID first, enabling an efficient seek for a specific customer, and then by SaleDate within each customer, allowing the query engine to perform a range scan over the date range without scanning the entire table. The index covers both filter columns; if the query also returns columns that are not in the index, such as Amount, the engine still performs key lookups, but only for rows that already satisfy both predicates.
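
The effect of the key order can be checked with a small experiment. The sketch below uses SQLite from the Python standard library as a stand-in for Azure SQL Database, which is an assumption made purely so the example runs anywhere; the optimizer and plan text differ from SQL Server, and the index name `ix_sales_customer_date` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales ("
    "TransactionID INTEGER PRIMARY KEY, CustomerID INTEGER, "
    "ProductID INTEGER, SaleDate TEXT, Amount REAL)"
)
# Equality column first, range column second.
conn.execute("CREATE INDEX ix_sales_customer_date ON Sales (CustomerID, SaleDate)")

query = (
    "SELECT TransactionID, SaleDate FROM Sales "
    "WHERE CustomerID = ? AND SaleDate BETWEEN ? AND ?"
)
rows = conn.execute(
    "EXPLAIN QUERY PLAN " + query, (42, "2024-01-01", "2024-03-31")
).fetchall()

# The last column of each plan row holds the human-readable step description.
plan = " | ".join(row[-1] for row in rows)
print(plan)
```

The printed plan should name the composite index and show the equality on CustomerID together with the bounds on SaleDate as a single search, rather than a scan of the whole table.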

Exam trap

The trap here is that candidates often choose Option D (SaleDate, CustomerID) thinking the date range should be first, but they overlook that the equality filter on CustomerID should be the leading column to enable a seek, not a scan.

How to eliminate wrong answers

Option A is wrong because a clustered index on SaleDate alone would order the entire table by SaleDate, which does not help with equality filtering on CustomerID; the query would still need to read every row in the date range and discard those belonging to other customers. Option B is wrong because a nonclustered index on CustomerID with SaleDate as an included column does not order the index by SaleDate, so the query cannot efficiently perform a range scan on SaleDate; it would require scanning all rows for that customer and then filtering by date. Option D is wrong because a nonclustered index on (SaleDate, CustomerID) orders by SaleDate first, so the engine cannot seek directly to one customer; it must read every index entry in the date range, across all customers, and then filter on CustomerID, which reduces performance.

270

MCQmedium

A marketing company collects real-time clickstream data from their website using Azure Event Hubs. They need to perform two tasks: (1) aggregate the number of clicks per advertising campaign every 5 minutes and display the results in a live dashboard, and (2) run complex historical queries on months of aggregated click data to identify trends. They want to minimize data movement and use serverless compute where possible. Which combination of Azure services should they use?

A.Azure Stream Analytics for live aggregation and Power BI for the dashboard; Azure Synapse Analytics (serverless SQL pool) for historical queries

B.Azure Data Factory for live aggregation; Azure Analysis Services for historical queries

C.Azure HDInsight (Spark) for both live and historical processing

D.Azure Functions for real-time aggregation; Azure SQL Database for historical queries

AnswerA

Correct. Stream Analytics processes streaming data in real-time with 5-minute tumbling windows and outputs to Power BI. Synapse serverless SQL pool can query Parquet files stored in the data lake for historical analysis.

Why this answer

Azure Stream Analytics is ideal for real-time aggregation of clickstream data from Event Hubs, outputting to Power BI for a live dashboard. Azure Synapse Analytics serverless SQL pool allows querying months of aggregated data stored in Azure Data Lake Storage without provisioning compute, minimizing data movement and using serverless compute.
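
The 5-minute aggregation is a tumbling window: fixed-size, non-overlapping intervals in which each event is counted exactly once. The following is a minimal pure-Python sketch of that logic, not Stream Analytics code; the function names and sample events are hypothetical, and it assigns an event on a boundary to the window that starts there, which may differ from the service's own boundary convention.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 5 * 60  # 5-minute tumbling window


def window_start(ts):
    """Round a timestamp down to the start of its 5-minute window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def clicks_per_campaign(events):
    """Count clicks per (window start, campaign); windows never overlap."""
    counts = Counter()
    for ts, campaign in events:
        counts[(window_start(ts), campaign)] += 1
    return counts


# Hypothetical sample events: (event time in UTC, campaign id)
events = [
    (datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc), "spring"),
    (datetime(2024, 1, 1, 12, 4, 59, tzinfo=timezone.utc), "spring"),
    (datetime(2024, 1, 1, 12, 5, 0, tzinfo=timezone.utc), "spring"),
    (datetime(2024, 1, 1, 12, 6, 10, tzinfo=timezone.utc), "summer"),
]

counts = clicks_per_campaign(events)
for (start, campaign), n in sorted(counts.items()):
    print(start.isoformat(), campaign, n)
```

The first two clicks land in the 12:00 window and the last two in the 12:05 window, so the dashboard would receive one row per campaign per window.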

Exam trap

The trap here is confusing batch processing tools like Azure Data Factory or HDInsight with real-time stream processing, and overlooking that Azure Synapse serverless SQL pool is the serverless option for historical queries, not Azure SQL Database.

How to eliminate wrong answers

Option B is wrong because Azure Data Factory is an orchestration and ETL tool, not a real-time stream processing engine; it cannot perform live aggregation. Option C is wrong because HDInsight (Spark) requires provisioning and managing clusters, so it does not meet the preference for serverless compute. Option D is wrong because Azure Functions has no built-in windowing for continuous stream aggregation, so the 5-minute counts would have to be hand-coded with state kept in an external store; using Azure SQL Database for historical queries would also mean loading the aggregated data into a relational database instead of querying it in place, which adds data movement.

271

MCQmedium

A hospital uses Azure SQL Database to store patient appointment records. The 'Appointments' table has columns: AppointmentID (primary key), PatientID, DoctorID, AppointmentDate, and Status. Queries frequently filter by DoctorID (equality) and AppointmentDate (range) to retrieve a doctor's schedule. Currently, these queries are slow. Which index strategy will most improve performance for these queries?

A.Add a clustered index on AppointmentID.

B.Add a nonclustered index on (DoctorID, AppointmentDate).

C.Add a columnstore index on the Status column.

D.Add a nonclustered index on (AppointmentDate, DoctorID).

AnswerB

This composite index allows the query to first seek on the equality column (DoctorID) and then efficiently scan the range of AppointmentDate within that doctor's rows.

Why this answer

Option B is correct because a nonclustered index on (DoctorID, AppointmentDate) supports both equality filtering on DoctorID and range filtering on AppointmentDate. This index structure allows SQL Server to perform a single index seek for the doctor, then a range scan within that doctor's appointments, avoiding a full table scan. The order of columns matters: the leading column (DoctorID) handles the equality predicate, and the second column (AppointmentDate) handles the range predicate efficiently.

Exam trap

The trap here is that candidates often think the date column should be first because it's a range query, but the correct strategy is to place the equality column first to minimize the scan range, then the range column second for efficient filtering.

How to eliminate wrong answers

Option A is wrong because adding a clustered index on AppointmentID (the primary key) does not help queries filtering by DoctorID and AppointmentDate; it only speeds up lookups by AppointmentID. Option C is wrong because a columnstore index is designed for large-scale analytical queries and aggregations, not for point lookups or range scans on transactional tables; it would not improve the performance of these specific queries. Option D is wrong because the index on (AppointmentDate, DoctorID) would require scanning all appointments in the date range before filtering by DoctorID, which is less efficient than leading with the equality column (DoctorID) to narrow the search space first.

272

MCQhard

A data analyst is using Azure Databricks to transform streaming data from Event Hubs. They need to ensure that if a failure occurs, the streaming job can resume processing from the last committed offset. Which checkpointing mechanism should they configure?

A.Use Structured Streaming with checkpointing to Azure Data Lake Storage Gen2.

B.Enable write-ahead logs on the Event Hubs namespace.

C.Use checkpointing to Hive metastore.

D.Use checkpointing to DBFS (Databricks File System).

AnswerA

Checkpointing to ADLS Gen2 provides fault tolerance.

Why this answer

Structured Streaming in Azure Databricks uses checkpointing to store the current state and offsets of the streaming query in a reliable, external storage system. By configuring checkpointing to Azure Data Lake Storage Gen2, the job can recover from failures and resume processing exactly from the last committed offset, ensuring exactly-once semantics. This is the recommended approach for production streaming workloads on Azure.
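
The recovery behaviour can be illustrated without Spark. The sketch below is a pure-Python analogy, not the Structured Streaming checkpoint format: a local JSON file stands in for the durable checkpoint location, and the helpers `load_offset`, `commit_offset`, and `run` are hypothetical names introduced only for this example.

```python
import json
import os
import tempfile


def load_offset(path):
    """Return the last committed offset, or -1 if no checkpoint exists."""
    if not os.path.exists(path):
        return -1
    with open(path) as f:
        return json.load(f)["offset"]


def commit_offset(path, offset):
    """Write the checkpoint to a temp file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp, path)


def run(events, path, fail_at=None):
    """Process events after the last committed offset; optionally crash."""
    processed = []
    for offset in range(load_offset(path) + 1, len(events)):
        if offset == fail_at:
            raise RuntimeError("simulated failure")
        processed.append(events[offset])
        commit_offset(path, offset)
    return processed


events = ["e0", "e1", "e2", "e3", "e4"]
checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run(events, checkpoint, fail_at=3)  # crashes before processing e3
except RuntimeError:
    pass

resumed = run(events, checkpoint)  # restart picks up after offset 2
print(resumed)
```

In Structured Streaming itself the equivalent setting is the `checkpointLocation` option on the streaming write, pointed at a path in the storage account; the engine then manages the offset and state files on its own.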

Exam trap

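The trap here is that candidates treat DBFS as equivalent to dedicated external storage. The DBFS root is backed by a workspace-managed storage account and does survive cluster termination, but it is tied to the workspace and Databricks advises against keeping production data there, whereas ADLS Gen2 provides durable, external checkpoint storage that the team controls and secures independently.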

How to eliminate wrong answers

Option B is wrong because write-ahead logs on the Event Hubs namespace are used for Event Hubs internal durability and replication, not for checkpointing the consumer's offset state in Databricks. Option C is wrong because the Hive metastore is designed for storing table metadata and schema information, not for storing streaming checkpoint data or offsets. Option D is wrong because the DBFS root, although it persists beyond a cluster's lifetime, is workspace-managed storage that Databricks does not recommend for production data; checkpoints for reliable failure recovery belong in an external location such as ADLS Gen2 that the team governs directly.

273

MCQhard

A company is migrating a SQL Server database to Azure SQL Database. The database uses CLR (Common Language Runtime) integration for business logic and has database mail configured. The company needs full instance-level functionality while still benefiting from the platform-as-a-service model. Which Azure SQL deployment option should they choose?

A.Azure SQL Database (single database)

B.Azure SQL Database elastic pool

C.Azure SQL Managed Instance

D.Azure SQL Database Hyperscale

AnswerC

Azure SQL Managed Instance is a PaaS deployment that offers broad SQL Server compatibility, including CLR integration, database mail, SQL Agent, and other instance-scoped features, while still providing managed services.

Why this answer

Azure SQL Managed Instance is the correct choice because it provides near 100% compatibility with SQL Server on-premises, including support for CLR integration and Database Mail, while still offering a platform-as-a-service (PaaS) model. Single databases and elastic pools lack these instance-scoped features, and Hyperscale is a scaling option for single databases, not a separate deployment type that adds instance-level functionality.

Exam trap

The trap here is that candidates often confuse Azure SQL Database Hyperscale as a separate deployment option that adds instance features, when in fact it is merely a scaling tier for single databases and does not enable CLR or Database Mail.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database (single database) does not support CLR integration or Database Mail; it is a fully managed database service that lacks instance-scoped features like SQL Agent and cross-database queries. Option B is wrong because an elastic pool is a collection of single databases sharing resources and inherits the same feature limitations as single databases, so it also cannot run CLR or Database Mail. Option D is wrong because Azure SQL Database Hyperscale is a service tier for single databases that provides high scalability and fast backup/restore, but it does not add instance-level features such as CLR or Database Mail.

274

MCQeasy

A retail company collects raw clickstream data from its website as JSON files. Data scientists need to run exploratory analytics on this raw data without a predefined schema. BI analysts also need to generate weekly sales reports from aggregated transactional data stored in a relational format. Which combination of data storage approaches best meets these needs?

A.Store raw data in Azure Blob Storage and aggregated data in Azure Cosmos DB

B.Store raw data in Azure Data Lake Storage and aggregated data in Azure SQL Database

C.Store raw data in Azure Table Storage and aggregated data in Azure Data Lake Storage

D.Store raw data in Azure SQL Database and aggregated data in Azure Blob Storage

AnswerB

Azure Data Lake Storage provides a scalable data lake for raw data with schema-on-read, while Azure SQL Database is a relational database ideal for structured transactional data and BI reports.

Why this answer

Azure Data Lake Storage (ADLS) is optimized for storing raw, schema-on-read data like JSON files, enabling data scientists to run exploratory analytics without a predefined schema. Azure SQL Database provides a relational structure with ACID compliance, ideal for BI analysts generating weekly sales reports from aggregated transactional data. This combination directly addresses both semi-structured raw data and structured reporting needs.

Exam trap

Microsoft often tests the distinction between storage for raw, schema-on-read data (ADLS/Blob) and storage for structured, relational data (Azure SQL Database). The trap here is that candidates treat Azure Cosmos DB or Table Storage as suitable for raw data, overlooking that they are operational NoSQL stores built for low-latency lookups, not for low-cost exploratory analytics over large volumes of JSON files.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB is a NoSQL database designed for low-latency, globally distributed applications, not for cost-effective storage of raw JSON files for exploratory analytics, and it lacks the relational query capabilities needed for BI reports. Option C is wrong because Azure Table Storage is a NoSQL key-value store unsuitable for schema-on-read analytics on raw JSON, and Azure Data Lake Storage is not a relational database for aggregated transactional reporting. Option D is wrong because Azure SQL Database is a relational store that requires a predefined schema, making it inappropriate for raw, schema-less clickstream data, and Azure Blob Storage lacks the relational querying and aggregation features needed for weekly sales reports.

275

MCQhard

A global social media platform stores user profile images (JPEG) and activity logs in JSON format. The logs have varying structures based on the type of activity. The application requires low-latency reads of images from any region and the ability to query logs using SQL-like syntax. Which Azure data storage solution should they use for each data type?

A.Azure Table Storage for images and Azure Cosmos DB (Table API) for logs

B.Azure Blob Storage with a CDN for images and Azure Cosmos DB (SQL API) for logs

C.Azure Files for images and Azure SQL Database for logs

D.Azure Disk Storage for images and Azure Cosmos DB (MongoDB API) for logs

AnswerB

Blob Storage efficiently stores unstructured images, and CDN ensures low-latency global access. Cosmos DB SQL API provides SQL-like queries for the varying JSON logs.

Why this answer

Azure Blob Storage is optimized for storing large binary objects like JPEG images, and integrating it with Azure CDN ensures low-latency reads globally by caching content at edge nodes. Azure Cosmos DB with the SQL API provides native support for querying JSON documents with varying schemas using SQL-like syntax, making it ideal for the activity logs.

Exam trap

The trap here is that candidates may confuse Azure Table Storage (key-value) with Cosmos DB Table API, or assume Azure SQL Database can handle JSON logs via OPENJSON, but the question explicitly requires SQL-like syntax for varying structures, which Cosmos DB SQL API handles natively without schema enforcement.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a key-value store designed for structured entities, not for large binary files like images, and the Cosmos DB Table API exposes the same key-value model queried through OData-style filters rather than SQL-like syntax over JSON documents. Option C is wrong because Azure Files is a file share service for SMB protocols, not optimized for high-throughput image delivery with CDN, and Azure SQL Database is a relational store that requires a fixed schema, making it unsuitable for logs with varying structures. Option D is wrong because Azure Disk Storage provides block-level storage for VMs, not a globally distributed object store for images, and Cosmos DB with MongoDB API uses MongoDB query syntax, not SQL-like syntax.

276

MCQmedium

A company uses Azure SQL Database for an e-commerce platform. The Orders table has columns: OrderID (primary key, clustered), CustomerID, OrderDate, TotalAmount. Queries frequently filter on CustomerID and a range of OrderDate, and then sort the results by OrderDate in descending order. The queries also return the TotalAmount column. Which indexing strategy will most improve query performance for these operations?

A.Create a nonclustered index on (CustomerID, OrderDate DESC) with included column TotalAmount

B.Create a nonclustered index on (OrderDate DESC, CustomerID) with included column TotalAmount

C.Change the clustered index to be on (CustomerID, OrderDate DESC)

D.Create a nonclustered index on (OrderDate DESC) without including TotalAmount

AnswerA

This index directly supports the filter on CustomerID and the range/order on OrderDate, and includes TotalAmount to avoid key lookups, making it the most efficient.

Why this answer

Option A is correct because it creates a covering index that matches the query's filter predicates (CustomerID equality, OrderDate range) and sort order (OrderDate DESC). By including TotalAmount as an included column, the index fully satisfies the query without needing to access the clustered index (key lookup), minimizing I/O and improving performance.
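
A quick way to see a covering index at work is to inspect a query plan. The sketch below again uses SQLite from the Python standard library as a stand-in for Azure SQL Database, which is an assumption for portability only. SQLite has no INCLUDE clause, so TotalAmount is appended as a trailing key column instead; the index name `ix_orders_customer_date` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, "
    "OrderDate TEXT, TotalAmount REAL)"
)
# SQLite substitute for: (CustomerID, OrderDate DESC) INCLUDE (TotalAmount).
conn.execute(
    "CREATE INDEX ix_orders_customer_date "
    "ON Orders (CustomerID, OrderDate DESC, TotalAmount)"
)

query = (
    "SELECT OrderDate, TotalAmount FROM Orders "
    "WHERE CustomerID = ? AND OrderDate BETWEEN ? AND ? "
    "ORDER BY OrderDate DESC"
)
rows = conn.execute(
    "EXPLAIN QUERY PLAN " + query, (7, "2024-01-01", "2024-06-30")
).fetchall()

# The last column of each plan row holds the human-readable step description.
plan = " | ".join(row[-1] for row in rows)
print(plan)
```

The plan should report a covering index search and no separate sort step, because the index already holds every referenced column in the order the query asks for.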

Exam trap

The trap here is that candidates often choose an index with the sort column first (Option B) or forget to include the non-key column (Option D), not realizing that covering indexes with the correct key order eliminate expensive key lookups and sorts.

How to eliminate wrong answers

Option B is wrong because the leading column is OrderDate, so the engine cannot seek directly to one customer; it must read every index entry in the date range, across all customers, and filter on CustomerID afterward. Option C is wrong because changing the clustered index to (CustomerID, OrderDate DESC) would require rebuilding the table and could impact other queries that rely on the current OrderID clustered index; a narrow nonclustered covering index gives the same seek-and-range access without that disruption. Option D is wrong because it does not include TotalAmount, forcing key lookups to retrieve that column, and the index order (OrderDate DESC) does not support the equality filter on CustomerID efficiently.


277

MCQhard

A social media startup stores user profile data, posts, and comments in Azure Cosmos DB. They notice that the logical partition size for a popular user's profile is growing beyond 20 GB, causing performance issues. The current partition key is 'userId'. Which action should they take to solve this?

A.Change the partition key to a synthetic key combining userId and postId

B.Increase the RU/s

C.Split the container into multiple containers by userId range

D.Use a different API like MongoDB

AnswerA

A synthetic key like 'userId_postId' ensures that each post gets its own logical partition, preventing any single partition from exceeding 20 GB.

Why this answer

A is correct because the logical partition size limit in Azure Cosmos DB is 20 GB. By using a synthetic partition key that combines 'userId' and 'postId', you distribute the data for the popular user across multiple logical partitions, preventing any single partition from exceeding the 20 GB limit and resolving the performance bottleneck.

Exam trap

The trap here is that candidates often confuse throughput (RU/s) scaling with storage limits, thinking that increasing RU/s will fix a partition size issue, when in fact the 20 GB logical partition limit is a hard storage constraint that requires partition key redesign.

How to eliminate wrong answers

Option B is wrong because increasing the RU/s only improves throughput (request rate) but does not solve the underlying issue of a single logical partition exceeding the 20 GB storage limit, which causes throttling and performance degradation. Option C is wrong because splitting the container by 'userId' range does not help; the problem is that one specific user's data is too large, and splitting by range would still place all that user's data in one partition. Option D is wrong because changing the API (e.g., to MongoDB) does not alter the Cosmos DB logical partition size limit of 20 GB; the same storage constraint applies regardless of the API used.
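A minimal sketch of the synthetic-key idea, in plain Python and not the Cosmos DB SDK. The item sizes and the helper names (`synthetic_partition_key`, `partition_sizes`) are hypothetical; the point is only to show how the largest logical partition shrinks when the key changes from userId to userId_postId.

```python
from collections import defaultdict

def synthetic_partition_key(user_id: str, post_id: str) -> str:
    """Combine userId and postId into one partition key value."""
    return f"{user_id}_{post_id}"

# Hypothetical items: (userId, postId, size in GB).
items = [("popular", f"post{i}", 0.5) for i in range(50)]
items.append(("casual", "post0", 0.5))

def partition_sizes(key_fn):
    """Total stored size per logical partition for a given key function."""
    sizes = defaultdict(float)
    for user_id, post_id, size_gb in items:
        sizes[key_fn(user_id, post_id)] += size_gb
    return dict(sizes)

by_user = partition_sizes(lambda user_id, post_id: user_id)
by_synthetic = partition_sizes(synthetic_partition_key)

largest_by_user = max(by_user.values())
largest_by_synthetic = max(by_synthetic.values())
print(largest_by_user, largest_by_synthetic)
```

The trade-off is that a query for all of one user's posts now spans many logical partitions, so the synthetic key should match how the data is most often read.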


278

MCQmedium

A company uses Azure SQL Database for a financial application. Regulatory compliance requires that database backups be retained for 7 years. The current configuration uses the default point-in-time restore (PITR) retention of 7 days. Which Azure SQL Database feature should the company enable to meet the 7-year retention requirement?

A.Long-term retention (LTR) for backups

B.Active geo-replication

C.Auto-failover groups

D.Geo-redundant backup storage

AnswerA

Correct. LTR allows keeping full database backups for years, meeting the 7-year compliance requirement.

Why this answer

Azure SQL Database's default point-in-time restore (PITR) retains backups for only 7 days, which is insufficient for the 7-year regulatory requirement. Long-term retention (LTR) allows you to retain full database backups for up to 10 years by configuring an LTR policy in the Azure portal, Azure CLI, or PowerShell, meeting the compliance need.

Exam trap

The trap here is that candidates confuse geo-redundant storage (which improves durability) with long-term retention (which extends the retention period), leading them to pick Option D instead of A.

How to eliminate wrong answers

Option B is wrong because active geo-replication provides continuous data replication to a secondary region for disaster recovery, not extended backup retention. Option C is wrong because auto-failover groups manage automatic failover between primary and secondary databases for high availability, not backup retention. Option D is wrong because geo-redundant backup storage (RA-GRS) replicates backups to a paired region for durability but does not extend the retention period beyond the default 7-day PITR window.


279

MCQmedium

A company runs a global e-commerce application on Azure SQL Database. The application has a read-intensive workload with millions of users querying product details simultaneously. The database is experiencing high read latency during peak hours due to the volume of concurrent read requests. The company wants to scale read performance without changing the application code and without affecting write operations. Which Azure SQL Database feature should they implement?

A.Active geo-replication

B.Elastic pools

C.In-memory OLTP

D.Columnstore indexes

AnswerA

Active geo-replication allows you to create up to four readable secondary databases in the same or different regions. Application read queries can be directed to these secondaries, distributing the read load and improving performance without modifying application logic.

Why this answer

Active geo-replication creates readable secondary replicas of the Azure SQL Database in the same or different Azure regions. By directing read-only connections to these secondaries, the application can offload read queries from the primary database, scaling read performance without changes to application logic and without impacting write operations on the primary.

Exam trap

The trap here is that candidates often confuse Active geo-replication with failover groups or assume that In-memory OLTP can solve read latency, but the key requirement is scaling read performance without code changes, which only readable secondaries can achieve.

How to eliminate wrong answers

Option B is wrong because Elastic pools are designed to manage and share resources among multiple databases with varying usage patterns, not to offload read traffic from a single database. Option C is wrong because In-memory OLTP accelerates transaction processing by storing tables in memory, but it does not create separate read replicas to handle concurrent read queries. Option D is wrong because Columnstore indexes improve analytical query performance on large datasets, but they do not provide additional read capacity or offload read traffic from the primary database.


280

MCQmedium

A data engineer needs to load 500 GB of CSV files from an on-premises server into Azure Data Lake Storage Gen2 daily. The data must be transferred securely over the internet. Which Azure tool should they use?

A.Azure Data Factory

B.Azure PowerShell

C.Azure Import/Export service

D.AzCopy

AnswerD

AzCopy is optimized for copying data to Azure Storage over the network with high performance.

Why this answer

AzCopy is the correct tool because it is a command-line utility designed for high-performance, secure copying of data to and from Azure Blob Storage and Azure Data Lake Storage Gen2. It supports the required 500 GB daily transfer over the internet using HTTPS encryption, and can be scripted for automation without the overhead of a full orchestration service.

Exam trap

The trap here is that candidates often confuse Azure Data Factory as the default tool for any data movement, overlooking that AzCopy is the lightweight, purpose-built utility for direct, scriptable bulk transfers without orchestration overhead.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a cloud-based ETL and data orchestration service, not a direct data transfer tool; it adds unnecessary complexity and cost for a simple bulk copy task, and is not the lightweight choice for scripted, high-volume transfers that AzCopy is. Option B is wrong because Azure PowerShell is a scripting environment for managing Azure resources, not a dedicated data transfer tool; it lacks the parallelization and resume capabilities needed for efficient 500 GB file transfers. Option C is wrong because Azure Import/Export service is designed for physical shipment of hard drives to Azure datacenters, not for transferring data over the internet; it is intended for very large datasets (terabytes to petabytes) where network transfer is impractical.


281

MCQmedium

A gaming company stores player profiles as JSON documents. Each profile includes standard fields like playerId, username, and email, as well as optional fields such as achievements, gamePreferences, and friendsList. The application needs to look up profiles by playerId with low latency (under 10 ms) and also run SQL-like queries to find players who have a specific achievement. Which Azure Cosmos DB API should they choose?

A.Table API

B.Cassandra API

C.MongoDB API

D.SQL (Core) API

AnswerD

The SQL API stores JSON documents and supports querying with a SQL dialect. Point reads by partition key (playerId) are low-latency, and SQL queries can easily filter on optional fields like achievements. This makes it the best choice.

Why this answer

The SQL (Core) API is the correct choice because it natively supports JSON documents with flexible schemas (including optional fields like achievements) and provides low-latency point reads by playerId (partition key) under 10 ms. It also enables SQL-like queries (e.g., SELECT * FROM c WHERE ARRAY_CONTAINS(c.achievements, 'specificAchievement')) to find players with a specific achievement, which aligns directly with the requirement.

Exam trap

The trap here is that candidates often choose the MongoDB API because it is associated with JSON documents, but they overlook the explicit requirement for SQL-like queries, which only the SQL (Core) API supports natively among the Azure Cosmos DB APIs.

How to eliminate wrong answers

Option A (Table API) is wrong because it is designed for key-value and tabular data with a fixed schema, not for JSON documents with nested optional fields like achievements or friendsList, and it lacks native SQL-like querying for array containment. Option B (Cassandra API) is wrong because it uses CQL (Cassandra Query Language) and a wide-column store model, which does not natively support JSON document structures or SQL-like queries for array elements; it also requires a predefined schema. Option C (MongoDB API) is wrong because while it supports JSON documents and flexible schemas, it uses MongoDB's query language (e.g., db.collection.find({achievements: 'specificAchievement'})) rather than SQL-like queries, and the question explicitly requires SQL-like query capability.
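The two access patterns in the question (point read by playerId, and a filter on an optional array field) can be mimicked in plain Python. This is only an in-memory illustration of the document shape and of what an ARRAY_CONTAINS-style filter does; it does not call Azure Cosmos DB, and the sample profiles and function names are invented.

```python
import json

# Hypothetical profiles: optional fields (achievements, friendsList) vary per document.
raw_profiles = """[
  {"playerId": "p1", "username": "ana", "email": "ana@example.com",
   "achievements": ["first_win", "speedrun"]},
  {"playerId": "p2", "username": "bo", "email": "bo@example.com"},
  {"playerId": "p3", "username": "cy", "email": "cy@example.com",
   "achievements": ["speedrun"], "friendsList": ["p1"]}
]"""
profiles = json.loads(raw_profiles)

# Point read: direct lookup by the key, analogous to reading one item by playerId.
by_id = {profile["playerId"]: profile for profile in profiles}

def point_read(player_id):
    return by_id.get(player_id)

def players_with_achievement(name):
    """Mimics WHERE ARRAY_CONTAINS(c.achievements, name); missing arrays match nothing."""
    return [p["playerId"] for p in profiles if name in p.get("achievements", [])]

print(point_read("p2")["username"], players_with_achievement("speedrun"))
```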


282

MCQmedium

Your organization uses Azure Cosmos DB for a real-time inventory application. The data includes a container with items that have a `category` property. The operations team frequently queries for all items in a specific category. To optimize query performance and minimize request unit (RU) consumption, you decide to implement a materialized view. Which Azure Cosmos DB feature should you use to achieve this?

A.Partition key design

B.Change feed

C.Composite indexes

D.Materialized views (preview)

AnswerD

Azure Cosmos DB materialized views allow you to define a view that is automatically updated and optimized for common queries.

Why this answer

Option D is correct because Azure Cosmos DB's materialized views (preview) feature maintains an automatically updated copy of data from a source container in a separate container that is modeled for a specific query pattern, for example one partitioned by `category`. Queries that filter by `category` can then be served from a single partition of the view instead of fanning out across the partitions of the source container, which reduces RU consumption.

Exam trap

The trap here is that candidates confuse the change feed (a reactive stream of changes) with materialized views (a persisted, queryable snapshot), or assume that composite indexes alone can achieve the same pre-computation benefits as a materialized view.

How to eliminate wrong answers

Option A is wrong because partition key design distributes data across physical partitions for scalability and write performance, but it does not create a pre-computed, denormalized copy of data optimized for a specific query pattern; a poorly chosen partition key can even increase RU costs for queries. Option B is wrong because the change feed is a mechanism to capture incremental changes (inserts, updates, deletes) to items in a container, enabling event-driven processing or replication, but it does not itself provide a query-optimized, persisted view of the data. Option C is wrong because composite indexes improve query performance by indexing multiple properties in a specific order, but they do not create a separate, pre-materialized dataset; they still require the query engine to scan indexed data at query time, which may not be as efficient as a materialized view for frequent aggregation or filtering.


283

MCQhard

A retail company has an Azure SQL Database that handles OLTP transactions for its e-commerce platform. The analytics team needs to run complex reporting queries that join multiple tables (e.g., orders, products, customers) and aggregate millions of rows. These queries are long-running and would negatively impact the performance of the OLTP database if run directly. The company wants to use a separate analytics service that supports T-SQL queries, can scale compute independently, and provides a serverless option to avoid provisioning fixed resources. Which Azure service should they choose?

A.Azure Synapse Analytics (dedicated SQL pool)

B.Azure Analysis Services

C.Azure Databricks

D.Azure SQL Database (creating a secondary replica)

AnswerA

Synapse Analytics provides a dedicated SQL pool with MPP architecture for complex queries on large datasets, supports T-SQL, and can be scaled independently. A serverless option is also available, but the dedicated pool is suited for consistent heavy workloads.

Why this answer

Azure Synapse Analytics is the correct choice because it is a cloud analytics service that supports T-SQL queries and scales compute independently of storage. The listed option names the dedicated SQL pool, which provides provisioned MPP compute for consistently heavy workloads; the same Synapse workspace also includes a serverless SQL pool, which covers the requirement to avoid provisioning fixed resources. Either way, the analytics team can run complex, long-running reporting queries against large datasets without affecting the OLTP Azure SQL Database.

Exam trap

The trap here is that candidates may confuse Azure Analysis Services (a semantic layer) with a T-SQL query engine, or assume that a secondary replica of Azure SQL Database can independently scale compute and handle heavy analytics workloads without performance impact.

How to eliminate wrong answers

Option B (Azure Analysis Services) is wrong because it is a semantic modeling service that uses tabular models and DAX/MDX queries, not T-SQL, and it does not support serverless compute or direct execution of complex T-SQL joins against raw data. Option C (Azure Databricks) is wrong because it is an Apache Spark-based analytics platform that primarily uses Python, Scala, or SQL (Spark SQL), not native T-SQL, and it requires provisioning clusters even with auto-scaling, not a true serverless option for T-SQL workloads. Option D (Azure SQL Database creating a secondary replica) is wrong because while a readable secondary replica can offload read-only queries, it does not support independent compute scaling (it mirrors the primary's compute) and lacks a serverless option for the secondary; it also still uses the same underlying database engine, which may not handle massive aggregation workloads efficiently.


284

MCQhard

A retail chain captures real-time sales data from point-of-sale (POS) systems as a stream of events. The data is ingested into Azure Event Hubs. Additionally, the company receives daily inventory files in CSV format uploaded to Azure Data Lake Storage Gen2. The analytics team needs to combine the streaming sales data with the batch inventory data to generate near real-time dashboards and run historical reports. They want a single analytics platform that can handle both streaming and batch workloads, and allow querying data directly in the data lake using SQL. Which Azure service should they choose?

A.Azure Synapse Analytics

B.Azure Stream Analytics

C.Azure Data Lake Analytics

D.Azure HDInsight

AnswerA

Correct. Azure Synapse Analytics integrates stream processing (via pipelines and Spark/Stream Analytics), batch processing, and serverless SQL to query data lake files directly, all in one platform.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified analytics platform that natively integrates with Azure Event Hubs for real-time streaming ingestion and Azure Data Lake Storage Gen2 for batch data. Its Synapse SQL engine supports querying data directly in the data lake using T-SQL, enabling near real-time dashboards and historical reports without data movement. This service is designed to handle both streaming and batch workloads in a single workspace, meeting all the stated requirements.

Exam trap

The trap here is that candidates often choose Azure Stream Analytics because it handles streaming, but they overlook the requirement for a single platform that also supports batch data and direct SQL querying of the data lake, which Stream Analytics cannot do for historical reports.

How to eliminate wrong answers

Option B (Azure Stream Analytics) is wrong because it is a real-time stream processing service that cannot directly query batch data in Data Lake Storage Gen2 using SQL for historical reports; it lacks a unified SQL query layer over both streaming and batch sources. Option C (Azure Data Lake Analytics) is wrong because it is a batch-only analytics service that processes data using U-SQL, not SQL, and does not support real-time streaming ingestion from Event Hubs. Option D (Azure HDInsight) is wrong because it is a managed Hadoop/Spark cluster that requires manual setup and management for both streaming and batch workloads, and it does not provide direct SQL querying of data in the data lake without additional tools like Hive or Spark SQL, making it less integrated and more complex than Synapse Analytics.


285

MCQmedium

A company uses Azure Synapse Analytics for their data warehouse. They notice that queries against the fact table are slow. The fact table is hash-distributed on OrderID. Most queries filter by CustomerID. What should they do to improve performance?

A.Change to round-robin distribution

B.Change the distribution column to CustomerID

C.Use rowstore instead of columnstore

D.Replicate the fact table to all compute nodes

AnswerB

Aligning distribution with the filter column reduces data movement and improves query performance.

Why this answer

The fact table is hash-distributed on OrderID, but queries filter by CustomerID. This causes data movement across nodes for each query, as the filter column doesn't align with the distribution key. Changing the distribution column to CustomerID ensures that rows for the same CustomerID are co-located on the same compute node, eliminating unnecessary data shuffling and improving query performance.

Exam trap

The trap here is that candidates may think round-robin distribution is a safe default for any slow query, but it ignores the critical principle of aligning distribution keys with query filters to minimize data movement.

How to eliminate wrong answers

Option A is wrong because round-robin distribution distributes data evenly without any logical grouping, which would still cause data movement for filtered queries and likely worsen performance. Option C is wrong because rowstore is optimized for point lookups and small transactions, not for analytical queries on large fact tables; columnstore is already the correct choice for data warehousing workloads. Option D is wrong because replicating the entire fact table to all compute nodes would consume excessive storage and memory, and is only practical for small dimension tables, not large fact tables.
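To make the co-location argument concrete, the sketch below simulates hash distribution in plain Python. It assumes 60 distributions (the number a dedicated SQL pool uses) and substitutes CRC32 for the engine's internal hash function, which is not public; the order data is synthetic.

```python
import zlib

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions

def distribution_for(value) -> int:
    """Stand-in hash (assumption): CRC32 of the value, modulo the distribution count."""
    return zlib.crc32(str(value).encode()) % DISTRIBUTIONS

# Synthetic rows: (OrderID, CustomerID); each of 100 customers has 100 orders.
orders = [(order_id, order_id % 100) for order_id in range(10_000)]

def distributions_touched(customer_id, key_index):
    """Distributions holding this customer's rows when hashing on the given column."""
    return {
        distribution_for(row[key_index])
        for row in orders
        if row[1] == customer_id
    }

touched_by_order_key = distributions_touched(7, key_index=0)
touched_by_customer_key = distributions_touched(7, key_index=1)
print(len(touched_by_order_key), len(touched_by_customer_key))
```

With CustomerID as the key, one customer's rows sit in a single distribution. That removes shuffling for joins and aggregations on CustomerID, but it also means a single-customer query is served by one distribution only, so the benefit depends on the workload and on how evenly CustomerID values are spread.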


286

MCQmedium

A company has 15 SQL Server databases, ranging from 50 GB to 200 GB each. The databases experience unpredictable load spikes during the day. They want to migrate to Azure SQL Database to minimize management overhead and reduce costs by allowing databases to share resources, while ensuring each database can burst to higher performance when needed. Which deployment option should they choose?

A.Single database with Provisioned throughput tier

B.Elastic pool

C.SQL Managed Instance

D.SQL Server on Azure Virtual Machine

AnswerB

Correct. Elastic pools share resources among multiple databases, allowing each database to automatically scale up to the pool limit during bursts. This optimizes cost and is ideal for databases with unpredictable load patterns.

Why this answer

Elastic pools allow multiple databases to share a fixed pool of resources (DTUs or vCores), which reduces costs by pooling unused capacity and enables each database to automatically burst to higher performance when needed. This matches the company's need to minimize management overhead while handling unpredictable load spikes across 15 databases ranging from 50 GB to 200 GB.

Exam trap

The trap here is that candidates often confuse SQL Managed Instance's high compatibility with the ability to share resources, but Managed Instance does not support elastic pools and instead allocates dedicated resources per instance, making it unsuitable for cost-efficient resource sharing and bursting across multiple databases.

How to eliminate wrong answers

Option A is wrong because a single database with provisioned throughput tier allocates dedicated resources per database, which does not allow sharing resources across databases and would likely increase costs due to over-provisioning for peak loads. Option C is wrong because SQL Managed Instance provides near-100% SQL Server compatibility with isolated resources, not shared resource pooling, and is designed for lift-and-shift scenarios rather than cost-efficient bursting across multiple databases. Option D is wrong because SQL Server on Azure Virtual Machine requires full management of the VM and SQL Server, including patching and backups, which contradicts the goal of minimizing management overhead, and does not natively support resource sharing or bursting across databases.
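The cost argument for pooling rests on the databases not all peaking at once. A rough sketch with made-up load figures (arbitrary units, one spike per database at staggered hours) compares provisioning each database for its own peak with provisioning one pool for the combined peak.

```python
# Hypothetical hourly load for 15 databases: 10 units baseline, 100 units during a one-hour spike.
HOURS = 24
loads = []
for db in range(15):
    spike_hour = (db * 5) % HOURS  # staggered so that no two databases spike in the same hour
    loads.append([100 if hour == spike_hour else 10 for hour in range(HOURS)])

# Single databases: each must be sized for its own peak.
sum_of_individual_peaks = sum(max(db_load) for db_load in loads)

# Elastic pool: sized for the highest combined load in any one hour.
pooled_peak = max(sum(db_load[hour] for db_load in loads) for hour in range(HOURS))

print(sum_of_individual_peaks, pooled_peak)
```

Under these assumptions the pool needs far less capacity than the sum of the individual peaks. If the spikes coincided, the advantage would shrink, which is why elastic pools suit databases whose peaks are spread out.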


287

MCQeasy

A startup is building a new mobile app that will track user fitness activities. They need a relational database to store user profiles, activity logs, and goals. The database must be easy to set up, require minimal administration, and automatically scale during peak usage. The startup has a limited budget and prefers a consumption-based pricing model. Which Azure service should they choose?

A.Azure Database for MySQL serverless

B.SQL Server on Azure Virtual Machines

C.Azure SQL Database serverless

D.Azure SQL Managed Instance

AnswerC

Serverless offers consumption-based pricing and auto-scaling.

Why this answer

Azure SQL Database serverless is the correct choice because it provides a consumption-based pricing model that automatically pauses during inactivity and scales compute resources based on demand, requiring minimal administration. This aligns perfectly with the startup's need for easy setup, minimal administration, automatic scaling during peak usage, and a limited budget.

Exam trap

The trap here is that Option A attaches the word 'serverless' to a familiar service name. MySQL is a relational database, so it cannot be ruled out on that basis, but Azure Database for MySQL does not offer a serverless compute tier with auto-pause and consumption-based billing; among the listed options, only Azure SQL Database serverless does.

How to eliminate wrong answers

Option A is wrong because Azure Database for MySQL, although relational and fully managed, is sold in provisioned compute tiers and has no serverless tier that scales automatically and bills by consumption. Option B is wrong because SQL Server on Azure Virtual Machines requires significant administration (patching, backups, scaling) and has a fixed pricing model (pay for provisioned VMs), not consumption-based, making it unsuitable for a startup with limited budget and minimal administration needs. Option D is wrong because Azure SQL Managed Instance is designed for lift-and-shift migrations with full SQL Server compatibility and is provisioned with fixed compute and storage, not consumption-based pricing, and it requires more administration than serverless options.


288

MCQeasy

A company needs to run complex SQL queries on petabytes of data stored in Azure Data Lake Storage Gen2. They want to pay only for the queries they run and do not want to manage any infrastructure. Which Azure service should they use?

A.Azure Synapse Serverless SQL pool

B.Azure Synapse dedicated SQL pool

C.Azure SQL Database

D.Azure HDInsight

AnswerA

Correct - Serverless SQL pool provides pay-per-query, serverless T-SQL querying over data lakes, fulfilling both requirements.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it enables running complex SQL queries directly against data in Azure Data Lake Storage Gen2 without provisioning any infrastructure. It uses a pay-per-query billing model, charging only for the amount of data processed, which aligns with the requirement to pay only for queries run and avoid infrastructure management.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'dedicated' SQL pools in Azure Synapse, assuming both can query Data Lake Storage Gen2, but only the serverless pool offers a pay-per-query model without infrastructure management.

How to eliminate wrong answers

Option B is wrong because Azure Synapse dedicated SQL pool requires provisioning and managing dedicated compute resources (e.g., Data Warehouse Units) with a fixed hourly cost, not a pay-per-query model, and does not meet the 'no infrastructure management' requirement. Option C is wrong because Azure SQL Database is a managed relational database service for transactional workloads, not designed for petabyte-scale analytics on Data Lake Storage Gen2, and it incurs ongoing compute costs regardless of query usage. Option D is wrong because Azure HDInsight requires managing a cluster of virtual machines (e.g., for Spark or Hive) with persistent costs, and does not offer a serverless, pay-per-query model for SQL queries on Data Lake Storage Gen2.


289

MCQeasy

A hospital system stores patient medical records. Each record includes structured data like patient ID, name, date of birth, and also includes unstructured data like doctor's notes and X-ray images. Which type of data is the doctor's notes?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Relational data

AnswerC

Unstructured data lacks a predefined data model or schema. Doctor's notes are free-form text, making them unstructured.

Why this answer

Doctor's notes are unstructured data because they consist of free-form text that does not follow a predefined data model or schema. Unlike structured data (e.g., patient ID, name) which fits neatly into rows and columns, doctor's notes lack a fixed format and cannot be easily queried using traditional relational database tools without additional processing.

Exam trap

The trap here is that candidates may confuse 'unstructured' with 'semi-structured' because doctor's notes might contain some implicit structure (e.g., date headers), but the key exam distinction is that unstructured data lacks a formal schema or metadata tags, unlike semi-structured data such as JSON or XML.

How to eliminate wrong answers

Option A is wrong because structured data requires a strict schema (e.g., tables with rows and columns), whereas doctor's notes are free-form text. Option B is wrong because semi-structured data (e.g., JSON, XML) has tags or markers to separate elements and enforce hierarchy, but doctor's notes have no such organizational metadata. Option D is wrong because relational data is a subset of structured data organized into tables with defined relationships, which does not apply to free-text notes.
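The three categories can be illustrated side by side. The records below are invented examples, and the checks are only a rough way to show the difference: fixed columns for structured data, self-describing but varying keys for semi-structured data, and free text that can only be searched, not addressed by field, for unstructured data.

```python
import json

# Structured: every record has the same fixed columns (patient ID, name, date of birth).
patient_rows = [(1001, "Ada Lopez", "1980-04-12"), (1002, "Raj Singh", "1975-11-30")]

# Semi-structured: keys describe each value, but the set of fields varies per document.
lab_docs = [
    json.loads('{"patientId": 1001, "allergies": ["penicillin"]}'),
    json.loads('{"patientId": 1002, "bloodType": "O+"}'),
]

# Unstructured: free-form text with no columns or keys.
doctor_note = "Pt reports mild chest pain since Tuesday; advised rest and follow-up."

column_counts = {len(row) for row in patient_rows}   # one value: the schema is fixed
key_sets = {frozenset(doc) for doc in lab_docs}      # several values: fields differ
note_mentions_chest_pain = "chest pain" in doctor_note.lower()  # text search only

print(column_counts, len(key_sets), note_mentions_chest_pain)
```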


290

MCQmedium

A company uses Azure SQL Database for an e-commerce application. The Orders table contains columns: OrderID (int, primary key), CustomerID (int), OrderDate (datetime), TotalAmount (decimal). Queries frequently filter by both CustomerID and OrderDate to retrieve orders for a specific customer within a date range. Which indexing strategy will most improve the performance of these queries?

A.Create a clustered index on OrderID.

B.Create a nonclustered index on (CustomerID, OrderDate).

C.Create a nonclustered index on (OrderDate, CustomerID).

D.Create a nonclustered index on TotalAmount.

AnswerB

This index orders the filter columns optimally (equality column first, then the range column), so the engine can seek directly to the specific customer and scan only the requested date range.

Why this answer

Option B is correct because a nonclustered index on (CustomerID, OrderDate) directly supports the query filter that uses both columns. The index is ordered by CustomerID first, enabling SQL Server to quickly locate all rows for a specific customer, and then within that customer, the OrderDate column is ordered to efficiently scan the date range. This index covers the query's WHERE clause without needing to scan the entire table.

Exam trap

The trap here is that candidates often choose Option C, thinking that indexing the date column first is better for date range queries, but they overlook that the query applies an equality filter on CustomerID; the equality column should lead the index key so that the range scan on OrderDate stays within one customer's rows.

How to eliminate wrong answers

Option A is wrong because a clustered index on OrderID does not help filter by CustomerID and OrderDate; it only speeds up lookups by OrderID, leaving the query to scan all rows for the desired customer and date range. Option C is wrong because an index on (OrderDate, CustomerID) is ordered by date first; the engine would have to read every entry in the date range, across all customers, and discard the rows that belong to other customers. Option D is wrong because an index on TotalAmount is irrelevant to the query's filter conditions on CustomerID and OrderDate, providing no performance benefit for these queries.
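Why key order matters can be shown with two sorted lists standing in for the two candidate indexes. This is a simplified model using Python's bisect module, not a database engine, and the row data is synthetic: 50 customers with one order per day for 28 days.

```python
from bisect import bisect_left, bisect_right

# Synthetic rows: (CustomerID, OrderDate).
rows = [(customer, f"2024-01-{day:02d}") for customer in range(1, 51) for day in range(1, 29)]

index_customer_date = sorted(rows)                      # key order of Option B
index_date_customer = sorted((d, c) for c, d in rows)   # key order of Option C

customer, start, end = 7, "2024-01-10", "2024-01-15"

# Option B: seek to (customer, start) and stop after (customer, end).
lo = bisect_left(index_customer_date, (customer, start))
hi = bisect_right(index_customer_date, (customer, end))
entries_read_b = hi - lo

# Option C: seek to the start date, read every customer's entries up to the end date,
# then discard the rows that belong to other customers.
lo_d = bisect_left(index_date_customer, (start,))
hi_d = bisect_right(index_date_customer, (end, float("inf")))
entries_read_c = hi_d - lo_d
matches_c = [entry for entry in index_date_customer[lo_d:hi_d] if entry[1] == customer]

print(entries_read_b, entries_read_c, len(matches_c))
```

Both key orders return the same six rows, but the date-first order reads every customer's entries in the range to find them.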


291

MCQhard

You are analyzing a SQL script for an Azure Synapse Analytics dedicated SQL pool as shown in the exhibit. The table 'SensorData' will contain billions of rows. Which statement about the table design is correct?

A.The table uses a clustered columnstore index, which is ideal for large data warehousing tables

B.The table is replicated across all compute nodes

C.The table uses round-robin distribution

D.The table uses a heap structure

AnswerA

Clustered columnstore indexes are recommended for large tables in Synapse to improve compression and query performance.

Why this answer

Clustered columnstore indexes are the recommended structure for large tables in a Synapse dedicated SQL pool because they provide high compression and fast analytical scans. The hash distribution on DeviceID spreads rows across distributions based on the hash of DeviceID, which suits large tables that are frequently queried by DeviceID. Round-robin distribution is typically used for staging tables, replication is intended for small dimension tables, and a heap is mainly useful for small or temporary staging tables, so Options B, C, and D do not describe this table.


292

MCQhard

The exhibit shows an ARM template snippet for deploying an Azure storage account. What is the redundancy level of the storage account?

A.Read-access geo-redundant storage (RA-GRS)

B.Zone-redundant storage (ZRS)

C.Locally redundant storage (LRS)

D.Geo-redundant storage (GRS)

AnswerC

Standard_LRS indicates LRS.

Why this answer

The ARM template snippet sets the storage account's 'sku.name' property to 'Standard_LRS'. The suffix of the SKU name identifies the redundancy option, so this account uses locally redundant storage (LRS), which replicates data three times within a single datacenter in the primary region.

Exam trap

The trap here is that candidates often assume a storage account must be geo-replicated and select GRS or RA-GRS without reading the 'sku.name' value, which states the redundancy level explicitly.

How to eliminate wrong answers

Option A is wrong because read-access geo-redundant storage (RA-GRS) requires 'sku.name' to be 'Standard_RAGRS', which is not the value in the snippet. Option B is wrong because zone-redundant storage (ZRS) requires 'sku.name' to be 'Standard_ZRS' and a region that supports availability zones. Option D is wrong because geo-redundant storage (GRS) requires 'sku.name' to be 'Standard_GRS'; the snippet specifies 'Standard_LRS' instead.
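As a rough illustration of how the redundancy level can be read from a template, the Python sketch below parses a hypothetical storage account resource (the exhibit itself is not reproduced here, so the account name, location, and API version are assumptions) and maps the suffix of 'sku.name' to a redundancy option.

```python
import json

# Suffixes used in Azure Storage SKU names and the redundancy option each one denotes.
REDUNDANCY_BY_SUFFIX = {
    "LRS": "Locally redundant storage (LRS)",
    "ZRS": "Zone-redundant storage (ZRS)",
    "GRS": "Geo-redundant storage (GRS)",
    "RAGRS": "Read-access geo-redundant storage (RA-GRS)",
    "GZRS": "Geo-zone-redundant storage (GZRS)",
    "RAGZRS": "Read-access geo-zone-redundant storage (RA-GZRS)",
}


def redundancy_of(resource: dict) -> str:
    """Return the redundancy option encoded in a storage account's sku.name."""
    sku_name = resource["sku"]["name"]   # e.g. "Standard_LRS"
    suffix = sku_name.split("_", 1)[1]   # text after the performance tier
    return REDUNDANCY_BY_SUFFIX[suffix]


# Hypothetical resource block, not the actual exhibit.
resource = json.loads("""
{
  "type": "Microsoft.Storage/storageAccounts",
  "apiVersion": "2023-01-01",
  "name": "examplestorage001",
  "location": "westus",
  "sku": { "name": "Standard_LRS" },
  "kind": "StorageV2"
}
""")

print(redundancy_of(resource))  # prints: Locally redundant storage (LRS)
```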

293

MCQeasy

A company stores customer data in a relational table with columns CustomerID, FullName, and Email. They also store product descriptions as JSON documents with varying fields, and product images as JPEG files. Which of the following correctly classifies these data types from most structured to least structured?

A.Structured, semi-structured, unstructured

B.Unstructured, semi-structured, structured

C.Semi-structured, structured, unstructured

D.Structured, unstructured, semi-structured

AnswerA

Correct. Relational table data is fully structured, JSON is semi-structured, and JPEG images are unstructured.

Why this answer

The customer data in a relational table with fixed columns (CustomerID, FullName, Email) is structured because it has a rigid schema and defined data types. The JSON documents for product descriptions are semi-structured because they use key-value pairs with flexible fields but still have metadata (tags, keys) that provide organization. The JPEG product images are unstructured because they are binary blobs with no inherent schema or metadata that the database can query directly.

This ordering from most to least structured matches option A.

Exam trap

The trap here is that candidates often confuse semi-structured (JSON) with unstructured (JPEG) because both lack a fixed schema, but JSON has inherent key-value structure that databases can query, whereas JPEG is purely binary with no queryable structure.

How to eliminate wrong answers

Option B is wrong because it places unstructured (JPEG) as the most structured, which is incorrect — JPEG files have no schema and are the least structured. Option C is wrong because it places semi-structured (JSON) as the most structured, but relational tables with fixed schemas are more structured than JSON documents. Option D is wrong because it places unstructured (JPEG) in the middle, but JPEG files are the least structured of the three data types.

294

MCQmedium

You are designing a relational database for an IoT application that ingests high volumes of time-stamped sensor data. The queries frequently filter by device ID and time range. Which index strategy would optimize query performance?

A.Create a non-clustered index on Timestamp only

B.Create a composite index on (DeviceID, Timestamp)

C.Create a clustered index on DeviceID only

D.Create a non-clustered index on SensorType

AnswerB

This index supports filtering by device and time.

Why this answer

Option B is correct because a composite index on (DeviceID, Timestamp) supports both filters: the engine seeks to the requested device and then reads only the rows in the requested time range. Option A is wrong because an index on Timestamp alone does not help with DeviceID filtering. Option C is wrong because a clustered index on DeviceID alone leaves the time-range filter to be applied by scanning all of that device's rows.

Option D is wrong because SensorType does not appear in the query's filters, so an index on it does not help this query pattern.
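The effect of a composite index can be checked with a query plan. The sketch below uses Python's built-in SQLite as a stand-in for a relational engine (it is not Azure SQL, and plan output differs between engines); the table, column, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_readings ("
    "device_id TEXT NOT NULL, ts TEXT NOT NULL, sensor_type TEXT, value REAL)"
)
# Composite index: equality column (device_id) first, range column (ts) second.
conn.execute("CREATE INDEX idx_device_ts ON sensor_readings (device_id, ts)")

query = (
    "SELECT value FROM sensor_readings "
    "WHERE device_id = ? AND ts BETWEEN ? AND ?"
)
params = ("device-42", "2024-01-01T00:00:00", "2024-01-02T00:00:00")

# Ask the engine how it would execute the query; the last column holds the plan text.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
plan = " ".join(row[-1] for row in plan_rows)
print(plan)  # the plan names idx_device_ts, i.e. an index search rather than a full scan
```

Putting the equality column (device_id) before the range column (ts) lets the engine use both predicates in a single index seek.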

295

MCQmedium

A global e-commerce company uses Azure SQL Database for its product catalog. The database is hosted in the West US region. To ensure the catalog remains available if West US experiences an outage, the company wants to configure a secondary database in East US that can be used for reads and can be automatically promoted to primary during a disaster. They require a Recovery Point Objective (RPO) of less than 5 seconds and a Recovery Time Objective (RTO) of less than 30 minutes. Which feature should they implement?

A.Active geo-replication

B.Auto-failover groups

C.Geo-restore

D.Transactional replication

AnswerB

Auto-failover groups provide automatic failover to a secondary region, include a readable secondary, and meet the RPO of 5 seconds and RTO of 30 minutes.

Why this answer

Auto-failover groups (Option B) are the correct choice because they provide automatic, orchestrated failover of a primary Azure SQL Database to a secondary region (East US) during an outage. Failover groups are built on active geo-replication, whose continuous asynchronous replication is what addresses the RPO target of less than 5 seconds, and automatic promotion of the secondary is what addresses the RTO requirement. The secondary database can be used for read-only queries, and the failover group fails over all of its databases as a unit behind listener endpoints, so the application's connection string does not change.

Exam trap

The trap here is that candidates confuse active geo-replication with auto-failover groups, assuming both provide automatic failover, but only auto-failover groups offer the orchestrated, automatic promotion required for the specified RTO.

How to eliminate wrong answers

Option A (Active geo-replication) is wrong because while it provides a readable secondary database with an RPO of less than 5 seconds, it does not fail over automatically; failover must be initiated manually by an administrator or the application, which fails the requirement for automatic promotion during a disaster. Option C (Geo-restore) is wrong because it restores a database from a geo-replicated backup with an RPO of 1 hour (not less than 5 seconds) and an RTO that can take hours, failing both the RPO and RTO requirements. Option D (Transactional replication) is wrong because it is designed for one-way or bidirectional data synchronization with higher latency and manual failover, not for automatic disaster recovery with sub-5-second RPO and sub-30-minute RTO.

296

MCQeasy

A retail company stores product catalog data as JSON documents. Each product has a different set of attributes depending on its category (e.g., electronics have 'voltage', clothing has 'size'). The application needs to query products by category and price range efficiently. Which Azure data store is most appropriate for this workload?

A.Azure Cosmos DB

B.Azure SQL Database

C.Azure Blob Storage

D.Azure Table Storage

AnswerA

Correct. Cosmos DB is a NoSQL database that supports schema-flexible JSON documents and provides fast queries on any attribute, ideal for product catalogs with varying attributes.

Why this answer

Azure Cosmos DB is the most appropriate choice because it natively supports JSON documents with flexible schemas, enabling each product to have a different set of attributes per category. Its indexing policies can be configured to efficiently support queries filtering by category and price range, and it offers low-latency, high-throughput access ideal for retail catalog workloads.

Exam trap

The trap here is that candidates often choose Azure SQL Database because they assume all structured data requires a relational store, overlooking the fact that JSON documents with varying schemas are better served by a NoSQL document database like Cosmos DB.

How to eliminate wrong answers

Option B (Azure SQL Database) is wrong because it requires a fixed relational schema, making it cumbersome to store products with varying attributes per category without extensive use of EAV (Entity-Attribute-Value) patterns or JSON columns, which negate the benefits of a relational store. Option C (Azure Blob Storage) is wrong because it is designed for unstructured binary or text data, not for querying individual JSON documents by fields like category and price range; it lacks native indexing and query capabilities for document-level attributes. Option D (Azure Table Storage) is wrong because it is a key-value store that does not support native JSON document storage or querying by nested attributes; it requires flat schemas and cannot efficiently handle queries on multiple properties like price range across different product categories.
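To illustrate what "schema-flexible documents queried by category and price range" means, here is a small in-memory Python sketch. It does not call Cosmos DB; the documents and field names are hypothetical. In Cosmos DB for NoSQL the same filter would be written roughly as `SELECT * FROM c WHERE c.category = @category AND c.price >= @min AND c.price <= @max`.

```python
# Hypothetical product documents: shared fields plus category-specific attributes.
products = [
    {"id": "1", "category": "electronics", "price": 199.0, "voltage": 230},
    {"id": "2", "category": "clothing", "price": 35.0, "size": "M"},
    {"id": "3", "category": "electronics", "price": 899.0, "voltage": 110},
    {"id": "4", "category": "clothing", "price": 120.0, "size": "L"},
]


def query_products(docs, category, min_price, max_price):
    """Filter documents on the shared fields, ignoring whatever extra fields each has."""
    return [
        doc
        for doc in docs
        if doc.get("category") == category
        and "price" in doc
        and min_price <= doc["price"] <= max_price
    ]


matches = query_products(products, "electronics", 100, 500)
print([doc["id"] for doc in matches])  # prints: ['1']
```

The filter only touches the fields every document shares, so category-specific attributes such as 'voltage' or 'size' can vary freely without any schema change.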

297

Multi-Selectmedium

Which TWO Azure services can be used to host a relational database that requires native support for JSON data and high availability with automatic failover?

Select 2 answers

A.Azure SQL Database

B.Azure Database for MariaDB

C.Azure SQL Managed Instance

D.Azure Cosmos DB

E.Azure Database for PostgreSQL

AnswersA, E

Azure SQL Database supports JSON functions and provides built-in high availability with automatic failover.

Why this answer

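Azure SQL Database and Azure Database for PostgreSQL are both relational services that support JSON (through T-SQL JSON functions and the PostgreSQL json and jsonb data types, respectively) and provide high availability with automatic failover. Azure Cosmos DB is wrong because it is a NoSQL service, not a relational database. Azure SQL Managed Instance also supports JSON, but the answer key excludes it on the grounds that it is not serverless by default, a criterion the question itself does not state.

Azure Database for MariaDB is excluded because it does not have a native JSON data type (MariaDB stores JSON as text).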

298

MCQmedium

You need to store log files from multiple virtual machines in a central location. Each log file is appended to continuously and can be up to 1 TB. The solution must support concurrent appends from many VMs and provide low-latency read access for real-time monitoring. Which Azure storage solution should you choose?

A.Azure Cosmos DB

B.Azure Blob Storage with append blobs

C.Azure Files

D.Azure NetApp Files

AnswerB

Append blobs are designed for logging scenarios with concurrent appends.

Why this answer

Option B is correct because Azure Blob Storage with append blobs is designed for append-only workloads such as logging, supports appends from multiple writers, and provides low-latency read access. Option A is wrong because Azure Cosmos DB is a NoSQL database and is not optimized for large append-only log files. Option C is wrong because Azure Files is a file share and is not optimized for concurrent appends from many clients.

Option D is wrong because Azure NetApp Files is a high-performance file service that is more expensive than this log-storage scenario requires.

299

MCQmedium

A company stores IoT sensor data in Azure Blob Storage. The data is written once, rarely accessed, and must be retained for 10 years for compliance. The cheapest storage tier should be used for the first 30 days after ingestion, then moved to a lower-cost tier. Which storage tier configuration should you recommend?

A.Use the Hot tier for the first 30 days, then move to the Cool tier.

B.Use the Cool tier for the first 30 days, then move to the Archive tier.

C.Use the Archive tier from the start.

D.Use the Hot tier for the first 30 days, then move to the Archive tier.

AnswerD

Hot is cost-effective for frequent access; Archive is cheapest for long-term retention.

Why this answer

Option D is correct because the Hot tier suits the first 30 days, when newly ingested data is most likely to be read, and the Archive tier has the lowest storage cost for long-term retention (10 years) with rare access. Option A is wrong because the Cool tier is more expensive than Archive for long-term retention. Option B is wrong under the answer key's assumption that data is read during the first 30 days, because the Cool tier charges more per access than the Hot tier.

Option C is wrong because Archive blobs are offline and must be rehydrated before they can be read, which can take hours and rules out any access during the first 30 days.
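The Hot-then-Archive transition is normally automated with a Blob Storage lifecycle management policy. The Python sketch below builds such a policy as JSON; the rule name and the 'sensor-data/' prefix are hypothetical, and it assumes new blobs are written to an account whose default access tier is Hot.

```python
import json

ARCHIVE_AFTER_DAYS = 30  # from the scenario: move to a lower-cost tier after 30 days

# Lifecycle management policy: one rule that archives block blobs 30 days after
# their last modification. Rule name and prefix are illustrative assumptions.
lifecycle_policy = {
    "rules": [
        {
            "enabled": True,
            "name": "hot-then-archive",
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["sensor-data/"],
                },
                "actions": {
                    "baseBlob": {
                        "tierToArchive": {
                            "daysAfterModificationGreaterThan": ARCHIVE_AFTER_DAYS
                        }
                    }
                },
            },
        }
    ]
}

policy_json = json.dumps(lifecycle_policy, indent=2)
print(policy_json)
```

The resulting JSON could then be applied to the storage account through the portal, the CLI, or a template; the 10-year retention requirement itself would be enforced separately, for example with an immutability policy.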

300

MCQmedium

A social networking application needs to store and query relationships between users, such as 'friends of friends'. The application should be able to traverse these relationships efficiently to recommend new connections. Which Azure NoSQL data store and API should they choose?

A.Azure Cosmos DB with the Gremlin API

B.Azure Cosmos DB with the Table API

C.Azure Cache for Redis

D.Azure Blob Storage

AnswerA

Correct. Azure Cosmos DB with the Gremlin API provides a graph database that efficiently stores and queries relationships between entities, ideal for recommendation engines based on friend connections.

Why this answer

Azure Cosmos DB with the Gremlin API is the correct choice because Gremlin is a graph traversal language specifically designed for querying highly connected data, such as social network relationships. It allows efficient traversal of edges (e.g., 'friends of friends') using graph algorithms, which is exactly what the application needs for recommending new connections. Other APIs like Table API or services like Blob Storage lack native graph traversal capabilities.

Exam trap

The trap here is that candidates often confuse the Table API (which is also NoSQL) as suitable for relationships, but it cannot perform multi-hop graph traversals, while Azure Cache for Redis might seem plausible due to its set operations, but it lacks a graph query language and persistence guarantees.

How to eliminate wrong answers

Option B is wrong because the Table API is a key-value store optimized for simple lookups by partition and row key, and it cannot perform graph traversals like 'friends of friends'. Option C is wrong because Azure Cache for Redis is an in-memory cache, not a persistent data store, and while it supports data structures like sets, it lacks a native graph query language for multi-hop traversals. Option D is wrong because Azure Blob Storage is an object store for unstructured binary data (e.g., images, videos) and has no query engine for relationship traversal.
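A "friends of friends" query is a two-hop traversal over friendship edges. The plain-Python sketch below shows the idea on a small hypothetical graph; it does not use Cosmos DB. In Gremlin the core of the traversal would look something like `g.V().has('user', 'name', 'alice').out('friend').out('friend').dedup()`, where the 'user' and 'friend' labels are assumptions.

```python
# Hypothetical friendship graph stored as an adjacency map.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, user):
    """Return users exactly two hops away: friends of friends who are not already friends."""
    direct = graph.get(user, set())
    two_hop = set()
    for friend in direct:
        two_hop |= graph.get(friend, set())
    # Exclude the user and people they are already connected to.
    return two_hop - direct - {user}


print(sorted(friends_of_friends(friends, "alice")))  # prints: ['dave', 'erin']
```

A graph database performs this kind of edge-following natively, whereas a key-value store would need one lookup per friend and application code to combine the results.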
