Microsoft Azure Data Fundamentals DP-900 Questions 151–225 | Page 3/14

151

MCQ (hard)

You are a data architect at a global retail company. The company has an Azure Data Lake Storage Gen2 account that stores petabytes of clickstream data. They need to provide near real-time analytics dashboards for regional managers. The data arrives in batches every 5 minutes. Currently, they use Azure Databricks to transform the data and load it into Azure Synapse Analytics, but the dashboards show data that is 30 minutes old. The business requires dashboards to reflect data within 10 minutes of ingestion. You propose a new solution. Which approach should you recommend?

A. Keep current pipeline but replace Synapse with Azure Analysis Services for faster query performance.

B. Use Azure Data Factory with tumbling window triggers every 5 minutes to load data from Data Lake to Synapse.

C. Ingest data into Azure Event Hubs, use Azure Stream Analytics to process and output to Power BI for real-time dashboards.

D. Increase the number of Databricks clusters and use Auto Loader to speed up transformations.

Answer: C

Stream Analytics provides low-latency streaming to Power BI.

Why this answer

Option C is correct because it uses Azure Event Hubs for low-latency ingestion and Azure Stream Analytics for real-time processing, enabling near real-time dashboards in Power BI with sub-minute latency. This architecture bypasses the batch-oriented pipeline that causes the current 30-minute delay, meeting the 10-minute requirement.

Exam trap

The trap here is that candidates may assume batch tools like Data Factory or Databricks can be tuned to meet near real-time SLAs, but they fundamentally operate on file-based or micro-batch paradigms that cannot match the sub-minute latency of a true streaming pipeline with Event Hubs and Stream Analytics.

How to eliminate wrong answers

Option A is wrong because replacing Synapse with Azure Analysis Services does not address the root cause of latency—the batch processing in Databricks—and Analysis Services is an OLAP engine that still requires data to be loaded, not a streaming solution. Option B is wrong because Azure Data Factory with tumbling window triggers is a batch-oriented orchestration tool that introduces inherent latency from window scheduling and data movement, failing to achieve sub-10-minute freshness. Option D is wrong because increasing Databricks clusters and using Auto Loader only accelerates the batch transformation step but does not eliminate the fundamental batch processing delay, and Auto Loader still operates on file arrival, not streaming.


152

MCQ (hard)

Your organization stores sensitive financial data in Azure SQL Database. You need to audit all SELECT operations on the 'Transactions' table and alert the security team when a user outside the finance department queries the table. Which feature should you use?

A. Microsoft Defender for SQL

B. Dynamic Data Masking

C. SQL Server Auditing

D. Transparent Data Encryption

Answer: C

Auditing logs database events; can be configured to capture SELECT operations and trigger alerts.

Why this answer

SQL Server Auditing is the correct choice because it tracks database events, including SELECT operations, and writes them to an audit log. You can configure an audit policy to capture all SELECT statements on the 'Transactions' table and then set up an alert (e.g., via Azure Monitor or Logic Apps) that triggers when a user from outside the finance department executes such a query. This directly meets the requirement to both audit and alert on specific user actions.

Exam trap

The trap here is that candidates often confuse auditing (logging who did what) with security features that protect data at rest or in transit, such as TDE or Dynamic Data Masking, which do not provide any logging or alerting capabilities.

How to eliminate wrong answers

Option A is wrong because Microsoft Defender for SQL provides vulnerability assessments, threat detection, and anomaly alerts, but it does not offer granular auditing of specific table-level SELECT operations or user-based alerting. Option B is wrong because Dynamic Data Masking obfuscates sensitive data in query results to unauthorized users, but it does not log or alert on who performed the query. Option D is wrong because Transparent Data Encryption (TDE) encrypts the database at rest and on backup media, but it provides no auditing or alerting capabilities for data access operations.


153

MCQ (medium)

A smart city application collects sensor data from thousands of devices. Data is ingested as JSON messages containing deviceId, timestamp, and reading value. The application must support fast point reads by deviceId and also run queries to retrieve all readings for a specific deviceId within a time range. The development team prefers a SQL-like query language. Which Azure Cosmos DB API should they choose?

A. SQL (Core) API

B. Table API

C. MongoDB API

D. Cassandra API

Answer: A

Correct. SQL API offers SQL-like query syntax, fast point reads, and works directly with JSON documents.

Why this answer

The SQL (Core) API is the best choice because it natively supports SQL-like querying, enabling both fast point reads by deviceId (using the partition key) and efficient time-range queries on a specific deviceId. It also provides native JSON support, which aligns with the JSON message format from the sensors, and allows indexing on timestamp for range queries.

Exam trap

Microsoft often tests the misconception that any API with a SQL-like name (like Cassandra's CQL) is equivalent to the SQL (Core) API, but the key differentiator is native JSON support and the specific query language syntax preferred by the team.

How to eliminate wrong answers

Option B (Table API) is wrong because it uses a key-value store with OData query syntax, not SQL-like queries, and lacks native support for complex range queries on timestamps within a partition. Option C (MongoDB API) is wrong because it uses MongoDB's query language (based on BSON and JSON-like documents), not SQL, and while it supports range queries, the team specifically prefers SQL-like syntax. Option D (Cassandra API) is wrong because it uses CQL (Cassandra Query Language), which is SQL-like but designed for wide-column stores and does not natively support efficient time-range queries on a single partition key without additional modeling (e.g., using clustering columns), and it lacks native JSON support for the ingested messages.


154

MCQ (easy)

A small business wants to use Azure to store and analyze customer feedback from surveys. The surveys are collected via a web app and stored as JSON files. The business needs to run SQL-based queries on the data and generate reports in Power BI. They have a limited budget and prefer a serverless option to minimize management overhead. Which Azure service should they use?

A. Azure Analysis Services

B. Azure Databricks

C. Azure Synapse Analytics serverless SQL pool

D. Azure SQL Database

Answer: C

Serverless, pay-per-query, can query JSON directly.

Why this answer

Azure Synapse Analytics serverless SQL pool is the correct choice because it allows you to query JSON files directly from Azure Data Lake Storage or Blob Storage using standard T-SQL, without provisioning any infrastructure. It is serverless (pay-per-query), supports SQL-based queries, and integrates seamlessly with Power BI for reporting, making it ideal for a small business with a limited budget and minimal management overhead.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'fully managed' and choose Azure SQL Database (a managed relational database that is not billed per query and needs the JSON imported into tables first) or Azure Databricks (which requires Spark clusters and Spark expertise and is not SQL-native), missing that Azure Synapse serverless SQL pool is the only option that combines pay-per-query billing, direct JSON querying, and SQL-based reporting for Power BI.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is a fully managed analytical engine that requires provisioning and managing a model, and it is not designed for direct querying of raw JSON files; it is used for building tabular or multidimensional models from pre-processed data. Option B is wrong because Azure Databricks is a big data analytics platform based on Apache Spark, which is overkill for simple SQL queries on JSON files and incurs cluster management costs even in serverless mode; it is not optimized for ad-hoc SQL queries on semi-structured data. Option D is wrong because Azure SQL Database is a fully managed relational database that requires provisioning a database instance and schema, and it is not serverless in the sense of pay-per-query; it incurs ongoing costs even when idle and requires importing JSON data into tables before querying.


155

MCQ (easy)

Refer to the exhibit. This ARM template snippet is used to deploy which Azure resource?

A. Azure Database for MySQL server

B. Azure SQL Managed Instance

C. Azure SQL Database server

D. Azure Synapse Analytics workspace

Answer: C

The template defines a logical server for Azure SQL Database.

Why this answer

Option C is correct because the properties include serverName, administratorLogin, etc., which are for an Azure SQL Database server. Option B is wrong because Azure SQL Managed Instance deployment uses different properties. Option A is wrong because Azure Database for MySQL has different properties.

Option D is wrong because Azure Synapse Analytics workspace has different properties.


156

MCQ (medium)

A SaaS company hosts a multi-tenant application. Each tenant has a separate Azure SQL Database. The databases are small (1-3 GB) and their workloads vary significantly over time, with some tenants active during business hours and others at night. The company wants to maximize resource utilization and minimize costs by pooling compute resources across tenants while maintaining predictable performance per database. Which Azure SQL Database deployment option should they choose?

A. Azure SQL Database Single Database

B. Azure SQL Database Elastic Pool

C. Azure SQL Managed Instance

D. SQL Server on Azure Virtual Machine

Answer: B

An elastic pool allows multiple databases to share a pool of resources. Databases automatically use resources as needed, maximizing utilization and lowering cost while providing predictable performance per database via settings like min and max vCores.

Why this answer

Azure SQL Database Elastic Pool is the correct choice because it allows multiple databases (tenants) to share a fixed pool of compute and storage resources, enabling cost efficiency through resource pooling while providing predictable performance via per-database resource limits (min/max DTU or vCore). This matches the scenario of small databases with variable, non-overlapping workloads across tenants.

Exam trap

The trap here is that candidates may choose Single Database (Option A) thinking it offers the best isolation, but they overlook the cost and resource utilization benefits of Elastic Pool for variable, non-overlapping workloads across many small databases.

How to eliminate wrong answers

Option A is wrong because Single Database allocates dedicated resources per database, which would be cost-inefficient for small, variable workloads and does not allow pooling compute across tenants. Option C is wrong because Azure SQL Managed Instance is designed for lift-and-shift migrations of entire SQL Server instances with high compatibility, not for pooling resources across many small databases, and it incurs higher base costs. Option D is wrong because SQL Server on Azure Virtual Machine requires manual management of compute resources and licensing, lacks built-in pooling for multi-tenant scenarios, and does not provide the same level of automated resource sharing and predictable per-database performance as Elastic Pool.
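The pooling argument can be illustrated with a little arithmetic. The sketch below is only an illustration: the tenant names and the hourly demand figures (in vCores) are made-up assumptions, not Azure measurements. It compares sizing every database for its own peak with sizing one pool for the combined peak.

```python
# Hypothetical compute demand (vCores) per time slot for three tenant databases.
# All figures are illustrative assumptions.
tenant_demand = {
    "tenant_a": [2, 2, 0, 0],  # active during business hours
    "tenant_b": [0, 0, 2, 2],  # active at night
    "tenant_c": [1, 0, 1, 0],
}

# Single databases: each database is provisioned for its own peak.
sum_of_peaks = sum(max(slots) for slots in tenant_demand.values())

# Elastic pool: the pool is provisioned for the peak of the combined demand.
combined_demand = [sum(slots) for slots in zip(*tenant_demand.values())]
pooled_peak = max(combined_demand)

print(f"Sum of individual peaks: {sum_of_peaks} vCores")
print(f"Peak of combined demand: {pooled_peak} vCores")
```

Because the tenants peak at different times, the combined peak is lower than the sum of the individual peaks, which is where the cost saving of a pool comes from.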


157

MCQ (medium)

You need to design a real-time dashboard that displays the number of orders placed in the last hour from an e-commerce application. The application writes orders to Azure Event Hubs. Which Azure service should you use to aggregate the data and serve the dashboard with minimal latency?

A. Azure Databricks Structured Streaming

B. Azure Stream Analytics with Power BI output

C. Azure Analysis Services

D. Azure Data Factory with tumbling window

Answer: B

Stream Analytics processes streaming data in real-time and integrates directly with Power BI.

Why this answer

Azure Stream Analytics is purpose-built for real-time data processing from sources like Event Hubs, and its native integration with Power BI enables direct output to a dashboard with latency on the order of seconds. This combination provides the minimal-latency aggregation and serving required for a real-time orders dashboard without additional infrastructure.

Exam trap

The trap here is that candidates may confuse real-time processing with batch-oriented services like Azure Data Factory or assume that any big data platform (like Databricks) is automatically the best choice for low-latency dashboards, overlooking the purpose-built streaming-to-visualization pipeline of Stream Analytics and Power BI.

How to eliminate wrong answers

Option A is wrong because Azure Databricks Structured Streaming, while capable of real-time processing, introduces additional overhead for cluster management and is not optimized for direct dashboard serving with minimal latency compared to Stream Analytics. Option C is wrong because Azure Analysis Services is an OLAP engine for historical data analysis and cannot process real-time streaming data from Event Hubs. Option D is wrong because Azure Data Factory with tumbling window is designed for batch processing on a schedule, not for real-time streaming aggregation and low-latency dashboard updates.
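To make the aggregation itself concrete, here is a plain-Python sketch of "orders placed in the last hour". It is not Stream Analytics query syntax; the function name and the sample timestamps are hypothetical and only show the windowing logic a streaming job would apply continuously.

```python
from datetime import datetime, timedelta


def orders_in_last_hour(order_times, now):
    """Count orders whose timestamp falls in the hour ending at `now`."""
    cutoff = now - timedelta(hours=1)
    return sum(1 for t in order_times if cutoff < t <= now)


# Hypothetical order timestamps relative to a fixed "current" time.
now = datetime(2024, 1, 1, 12, 0, 0)
order_times = [
    now - timedelta(minutes=90),  # outside the window
    now - timedelta(minutes=45),  # inside the window
    now - timedelta(minutes=5),   # inside the window
]

count = orders_in_last_hour(order_times, now)
print(count)
```

A streaming engine keeps this count up to date as events arrive, instead of recomputing it on a batch schedule.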


158

MCQ (medium)

The exhibit shows a SQL query run against Azure SQL Database. What is the purpose of the HAVING clause in this query?

A. To filter rows before grouping

B. To sort the result set

C. To join two tables

D. To filter groups based on aggregate conditions

Answer: D

HAVING filters groups after GROUP BY using aggregate functions.

Why this answer

The HAVING clause in SQL is used to filter groups after the GROUP BY clause has been applied, based on aggregate conditions such as SUM, COUNT, or AVG. In this query against Azure SQL Database, HAVING restricts the result to only those groups that satisfy the specified aggregate condition, which cannot be done with a WHERE clause because WHERE filters individual rows before grouping.

Exam trap

The trap here is that candidates often confuse HAVING with WHERE, mistakenly thinking HAVING can filter individual rows before grouping, when in fact WHERE must be used for that purpose.

How to eliminate wrong answers

Option A is wrong because the WHERE clause, not HAVING, is used to filter rows before grouping; HAVING operates after grouping. Option B is wrong because sorting is performed by the ORDER BY clause, not HAVING. Option C is wrong because joining tables is done with JOIN (e.g., INNER JOIN, LEFT JOIN) in the FROM clause, not with HAVING.
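The exhibit query is not reproduced here, so the following is a minimal sketch with a hypothetical Orders table. It uses Python's built-in sqlite3 module as a stand-in for Azure SQL Database; the WHERE, GROUP BY, and HAVING behavior shown is standard SQL.

```python
import sqlite3

# In-memory database with a hypothetical Orders table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (CustomerId INTEGER, Amount REAL, Status TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        (1, 50.0, "paid"),
        (1, 70.0, "paid"),
        (2, 30.0, "paid"),
        (2, 500.0, "cancelled"),
        (3, 200.0, "paid"),
    ],
)

rows = conn.execute(
    """
    SELECT CustomerId, SUM(Amount) AS Total
    FROM Orders
    WHERE Status = 'paid'       -- filters individual rows before grouping
    GROUP BY CustomerId
    HAVING SUM(Amount) > 100    -- filters groups on an aggregate condition
    ORDER BY CustomerId         -- sorts the final result
    """
).fetchall()

print(rows)
conn.close()
```

Customer 2's cancelled order is dropped by WHERE before grouping, and the remaining group total of 30 is then dropped by HAVING, which is why only customers 1 and 3 appear in the result.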


159

Drag & Drop (medium)

Drag and drop the steps to create an Azure Stream Analytics job in the correct order.

Why this order

Creating a Stream Analytics job involves setting up the job parameters, then configuring inputs, outputs, and the transformation query.


160

MCQ (medium)

A data engineering team is designing a data warehouse in Azure Synapse Analytics. They need to load data from an on-premises SQL Server database daily. The data volume is about 500 GB per load. They want to minimize data transfer costs and use the fastest possible transfer method. Which approach should they use?

A. Use the Azure portal to upload a CSV file

B. Use SQL Server Integration Services (SSIS) to load to Azure Blob and then into Synapse

C. Use Azure Data Factory with PolyBase

D. Export to BACPAC and import to Azure SQL Database

Answer: C

Data Factory with PolyBase provides fast parallel loading.

Why this answer

Option C is correct because Azure Data Factory with PolyBase allows high-speed parallel loading into Azure Synapse. Option A is wrong because uploading through the Azure portal is not practical for 500 GB. Option D is wrong because BACPAC export/import is not optimized for large volumes.

Option B is wrong because SSIS with Azure Blob is slower and more complex.


161

MCQ (easy)

A company stores weather sensor data in Azure Data Lake Storage Gen2. Data scientists need to run large-scale transformations and machine learning experiments on this data using Python and Apache Spark. They want to collaborate using shared Jupyter notebooks. Which Azure service should they use for this analytical workload?

A. Azure Stream Analytics

B. Azure Synapse Analytics (with Apache Spark pools)

C. Azure Analysis Services

D. Azure SQL Database

Answer: B

Correct. Azure Synapse Analytics provides Apache Spark pools integrated with Jupyter notebooks, enabling data scientists to run Python/Spark jobs on data stored in ADLS Gen2 for transformations and ML.

Why this answer

Azure Synapse Analytics with Apache Spark pools provides a fully managed Spark environment that supports Python and allows data scientists to run large-scale transformations and machine learning experiments. It integrates directly with Azure Data Lake Storage Gen2 for reading and writing data, and supports collaborative Jupyter notebooks for shared development. This makes it the correct choice for the described analytical workload.

Exam trap

The trap here is that candidates may confuse Azure Stream Analytics (a real-time processing service) with batch processing, or think Azure SQL Database can handle large-scale Spark workloads, when in fact only Synapse Analytics with Spark pools provides the required Python, Spark, and collaborative notebook capabilities.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing service for analyzing data in motion, not designed for large-scale batch transformations or collaborative Jupyter notebook-based machine learning experiments. Option C is wrong because Azure Analysis Services is an enterprise-grade semantic model engine for creating tabular models and performing business analytics, not a platform for running Python or Apache Spark code. Option D is wrong because Azure SQL Database is a relational database service for transactional workloads and does not natively support Apache Spark or Jupyter notebooks for large-scale data transformations and ML experiments.


162

Multi-Select (easy)

Which TWO security features are available in Azure SQL Database to help protect data at rest?

Select 2 answers

A. Firewall rules

B. Dynamic data masking

C. Transparent data encryption (TDE)

D. Azure AD authentication

E. Always Encrypted

Answers: C, E

TDE encrypts the database files at rest.

Why this answer

Transparent Data Encryption (TDE) is a feature in Azure SQL Database that encrypts data at rest, including backups and transaction log files, using an AES-256 encryption algorithm. It performs real-time I/O encryption and decryption of the data without requiring changes to the application, ensuring that the physical storage media is protected against unauthorized access. Always Encrypted also protects data at rest by encrypting sensitive columns within the database, but it additionally protects data in transit and during query processing by keeping the encryption keys on the client side.

Exam trap

The trap here is that candidates often confuse dynamic data masking with encryption, or assume that authentication or network controls (like firewall rules) provide data-at-rest protection, when in fact only encryption mechanisms like TDE and Always Encrypted directly secure data stored on disk.


163

MCQ (hard)

A company uses Azure SQL Database for a mission-critical application. They need to ensure that in the event of a regional outage, the database can be failed over to a secondary region with minimal data loss. The recovery point objective (RPO) is 5 seconds. Which deployment option should they choose?

A. Azure SQL Database with failover group enabled

B. Azure SQL Database with active geo-replication and a secondary in a paired region

C. Azure SQL Database single instance with locally redundant storage

D. Azure SQL Database with auto-failover group

Answer: B

Active geo-replication continuously and asynchronously replicates committed transactions to the secondary region, which keeps potential data loss to a few seconds.

Why this answer

Option B is correct because active geo-replication with a secondary in a paired region continuously and asynchronously replicates committed transactions, so data loss after a regional outage is typically limited to a few seconds, in line with the 5-second RPO. Options A and D are treated as wrong by this question; note that failover groups are built on the same geo-replication technology, and their automatic failover policy adds a grace period before failover begins.

Option C is wrong because a single database with locally redundant storage stays within one region and cannot protect against a regional outage.


164

MCQ (medium)

A multinational e-commerce company uses Azure SQL Database for its order processing system. They need to ensure that if an entire Azure region becomes unavailable, the database remains available with minimal data loss and automatic failover. Which feature should they implement?

A. Active geo-replication

B. Automatic tuning

C. Elastic pools

D. Serverless compute

Answer: A

Active geo-replication replicates data to a secondary region and can be configured with failover groups for automatic failover.

Why this answer

Active geo-replication (Option A) is correct because it creates readable secondary replicas of an Azure SQL Database in another Azure region. On its own it is failed over manually; placed in a failover group, it supports automatic failover during a regional outage. This feature provides a recovery point objective (RPO) of as low as 5 seconds and a recovery time objective (RTO) of under 1 hour, ensuring minimal data loss and high availability.

Exam trap

The trap here is that candidates may confuse 'automatic tuning' (a performance feature) with 'automatic failover' (a disaster recovery feature), or assume that serverless compute or elastic pools inherently provide high availability, which they do not.

How to eliminate wrong answers

Option B (Automatic tuning) is wrong because it optimizes query performance through index management and plan regression fixes, not disaster recovery or regional failover. Option C (Elastic pools) is wrong because they are a cost-management model for sharing resources among multiple databases, not a high-availability or geo-replication feature. Option D (Serverless compute) is wrong because it auto-scales compute resources based on workload demand and pauses idle databases, but it does not provide any cross-region replication or automatic failover capability.


165

MCQ (medium)

A mobile gaming company stores player profiles in Azure Cosmos DB. Each profile document contains many optional fields, and queries frequently filter by the player's locale (a field present in about 30% of documents). Which approach will optimize query performance for these filters?

A. Embed all fields in a single document and rely on automatic indexing

B. Normalize the data by storing locale in a separate container and use cross-container queries

C. Define a fixed schema for all documents to ensure every document has the locale field

D. Create a composite index that includes the locale field

Answer: D

A composite index that pairs the locale field with the other properties used in the filter or sort (a composite index always spans two or more properties) allows the query engine to quickly locate matching documents, reducing RU consumption and improving response time.

Why this answer

Option D is correct because creating a composite index that includes the locale field allows Azure Cosmos DB to efficiently filter queries by locale without scanning every document. Since locale is present in only 30% of documents, a composite index reduces the query RU cost by directly locating matching documents, leveraging the index's sorted structure for faster lookups.

Exam trap

The trap here is that candidates assume automatic indexing is sufficient for all queries, but they overlook that sparsely populated fields benefit from explicit composite indexing to avoid high RU costs from index scans.

How to eliminate wrong answers

Option A is wrong because embedding all fields in a single document with automatic indexing does not optimize queries for a sparsely present field like locale; automatic indexing still requires a full index scan for the field, leading to higher RU consumption. Option B is wrong because normalizing locale into a separate container and using cross-container queries introduces additional latency and RU cost due to cross-partition queries, and Cosmos DB does not support efficient cross-container joins. Option C is wrong because defining a fixed schema to force the locale field on all documents increases storage and write RU costs unnecessarily, and does not improve query performance without an appropriate index on the field.


166

MCQ (easy)

A logistics company stores shipment tracking data. The shipment ID, destination, and weight are stored in a fixed-schema database table. The shipment's route history is stored as a JSON document where each document can have different fields depending on the route events recorded. Which classification of data best describes the route history data?

A. Structured data

B. Semi-structured data

C. Unstructured data

D. Analytical data

Answer: B

JSON documents with varying fields are a classic example of semi-structured data. They have a schema that can evolve and are self-describing.

Why this answer

The route history data is stored as JSON documents where each document can have different fields depending on the events recorded. This flexibility in schema—where fields vary per document—is the hallmark of semi-structured data. Unlike structured data with a fixed schema, semi-structured data uses tags or markers (like JSON key-value pairs) to organize the data, making it self-describing.

Exam trap

The trap here is that candidates confuse 'structured' with 'organized' and assume JSON is structured because it has keys, but the key differentiator is schema flexibility—structured data enforces a fixed schema, while semi-structured data allows varying fields per record.

How to eliminate wrong answers

Option A is wrong because structured data requires a fixed schema with predefined columns and data types, such as the shipment ID, destination, and weight table mentioned in the question. Option C is wrong because unstructured data has no inherent structure or organization, such as raw video files or free-form text, whereas JSON documents have a defined key-value structure. Option D is wrong because analytical data is a classification of data usage (e.g., for reporting or BI), not a classification of data structure; the question asks about the structural classification of the route history data.
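A short sketch makes the "varying fields" point tangible. The two route-event documents below are invented for illustration (their field names are assumptions, not taken from the question); the code parses them and compares which keys each one carries.

```python
import json

# Two hypothetical route-history documents for the same shipment.
raw_documents = [
    '{"shipmentId": "S1", "event": "departed", "port": "Rotterdam"}',
    '{"shipmentId": "S1", "event": "customs_hold", "reason": "inspection", "delayHours": 6}',
]

parsed = [json.loads(doc) for doc in raw_documents]
field_sets = [set(doc) for doc in parsed]

shared_fields = field_sets[0] & field_sets[1]     # keys present in both documents
differing_fields = field_sets[0] ^ field_sets[1]  # keys present in only one document

print("Shared fields:", sorted(shared_fields))
print("Fields that vary:", sorted(differing_fields))
```

Each document describes itself through its keys, yet no fixed column list applies to both, which is what separates semi-structured data from a fixed-schema table.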


167

MCQ (easy)

You need to choose a non-relational data store for a key-value workload where each item is identified by a partition key and row key. Which Azure service should you use?

A. Azure Blob Storage

B. Azure Files

C. Azure Cosmos DB

D. Azure Table Storage

Answer: D

Table Storage is designed for key-value with partition and row keys.

Why this answer

Azure Table Storage is a key-value store that identifies each entity by its partition key and row key. Option C is wrong because Cosmos DB can also handle key-value workloads, but Table Storage is the simpler fit. Option A is wrong because Blob Storage is for blobs.

Option B is wrong because Files is for file shares.
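The partition key plus row key addressing can be modeled with an ordinary dictionary. This is a conceptual sketch only: it does not use the Azure SDK, and the helper names and sample entities are hypothetical.

```python
# In-memory model of a table keyed by (partition_key, row_key).
table = {}


def upsert_entity(partition_key, row_key, properties):
    """Insert or replace the entity stored under the composite key."""
    table[(partition_key, row_key)] = properties


def get_entity(partition_key, row_key):
    """Point lookup by the composite key; returns None if absent."""
    return table.get((partition_key, row_key))


# Hypothetical entities: the partition groups related rows, the row key is unique within it.
upsert_entity("customers-uk", "c001", {"name": "Avery", "tier": "gold"})
upsert_entity("customers-uk", "c002", {"name": "Blake"})

entity = get_entity("customers-uk", "c001")
print(entity)
```

Note that the two entities carry different properties; only the composite key is required, which is typical of a key-value store.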


168

MCQ (easy)

A company uses Azure Cosmos DB for a globally distributed application. They need to ensure low-latency reads and writes for users in multiple regions. Which consistency level provides the strongest guarantees without sacrificing availability?

A. Bounded staleness

B. Consistent prefix

C. Strong

D. Eventual

Answer: A

Bounded staleness offers strong consistency with a configurable lag and maintains write availability.

Why this answer

Bounded staleness provides the strongest consistency guarantee that still maintains availability during a partition. It ensures that reads are guaranteed to be within a configurable staleness window (either K versions or a time interval) from the latest write, which is stronger than consistent prefix or eventual consistency, while avoiding the availability trade-offs of strong consistency in a globally distributed Azure Cosmos DB account.

Exam trap

The trap here is that candidates often confuse 'strongest guarantees' with 'strong consistency,' not realizing that strong consistency sacrifices availability during a partition, whereas bounded staleness is the strongest level that still guarantees high availability in a globally distributed setup.

How to eliminate wrong answers

Option B (Consistent prefix) is wrong because it guarantees only that reads never see out-of-order writes, but it does not bound how far behind a read can be, so it is weaker than bounded staleness. Option C (Strong) is wrong because it offers linearizability but sacrifices write availability during a regional outage or partition, as all replicas must acknowledge the write before it is committed. Option D (Eventual) is wrong because it offers no ordering or recency guarantees; reads may return stale data indefinitely, which is the weakest consistency level.
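The "K versions" bound mentioned above can be written as a one-line check. This sketch models only that rule; the function and parameter names are assumptions for illustration and do not reflect how Cosmos DB implements the guarantee.

```python
def within_staleness_bound(latest_version, read_version, max_lag_versions):
    """Return True if a read lags the latest write by at most max_lag_versions."""
    lag = latest_version - read_version
    return 0 <= lag <= max_lag_versions


# With a bound of 5 versions, a read 3 versions behind is acceptable,
# while a read 10 versions behind would violate the bound.
acceptable = within_staleness_bound(latest_version=100, read_version=97, max_lag_versions=5)
too_stale = within_staleness_bound(latest_version=100, read_version=90, max_lag_versions=5)
print(acceptable, too_stale)
```

Eventual and consistent prefix place no such limit on the lag, which is why they rank below bounded staleness.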


169

MCQ (hard)

A company uses Azure Cosmos DB for a global e-commerce application. The application needs to support multi-region writes and provide strong consistency for inventory updates. Which configuration minimizes write latency while meeting the consistency requirement?

A. Multi-master with bounded staleness consistency

B. Single-region writes with session consistency

C. Single-region writes with strong consistency

D. Multi-master with eventual consistency

Answer: C

Strong consistency is supported only with single-region writes; multi-region writes cannot achieve strong consistency.

Why this answer

Cosmos DB does not support strong consistency together with multi-region writes, so the configuration that meets the consistency requirement is single-region writes with strong consistency. Option A (multi-master with bounded staleness) does not provide strong consistency; Option D (multi-master with eventual consistency) is weaker still; Option B (single-region writes with session consistency) is not strong.


170

Multi-Select (easy)

Which TWO of the following Azure services are considered non-relational data stores?

Select 2 answers

A. Azure SQL Database

B. Azure Cosmos DB

C. Azure Table Storage

D. Azure Synapse Analytics

E. Azure Database for PostgreSQL

Answers: B, C

Azure Cosmos DB is a NoSQL database.

Why this answer

Options B and C are correct. Azure Cosmos DB is a NoSQL database (B), and Azure Table Storage is a key-value store (C). Option A is wrong because Azure SQL Database is relational.

Option E is wrong because Azure Database for PostgreSQL is relational. Option D is wrong because Azure Synapse Analytics is a relational analytics system.


171

MCQ (medium)

A media publishing company stores high-resolution images and video files for their website. These files are large (hundreds of MBs each) and are accessed only a few times per month, but when accessed, they must be delivered within seconds. Additionally, they need to store a small amount of metadata (e.g., upload date, author) for each file. Which Azure service should they use for storing the binary files?

A. Azure Table Storage

B. Azure Blob Storage

C. Azure File Storage

D. Azure Queue Storage

Answer: B

Azure Blob Storage is optimized for large unstructured binary objects and supports custom metadata.

Why this answer

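Azure Blob Storage is designed for storing massive amounts of unstructured binary data, such as high-resolution images and video files. A single block blob can hold far more than the few hundred megabytes needed here (the current limit is roughly 190 TiB), and access tiers let the company reduce storage cost for infrequently accessed data. The cool tier fits this scenario because it stays online with millisecond-level access latency; the archive tier is offline and takes hours to rehydrate, so it cannot meet the within-seconds delivery requirement. Each blob can also carry custom metadata as name-value pairs, which covers the upload date and author. This makes Blob Storage the ideal choice for the media publishing company's requirements.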

Exam trap

The trap here is that candidates confuse Azure Table Storage (for metadata) with the primary storage for binary files, or they assume Azure File Storage (SMB shares) is suitable for web-serving large media files, when in fact Blob Storage is the correct service for unstructured binary data with infrequent access patterns.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a NoSQL key-value store for structured, non-relational data (e.g., metadata), not for large binary files like images or videos. Option C is wrong because Azure File Storage provides SMB file shares for shared file access across VMs or on-premises, but it is not optimized for storing and serving large, infrequently accessed binary files with blob-tier cost savings. Option D is wrong because Azure Queue Storage is a messaging service for asynchronous communication between application components, not a storage solution for binary files.

172

MCQmedium

A financial services company needs to store transaction records for 7 years to comply with regulatory requirements. The data is rarely accessed after the first month but must be available for audit within 24 hours. The storage solution must minimize cost while meeting compliance. Which Azure storage tier should you use for data older than one month?

A.Cool tier

B.Premium tier

C.Archive tier

D.Hot tier

AnswerC

Lowest storage cost, and standard-priority rehydration (up to 15 hours) fits the 24-hour requirement.

Why this answer

The Archive tier has the lowest storage cost of the blob access tiers and is intended for data that is rarely accessed and can tolerate retrieval latency measured in hours. Archived blobs are offline and must be rehydrated before they can be read; standard-priority rehydration can take up to 15 hours, which fits inside the 24-hour audit window. The 180-day minimum storage duration is not a concern because the records are retained for 7 years.

Exam trap

The trap here is that candidates read 'must be available' as 'must be immediately readable' and choose Cool tier, overlooking that the requirement allows up to 24 hours and that the question asks for the lowest cost. Cool would be the right choice only if audits needed the data online at once.

How to eliminate wrong answers

Option A (Cool tier) is wrong because, although it keeps data online with a 30-day minimum storage duration and costs less than Hot, its storage cost is still higher than Archive, and the 24-hour window makes immediate access unnecessary. Option B (Premium tier) is wrong because it is optimized for low-latency, high-throughput workloads and is significantly more expensive than needed for rarely accessed audit data. Option D (Hot tier) is wrong because it is designed for frequently accessed data with the highest storage cost of the standard tiers, making it cost-inefficient for data that is rarely accessed after the first month.

173

MCQhard

A software-as-a-service (SaaS) provider hosts a multi-tenant application with a separate database for each tenant. They anticipate scaling to thousands of tenants and want to minimize cost while allowing tenants to share resources flexibly. Which Azure SQL offering is most suitable?

A.Azure SQL Database elastic pool

B.Azure SQL Database (single database)

C.Azure SQL Managed Instance

D.SQL Server on Azure Virtual Machine

AnswerA

Elastic pools provide a cost-effective way to manage and scale multiple databases with fluctuating resource needs, ideal for multi-tenant SaaS scenarios.

Why this answer

Azure SQL Database elastic pool is the most suitable choice because it allows multiple single databases (one per tenant) to share a fixed set of resources (DTUs or vCores) within a pool, optimizing cost by averaging resource usage across tenants. This model supports scaling to thousands of tenants while providing resource elasticity and predictable pricing, as tenants with low activity can use unused capacity from busy ones without over-provisioning.

Exam trap

The trap here is that candidates often choose Azure SQL Database single database (Option B) because they assume 'separate database for each tenant' implies isolation, but they overlook the cost and scaling inefficiency of dedicating resources per tenant when resource sharing is explicitly desired.

How to eliminate wrong answers

Option B (Azure SQL Database single database) is wrong because it requires each tenant database to have its own dedicated resources, leading to higher costs and resource fragmentation when scaling to thousands of tenants. Option C (Azure SQL Managed Instance) is wrong because it is designed for lift-and-shift migrations of on-premises SQL Server workloads with full instance-level features, not for cost-efficient multi-tenant resource sharing at scale. Option D (SQL Server on Azure Virtual Machine) is wrong because it involves managing the underlying OS and SQL Server license, increasing operational overhead and cost, and lacks the built-in resource pooling and elastic scaling of PaaS offerings.
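The cost argument for pooling can be sketched with a little arithmetic. The snippet below is only an illustration: the tenant names and demand figures are invented, and the units are arbitrary rather than real DTU or vCore numbers. It compares sizing every database for its own peak against sizing one shared pool for the combined peak.

```python
# Hypothetical hourly resource demand (arbitrary units) for three tenant databases.
# Each tenant is busy at a different hour, which is the case pooling benefits from.
tenant_demand = {
    "tenant_a": [10, 80, 10, 10],
    "tenant_b": [10, 10, 80, 10],
    "tenant_c": [10, 10, 10, 80],
}


def dedicated_capacity(demand):
    """Capacity needed when every database is provisioned for its own peak."""
    return sum(max(series) for series in demand.values())


def pooled_capacity(demand):
    """Capacity needed when all databases share one pool sized for the combined peak."""
    combined = [sum(hour) for hour in zip(*demand.values())]
    return max(combined)


dedicated = dedicated_capacity(tenant_demand)
pooled = pooled_capacity(tenant_demand)
print(f"dedicated={dedicated}, pooled={pooled}")
```

The saving shrinks as tenants peak at the same time, which is why pools suit databases with low average use and occasional, unsynchronized spikes.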

174

MCQeasy

A data analyst needs to create interactive dashboards and reports from data stored in Azure Synapse Analytics. Which tool should they use?

A.Microsoft Power BI

B.SQL Server Reporting Services (SSRS)

C.Microsoft Excel

D.Azure Data Studio

AnswerA

Power BI provides interactive dashboards and reports with native Synapse connectivity.

Why this answer

Microsoft Power BI is the correct tool because it is designed specifically for creating interactive dashboards and reports from a wide range of data sources, including Azure Synapse Analytics. Power BI connects directly to Synapse SQL pools or serverless SQL endpoints using DirectQuery or import mode, enabling real-time visualizations and cross-filtering. This aligns with the requirement for interactive analytics, which is Power BI's core strength.

Exam trap

The trap here is confusing a data query/management tool (Azure Data Studio) or a static reporting tool (SSRS) with a dedicated interactive visualization platform, leading candidates to overlook Power BI's native integration with Azure Synapse Analytics.

How to eliminate wrong answers

Option B (SQL Server Reporting Services) is wrong because SSRS is a paginated report server for static, print-ready reports, not for interactive dashboards with live cross-filtering. Option C (Microsoft Excel) is wrong because while Excel can connect to Synapse and create charts, it lacks native interactive dashboard capabilities like slicers and drill-through across multiple visuals without complex add-ins. Option D (Azure Data Studio) is wrong because it is a database management and query tool for writing T-SQL and notebooks, not a reporting or dashboarding platform.

175

MCQhard

A multinational corporation runs a mission-critical relational database on Azure SQL Database. They require automatic failover to a secondary region in case of a regional outage, with no data loss. The secondary region must also be readable for reporting purposes. What should they implement?

A.Active Geo-Replication with manual failover

B.Read Scale-Out with a local secondary replica

C.Azure Site Recovery for the database server

D.Failover group with a readable secondary in a different region

AnswerD

Failover groups provide automatic failover; readable secondaries allow reporting.

Why this answer

Option D is correct because a failover group with a readable secondary (built on Active Geo-Replication) provides automatic failover and read access in another region. Option A (Active Geo-Replication alone) requires manual failover. Option B (Read Scale-Out) does not provide geo-replication.

Option C (Azure Site Recovery) is for replicating virtual machines, not Azure SQL Database.

176

MCQeasy

A mobile game developer needs to store player session data. Each session has a unique SessionID, a UserID, a start timestamp, an end timestamp, and a collection of game events (each event is a JSON object). The application requires low-latency point reads by SessionID and the ability to query all sessions for a given UserID within a time range. The schema of game events can vary between sessions (e.g., new event types added frequently). The developer wants a fully managed NoSQL database that supports flexible schemas and secondary indexing. Which Azure data store should they choose?

A.Azure Cosmos DB with the NoSQL API

B.Azure Table Storage

C.Azure Blob Storage

D.Azure Cache for Redis

AnswerA

Cosmos DB provides document storage with flexible schema, indexing, and low-latency reads. It supports secondary indexes on arbitrary fields like UserID and timestamp.

Why this answer

Azure Cosmos DB with the NoSQL API is the correct choice because it provides a fully managed, globally distributed NoSQL database with native support for flexible schemas (schemaless JSON documents), low-latency point reads by partition key (SessionID), and automatic secondary indexing for querying by UserID within a time range. Its ability to handle varying game event schemas without schema migrations makes it ideal for this use case.

Exam trap

The trap here is that candidates often confuse Azure Table Storage with a fully queryable NoSQL database, but it lacks secondary indexing and complex query support, making it unsuitable for time-range queries on non-key fields.

How to eliminate wrong answers

Option B (Azure Table Storage) is wrong because it is a key-value store that indexes only PartitionKey and RowKey; filters on other properties are allowed but run as partition or table scans, and there is no secondary indexing to make time-range queries on non-key attributes efficient. Option C (Azure Blob Storage) is wrong because it is an object storage service designed for unstructured data (blobs), not for low-latency point reads or indexed queries on individual records; it lacks native querying and indexing for session data. Option D (Azure Cache for Redis) is wrong because it is an in-memory cache intended to sit in front of a data store rather than act as the durable system of record, and it offers no secondary indexing or query support for finding sessions by UserID across a time range.

177

MCQmedium

Your organization is migrating its on-premises SQL Server databases to Azure. The databases include a mix of operational (OLTP) and analytical (OLAP) workloads. For the OLTP databases, you need high availability and automated failover to a secondary region. For the OLAP databases, you need to support large-scale analytic queries with columnstore indexes and the ability to pause compute to save costs. Which Azure SQL deployment options should you choose for each workload type?

A.Azure SQL Managed Instance for both

B.SQL Server on Azure Virtual Machines for both

C.Azure SQL Database Hyperscale for OLTP; Azure SQL Database Serverless for OLAP

D.Azure SQL Database (geo-replication) for OLTP; Azure Synapse Analytics (dedicated SQL pool) for OLAP

AnswerD

Geo-replication provides failover; Synapse supports columnstore and pause.

Why this answer

Option D is correct because Azure SQL Database with active geo-replication, when managed through a failover group, provides high availability and automated failover to a secondary region for OLTP workloads, while Azure Synapse Analytics (dedicated SQL pool) supports large-scale analytic queries with columnstore indexes and allows pausing compute to save costs, meeting the OLAP requirements.

Exam trap

The trap here is that candidates may confuse Azure SQL Database Serverless with Synapse Analytics for OLAP, overlooking that Serverless auto-pauses a single database used for intermittent workloads and is not a large-scale analytics engine, whereas a Synapse dedicated SQL pool is a massively parallel warehouse whose compute can be paused.

How to eliminate wrong answers

Option A is wrong because Azure SQL Managed Instance, although it supports failover groups to a secondary region, is a general-purpose SQL Server-compatible engine rather than a massively parallel data warehouse, so it is a poor fit for the large-scale OLAP requirement. Option B is wrong because SQL Server on Azure Virtual Machines leaves high availability, cross-region failover, and patching for you to configure and manage, and it is not a purpose-built analytics engine for large-scale OLAP. Option C is wrong because its OLAP half does not fit: Azure SQL Database Serverless can auto-pause, but it is a compute tier for single databases with intermittent usage, not an engine optimized for large-scale analytic queries the way a Synapse dedicated SQL pool is.

178

MCQmedium

A company has a database that processes millions of small credit card transactions per second for payment authorization. They also need to run complex reports that aggregate transaction data over months to detect fraud patterns. Which type of workload describes the payment authorization process?

A.OLTP (Online Transaction Processing)

B.OLAP (Online Analytical Processing)

C.HTAP (Hybrid Transactional/Analytical Processing)

D.ETL (Extract, Transform, Load)

AnswerA

OLTP systems handle high volumes of small, fast transactions, such as credit card authorization.

Why this answer

The payment authorization process involves high-volume, low-latency transactions that read, insert, and update individual records in real time. This is the classic definition of OLTP (Online Transaction Processing), which is optimized for ACID-compliant, row-based operations on current data. The scenario explicitly states 'millions of small credit card transactions per second,' which aligns with OLTP workloads like order entry or banking.

Exam trap

The trap here is that candidates see 'complex reports' and 'aggregate transaction data' in the same question and assume the entire workload is analytical, but the question explicitly asks only about the payment authorization process, which is purely transactional.

How to eliminate wrong answers

Option B (OLAP) is wrong because OLAP is designed for complex aggregations and historical analysis over large datasets, not for processing individual real-time transactions. Option C (HTAP) is wrong because HTAP combines OLTP and OLAP in a single system, but the question asks specifically about the payment authorization process, which is purely transactional, not analytical. Option D (ETL) is wrong because ETL is a data integration process used to move and transform data between systems, not a workload type for processing live transactions.

179

MCQmedium

A company maintains a large 'Transactions' table in Azure SQL Database. The table has a clustered index on a GUID column (TransactionID). Over time, they observe slow insert performance due to index fragmentation and page splits. They also need fast point lookups by TransactionID. Which approach should they take to improve insert performance while still supporting fast lookups?

A.Change the clustered index to a nonclustered index on TransactionID and make the table a heap

B.Change the clustered key to an integer IDENTITY column and keep a nonclustered index on TransactionID

C.Keep the clustered index on TransactionID but rebuild it daily

D.Remove the clustered index entirely and create a nonclustered index on TransactionID

AnswerB

An integer IDENTITY column provides sequential values that reduce fragmentation and page splits, improving insert performance. The nonclustered index on TransactionID supports efficient point lookups. This is a recommended pattern when the natural key is not ideal for clustering.

Why this answer

Option B is correct because using an integer IDENTITY column as the clustered key eliminates the random insertion order and page splits caused by a GUID clustered index, while the nonclustered index on TransactionID provides fast point lookups. In Azure SQL Database, a clustered index determines the physical order of data; a monotonically increasing integer avoids fragmentation and improves insert throughput.

Exam trap

The trap here is that candidates assume rebuilding the clustered index (Option C) is a sufficient maintenance fix, but the DP-900 exam tests understanding that the root cause is the choice of clustered key data type, not just fragmentation management.

How to eliminate wrong answers

Option A is wrong because making the table a heap (no clustered index) removes the physical ordering that supports efficient range scans and can lead to forwarding pointers and slower lookups, and a nonclustered index on a heap still requires a bookmark lookup for each row. Option C is wrong because daily rebuilds only temporarily reduce fragmentation but do not address the root cause—GUIDs cause random inserts and page splits regardless of rebuild frequency, and rebuilds consume significant I/O and resources. Option D is wrong because removing the clustered index entirely and using only a nonclustered index on TransactionID results in a heap, which suffers from the same fragmentation issues and lacks the physical ordering that speeds up point lookups; the nonclustered index would also need to include a row locator (RID) that can change with heap maintenance.
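The effect of key order on page splits can be illustrated with a toy model. This is a rough sketch, not how SQL Server stores data: the page capacity, the split rule, and the use of random integers as a stand-in for GUIDs are all simplifying assumptions made for the example.

```python
import bisect
import random

PAGE_CAPACITY = 4  # deliberately tiny; real index pages hold many more rows


def count_mid_table_splits(keys):
    """Insert keys into sorted fixed-size pages and count splits that move existing rows."""
    pages = [[]]
    splits = 0
    for key in keys:
        # Locate the page whose key range covers the new key.
        firsts = [page[0] for page in pages if page]
        idx = max(bisect.bisect_right(firsts, key) - 1, 0)
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) > PAGE_CAPACITY:
            if page[-1] == key and idx == len(pages) - 1:
                # Append at the end of the table: start a fresh page, nothing moves.
                page.pop()
                pages.append([key])
            else:
                # Insert in the middle of the table: half of the rows move to a new page.
                half = len(page) // 2
                pages[idx:idx + 1] = [page[:half], page[half:]]
                splits += 1
    return splits


sequential_keys = list(range(1000))                        # IDENTITY-like, always increasing
random_keys = random.Random(0).sample(range(1000), 1000)   # GUID-like, arbitrary order

print("sequential splits:", count_mid_table_splits(sequential_keys))
print("random splits:", count_mid_table_splits(random_keys))
```

Ever-increasing keys only append to the last page, while randomly ordered keys keep landing in full pages and force rows to move, which is the behavior the IDENTITY clustered key is meant to avoid.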

180

Multi-Selecthard

Which THREE of the following are valid considerations when choosing between Azure SQL Database and Azure SQL Managed Instance?

Select 3 answers

A.Azure SQL Managed Instance supports SQL Server Agent for job scheduling.

B.Azure SQL Database supports larger database sizes than Azure SQL Managed Instance.

C.Azure SQL Managed Instance supports cross-database queries within the same instance.

D.Azure SQL Managed Instance does not support virtual network integration.

E.Azure SQL Database supports elastic pools for cost-effective resource sharing among multiple databases.

AnswersA, C, E

SQL Agent is available in Managed Instance.

Why this answer

Options A, C, and E are correct. Azure SQL Database supports elastic pools, while Managed Instance supports instance-level features like SQL Agent and cross-database queries. Option D is incorrect because Managed Instance does support virtual network integration; it is always deployed inside a VNet.

Option B is incorrect as a blanket statement because maximum database size depends on the service tier; outside the Hyperscale tier, Managed Instance generally allows more storage than a single Azure SQL Database.

181

MCQhard

Refer to the exhibit. An administrator is configuring aggregations in Power BI Premium to improve performance on a large dataset. The aggregation is defined on the Sales table with SUM(Amount) grouped by ProductCategory, Region, and Date at the monthly level. However, some reports that query daily data are still slow. What is the most likely reason?

A.The dataset is too large for aggregations

B.The aggregation is not in DirectQuery mode

C.The aggregation level is monthly, but queries need daily granularity

D.The aggregation has too many dimensions

AnswerC

Daily queries cannot use the monthly aggregation, so they hit the full dataset.

Why this answer

Option C is correct because the aggregation is defined at the monthly level (grouping by month), but the slow reports are querying daily data. Power BI aggregations work by pre-aggregating data at a specified granularity; when a query requests a finer level of detail (daily) than the aggregation holds, the aggregation cannot satisfy the query, so Power BI falls back to the detail table, causing slow performance. To improve daily queries, an additional aggregation at the daily level would be needed.

Exam trap

The trap here is that candidates may think aggregations automatically speed up all queries on a table, but an aggregation can only answer queries at its own granularity or coarser; a query that needs finer detail than the aggregation stores bypasses it and goes to the detail table.

How to eliminate wrong answers

Option A is wrong because the dataset being large is the reason aggregations are used, and the issue is not the size itself but the mismatch between the aggregation granularity and the query granularity. Option B is wrong because the mode (DirectQuery vs. Import) is not relevant to the granularity mismatch; aggregations can be defined in either mode, and the problem is the monthly grouping, not the storage mode.

Option D is wrong because having multiple dimensions (ProductCategory, Region, Date) is typical and not a cause of slowness; the issue is the date granularity, not the number of dimensions.
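The granularity rule can be sketched in a few lines. The example below is a simplified model, not Power BI's engine: the rows, the `answer` function, and the grain names are invented for illustration. It shows a monthly aggregate serving monthly and yearly questions while daily questions fall through to the detail rows.

```python
from collections import defaultdict

# Hypothetical detail rows: (date as "YYYY-MM-DD", amount).
detail_rows = [
    ("2024-01-05", 100),
    ("2024-01-20", 50),
    ("2024-02-03", 70),
]

# Pre-computed aggregation at monthly grain, keyed by "YYYY-MM".
monthly_agg = defaultdict(int)
for day, amount in detail_rows:
    monthly_agg[day[:7]] += amount

GRAIN_RANK = {"day": 0, "month": 1, "year": 2}  # lower rank = finer grain


def answer(query_grain):
    """Return (source, totals); the monthly aggregate is usable only at month grain or coarser."""
    totals = defaultdict(int)
    if GRAIN_RANK[query_grain] >= GRAIN_RANK["month"]:
        width = 7 if query_grain == "month" else 4  # "YYYY-MM" or "YYYY"
        for month, amount in monthly_agg.items():
            totals[month[:width]] += amount
        return "aggregation", dict(totals)
    # The query is finer than the aggregate, so every detail row must be read.
    for day, amount in detail_rows:
        totals[day] += amount
    return "detail scan", dict(totals)


for grain in ("year", "month", "day"):
    print(grain, answer(grain))
```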

182

MCQmedium

A media company stores video files in Azure Blob Storage. They want to use Azure Content Delivery Network (CDN) to serve these videos globally. However, they need to restrict access to only authorized users. What should you implement?

A.Set the blob container to public access.

B.Use Azure Active Directory (Microsoft Entra ID) authentication for the CDN endpoint.

C.Implement shared access signatures (SAS) and token-based authentication on the CDN.

D.Use a firewall on the storage account to allow only CDN IP addresses.

AnswerC

SAS tokens or CDN token authentication can restrict access to authorized users.

Why this answer

Azure CDN with token authentication (using shared access signatures or custom tokens) can restrict access to authorized users. Option A is wrong because public access would allow anyone. Option D is wrong because network restrictions limit which networks can reach the storage account, not which users are authorized, and they do not work well for a global audience.

Option B is wrong because Azure AD authentication is not directly supported by CDN for blob access; token auth is the standard approach.
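The general idea behind signed, expiring tokens can be sketched as follows. This is a generic HMAC illustration and not the actual SAS or Azure CDN token format: the key, the message layout, and the function names are assumptions made for the example.

```python
import hashlib
import hmac

SECRET = b"demo-signing-key"  # hypothetical key; a real key is never hard-coded


def sign(path, expires_at, key=SECRET):
    """Produce a signature that binds a resource path to an expiry time."""
    message = f"{path}\n{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def is_authorized(path, expires_at, signature, now, key=SECRET):
    """Accept the request only if the token has not expired and the signature matches."""
    if now > expires_at:
        return False
    expected = sign(path, expires_at, key)
    return hmac.compare_digest(expected, signature)


token = sign("/videos/intro.mp4", expires_at=1000)
print(is_authorized("/videos/intro.mp4", 1000, token, now=900))   # within the validity window
print(is_authorized("/videos/intro.mp4", 1000, token, now=1001))  # after expiry
print(is_authorized("/videos/other.mp4", 1000, token, now=900))   # token reused on another path
```

Because the signature covers both the path and the expiry, a client cannot extend the lifetime or reuse the token for a different file without knowing the key.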

183

MCQmedium

A gaming company stores player profiles in Azure Cosmos DB using the NoSQL API. Each profile is a JSON document containing fields like playerId, userName, level, inventory (an array of items), and friends (an array of playerIds). The application frequently needs to query all players that have a specific item in their inventory (e.g., 'sword'). Which Cosmos DB feature should they use to support this query efficiently?

A.Change feed

B.Stored procedures

C.Composite index

D.Indexing policy with a wildcard index

AnswerD

A wildcard index ensures all paths (including arrays) are indexed, allowing efficient use of ARRAY_CONTAINS queries.

Why this answer

Option D is correct because a wildcard path (/*) in the indexing policy tells Azure Cosmos DB to index all properties within a JSON document, including nested array elements like those in the 'inventory' array. Cosmos DB's default indexing policy already includes this path. It enables efficient queries such as 'SELECT * FROM c WHERE ARRAY_CONTAINS(c.inventory, {"name": "sword"}, true)', where the third argument requests a partial match on the item objects, without requiring a custom composite index for each possible item. If the inventory path were excluded from the index, the query would require a full scan of all documents, which is inefficient at scale.

Exam trap

The trap here is that candidates often confuse indexing features, thinking a composite index (Option C) is needed for array queries, when in fact composite indexes are for multi-property equality or range filters, not for array membership queries which require a wildcard index to index the array elements themselves.

How to eliminate wrong answers

Option A is wrong because the change feed is a mechanism for capturing document inserts, updates, and deletes in chronological order, not for querying current data based on array contents. Option B is wrong because stored procedures are server-side JavaScript logic for transactional operations, not a query optimization feature for indexing array elements. Option C is wrong because a composite index is designed to optimize queries with multiple filter conditions (e.g., WHERE level = 10 AND userName = 'Alice'), not for queries that filter on array membership like 'inventory contains item X'.
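As a rough illustration, the snippet below shows an indexing policy with the /* wildcard path and a local emulation of a partial-match ARRAY_CONTAINS filter. The player documents, the item shape (objects with name and qty), and the `array_contains` helper are hypothetical; the helper only mimics the query semantics in plain Python and does not call Cosmos DB.

```python
import json

# Indexing policy that includes every path through the /* wildcard.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": '/"_etag"/?'}],
}

# Hypothetical player documents; inventory is modelled as an array of objects.
players = [
    {"playerId": "p1", "inventory": [{"name": "sword", "qty": 1}, {"name": "shield", "qty": 1}]},
    {"playerId": "p2", "inventory": [{"name": "bow", "qty": 1}]},
    {"playerId": "p3", "inventory": [{"name": "sword", "qty": 2}]},
]


def array_contains(array, pattern):
    """Local stand-in for partial-match ARRAY_CONTAINS: any element has all of pattern's pairs."""
    return any(all(item.get(k) == v for k, v in pattern.items()) for item in array)


matches = [p["playerId"] for p in players if array_contains(p["inventory"], {"name": "sword"})]
print(json.dumps(indexing_policy, indent=2))
print(matches)
```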

184

MCQmedium

A social networking application stores user profiles as JSON documents in Azure Cosmos DB. Each profile includes fields such as 'userName', 'email', 'followersCount', and optional 'interests'. The application needs to perform fast point reads by 'userName' (under 10 ms) and also run queries to find all users with a 'followersCount' greater than a certain value. The development team prefers to use a query syntax similar to SQL. Which Azure Cosmos DB API should they choose?

A.Azure Cosmos DB for NoSQL (SQL API)

B.Azure Cosmos DB for MongoDB

C.Azure Cosmos DB for Table

D.Azure Cosmos DB for Apache Cassandra

AnswerA

Correct. The SQL API provides a SQL-like query interface for JSON documents, supporting point reads by partition key and flexible queries on any field.

Why this answer

Azure Cosmos DB for NoSQL (SQL API) is the correct choice because it natively supports SQL-like query syntax for querying JSON documents, enabling the required queries such as filtering by 'followersCount'. It also provides fast point reads (under 10 ms) by using the 'userName' field as the partition key, ensuring efficient direct access to individual documents.

Exam trap

The trap here is that candidates may confuse the SQL-like syntax of Cassandra's CQL with the native SQL API, overlooking that Cassandra is a wide-column store not optimized for JSON document queries, while the SQL API is purpose-built for JSON documents and SQL queries.

How to eliminate wrong answers

Option B is wrong because Azure Cosmos DB for MongoDB uses MongoDB's query language (based on BSON and MongoDB operators), not SQL-like syntax, and would require the team to adapt to a different query paradigm. Option C is wrong because Azure Cosmos DB for Table is designed for key-value and tabular data with OData-based queries, not for querying nested JSON documents with SQL-like syntax. Option D is wrong because Azure Cosmos DB for Apache Cassandra uses CQL (Cassandra Query Language), which is similar to SQL but is optimized for wide-column stores and does not natively support querying JSON documents with the same flexibility as the SQL API.

185

MCQeasy

A data analyst needs to visualize sales data from Azure SQL Database in real-time dashboards. Which tool should they use to create interactive reports?

A.Microsoft Power BI

B.Azure Data Studio

C.Azure Synapse Analytics

D.Microsoft Excel

AnswerA

Power BI is designed for interactive reporting and dashboards.

Why this answer

Microsoft Power BI is the correct tool because it is designed specifically for creating interactive, real-time dashboards and reports from various data sources, including Azure SQL Database. It supports live connections and DirectQuery to enable near-real-time visualization without requiring data movement.

Exam trap

The trap here is confusing database query tools (Azure Data Studio) or data storage/processing services (Azure Synapse Analytics) with dedicated visualization and reporting tools, leading candidates to overlook Power BI's specific role in real-time dashboard creation.

How to eliminate wrong answers

Option B is wrong because Azure Data Studio is a database management and query tool for SQL Server and Azure SQL, not a reporting or dashboarding tool. Option C is wrong because Azure Synapse Analytics is an enterprise analytics service for large-scale data warehousing and big data processing, not a tool for building interactive reports. Option D is wrong because Microsoft Excel is a spreadsheet application that can connect to databases but lacks native real-time dashboard capabilities and is not designed for interactive, live reporting.

186

Matchingmedium

Match each data type to its category in Azure.

Concepts

Matches

Relational tables with fixed schema

JSON, XML, or key-value pairs

Blobs, files, and media

Data in tables with relationships

NoSQL data like documents or graphs

Why these pairings

Understanding data types helps choose the right Azure service.

187

MCQmedium

A data analyst needs to run complex SQL queries against petabytes of historical sales data stored in Azure Data Lake Storage Gen2. The solution must be serverless with pay-per-query pricing. Which Azure service should they use?

A.Azure Synapse Analytics serverless SQL pool

B.Azure SQL Database

C.Azure HDInsight with Spark

D.Azure Analysis Services

AnswerA

It provides serverless SQL queries over data in Data Lake Storage with pay-per-query pricing.

Why this answer

Azure Synapse Analytics serverless SQL pool is the correct choice because it provides a serverless, pay-per-query engine that can directly query petabytes of data stored in Azure Data Lake Storage Gen2 using standard T-SQL. It eliminates infrastructure management and charges only for the data processed by each query, making it ideal for ad-hoc, complex SQL workloads on massive historical datasets.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics serverless SQL pool with Azure SQL Database or HDInsight, mistakenly thinking that any SQL-capable service can handle petabyte-scale serverless queries, while the key differentiator is the direct, pay-per-query access to Data Lake Storage without provisioning compute.

How to eliminate wrong answers

Option B is wrong because Azure SQL Database is a fully managed, provisioned relational database service that requires pre-allocated resources and does not support serverless pay-per-query pricing for petabyte-scale data in Data Lake Storage; it is designed for transactional workloads, not analytical queries on external data. Option C is wrong because Azure HDInsight with Spark is a cluster-based service that requires provisioning and managing compute nodes, incurring costs even when idle, and does not offer true serverless pay-per-query pricing. Option D is wrong because Azure Analysis Services is a fully managed PaaS service for semantic models and in-memory analytics, requiring provisioned resources and not designed for direct serverless SQL queries against Data Lake Storage; it also lacks pay-per-query billing.

188

MCQhard

A data warehouse team in Azure Synapse Analytics notices query performance degradation on a large fact table. The table is partitioned by date and has a clustered columnstore index. Which action is most likely to improve performance?

A.Update statistics on the fact table

B.Drop and recreate the partition boundaries

C.Reorganize the clustered columnstore index

D.Change the distribution to ROUND_ROBIN

AnswerC

Reorganizing compresses and merges columnstore segments, improving query performance.

Why this answer

Reorganizing the clustered columnstore index (option C) is the most likely action to improve performance because, over time, columnstore indexes can become fragmented due to data modifications (inserts, updates, deletes). Reorganizing the index physically recompresses the data into optimal rowgroups, removing deleted rows and merging small rowgroups, which directly improves query scan efficiency and reduces I/O.

Exam trap

The trap here is that candidates often confuse index maintenance (reorganize/rebuild) with statistics updates or distribution changes, mistakenly believing that stale statistics or a different distribution method are the primary causes of performance degradation on a large, partitioned fact table with a clustered columnstore index.

How to eliminate wrong answers

Option A is wrong because updating statistics helps the query optimizer generate better execution plans, but it does not address the underlying physical fragmentation of the columnstore index that causes degraded scan performance. Option B is wrong because dropping and recreating partition boundaries would change the partitioning scheme, which could disrupt data organization and query patterns; it does not fix fragmentation within the existing columnstore index. Option D is wrong because changing the distribution to ROUND_ROBIN would distribute data evenly across nodes but would eliminate the benefits of collocation for join and aggregation queries, likely worsening performance for typical data warehouse workloads.

189

MCQmedium

A hospital uses Azure SQL Database to store patient records. The database contains tables for patient information, insurance details, and treatment plans. The system must ensure that if a transaction updates a patient's insurance and their treatment plan in two separate rows, either both updates succeed or both fail. Which ACID property guarantees this behavior?

A.Durability

B.Consistency

C.Isolation

D.Atomicity

AnswerD

Atomicity guarantees that all operations within a transaction are completed successfully or none are applied, matching the requirement to update both rows or neither.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. In Azure SQL Database, if a transaction updates both the insurance and treatment plan rows, atomicity guarantees that either both updates are committed or both are rolled back, preventing partial updates that could leave data in an inconsistent state.
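
The all-or-nothing behavior can be sketched locally. This is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in for Azure SQL Database; the table names, column names, and the `update_both` helper are hypothetical and not part of the question.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE insurance (patient_id INTEGER PRIMARY KEY, provider TEXT NOT NULL)")
conn.execute("CREATE TABLE treatment (patient_id INTEGER PRIMARY KEY, plan TEXT NOT NULL)")
conn.execute("INSERT INTO insurance VALUES (1, 'OldCare')")
conn.execute("INSERT INTO treatment VALUES (1, 'Plan A')")
conn.commit()

def current_state():
    """Return (provider, plan) for patient 1."""
    provider = conn.execute("SELECT provider FROM insurance WHERE patient_id = 1").fetchone()[0]
    plan = conn.execute("SELECT plan FROM treatment WHERE patient_id = 1").fetchone()[0]
    return provider, plan

def update_both(provider, plan):
    """Update both rows in one transaction; roll back everything on any error."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE insurance SET provider = ? WHERE patient_id = 1", (provider,))
            conn.execute("UPDATE treatment SET plan = ? WHERE patient_id = 1", (plan,))
        return True
    except sqlite3.IntegrityError:
        return False

# The second UPDATE violates NOT NULL, so the first UPDATE is rolled back too.
failed_attempt = update_both("NewCare", None)
state_after_failure = current_state()

# A valid pair of updates commits together.
good_attempt = update_both("NewCare", "Plan B")
state_after_success = current_state()
```

After the failed attempt the insurance row still holds its original value, even though its own UPDATE statement was valid. That is the property the question describes.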

Exam trap

The trap here is that candidates confuse atomicity with consistency, thinking that consistency alone ensures all-or-nothing updates, but consistency only enforces rules and constraints—it is atomicity that provides the rollback mechanism to prevent partial transactions.

How to eliminate wrong answers

Option A is wrong because durability guarantees that committed transactions persist even after a system failure, but it does not control whether both updates succeed or fail together. Option B is wrong because consistency ensures that a transaction brings the database from one valid state to another, but it relies on atomicity to prevent partial updates that would violate integrity rules. Option C is wrong because isolation controls how concurrent transactions interact (e.g., preventing dirty reads), but it does not enforce the all-or-nothing behavior of a single transaction.

190

MCQhard

Your company uses Azure Databricks to process streaming data from Event Hubs. The data is transformed and written to Azure Data Lake Storage Gen2 as Delta tables. You notice that some records are duplicated in the Delta tables. Which configuration change should you make to prevent duplicates?

A.Add a separate job to deduplicate the Delta table.

B.Enable checkpointing in the streaming query to store progress.

C.Use Delta Lake's idempotent write support.

D.Increase the batch interval in the streaming query.

AnswerB

Checkpointing ensures exactly-once processing.

Why this answer

Checkpointing in Spark Structured Streaming stores the offset of the last processed event from Event Hubs. When the query restarts, it reads from the checkpointed offset, ensuring each event is processed exactly once and preventing duplicates in the Delta table.
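
The effect of a stored offset can be sketched in plain Python. This is a toy model, not Spark or Event Hubs code: the `events` list, the `run_query` function, and the dictionary used as a checkpoint are all hypothetical, and the sketch ignores the failure window between a write and its checkpoint update that a real engine has to handle.

```python
# Stand-in for an ordered stream partition.
events = [f"event-{i}" for i in range(6)]

def run_query(sink, checkpoint, max_events):
    """Process up to max_events, starting from the checkpointed offset if any."""
    start = checkpoint.get("offset", 0) if checkpoint is not None else 0
    end = min(start + max_events, len(events))
    for offset in range(start, end):
        sink.append(events[offset])
        if checkpoint is not None:
            checkpoint["offset"] = offset + 1  # record progress after each write

# Without a checkpoint, a restart begins at offset 0 and rewrites events.
sink_without = []
run_query(sink_without, None, 4)   # first run stops after 4 events
run_query(sink_without, None, 6)   # restart reprocesses from the beginning

# With a checkpoint, the restart resumes where the previous run stopped.
sink_with, checkpoint = [], {}
run_query(sink_with, checkpoint, 4)
run_query(sink_with, checkpoint, 6)

duplicates_without = len(sink_without) - len(set(sink_without))
duplicates_with = len(sink_with) - len(set(sink_with))
```

In this model the run without a checkpoint writes the first four events twice, while the checkpointed run writes each event once.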

Exam trap

The trap here is that candidates confuse idempotent writes (which prevent duplicate writes within a single transaction) with checkpointing (which prevents duplicate reads across query restarts), leading them to choose Option C instead of B.

How to eliminate wrong answers

Option A is wrong because adding a separate deduplication job is an extra step that does not address the root cause of duplicate ingestion; it only cleans up after the fact, increasing complexity and cost. Option C is wrong because Delta Lake's idempotent write support prevents duplicate writes within the same transaction, but it does not handle duplicate reads from the streaming source; the duplication occurs because the streaming query reprocesses events from the beginning without checkpointing. Option D is wrong because increasing the batch interval only changes how often micro-batches are triggered; it does not track which events have already been processed, so duplicates can still occur on restarts.

191

MCQmedium

A healthcare organization must build an analytics solution that processes streaming patient vitals data and provides real-time dashboards. The solution must also store historical data for compliance audits. Which combination of Azure services should the organization use?

A.Azure Stream Analytics for real-time processing and Azure SQL Database for historical storage and dashboards.

B.Azure Synapse Analytics for real-time processing and Azure Blob Storage for archival.

C.Azure Event Hubs for ingestion and Azure Data Lake Storage for storage, with Power BI for dashboards.

D.Azure HDInsight with Apache Spark for streaming and Azure Cosmos DB for storage.

AnswerA

Stream Analytics processes streaming data; SQL Database stores structured historical data for compliance and queries.

Why this answer

Azure Stream Analytics is purpose-built for real-time processing of streaming data, such as patient vitals, and can output directly to Power BI for live dashboards. Azure SQL Database provides a relational store for historical data, supporting compliance audits with point-in-time restore and long-term retention. This combination meets both real-time and historical requirements without unnecessary complexity.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics as a streaming service due to its 'analytics' name, but it is primarily a data warehouse for batch and interactive queries, not for real-time stream processing.

How to eliminate wrong answers

Option B is wrong because Azure Synapse Analytics is not designed for real-time stream processing; it is a data warehouse and analytics service for batch and interactive queries, not for low-latency streaming. Option C is wrong because while Azure Event Hubs and Azure Data Lake Storage are suitable for ingestion and storage, they lack built-in real-time processing; Power BI alone cannot process streaming data without a compute layer like Stream Analytics. Option D is wrong because Azure HDInsight with Apache Spark is a big data platform that can handle streaming, but it adds operational overhead and is not as straightforward for real-time dashboards as Stream Analytics; Azure Cosmos DB is a NoSQL database, not optimized for relational compliance audits.

192

MCQmedium

A library management system uses Azure SQL Database. The Books table has 500,000 rows with columns: BookID (primary key, clustered), Title, Author, ISBN, PublishedYear, CopiesAvailable. Queries frequently filter by Author and then sort results by PublishedYear in descending order. Which indexing strategy will most improve query performance?

A.Create a nonclustered index on (Author) INCLUDE (PublishedYear, CopiesAvailable, Title).

B.Create a nonclustered index on (PublishedYear) INCLUDE (Author).

C.Create a nonclustered index on (Author, PublishedYear DESC) INCLUDE (CopiesAvailable, Title, ISBN).

D.Create a nonclustered columnstore index on (Author, PublishedYear, CopiesAvailable, Title, ISBN).

AnswerC

This index is sorted by Author first and then PublishedYear descending, perfectly supporting both the filter and the sort. Included columns make it covering.

Why this answer

Option C is correct because it creates a covering index that matches the exact query pattern: filtering by Author and sorting by PublishedYear in descending order. By defining (Author, PublishedYear DESC) as the index key, SQL Server can perform an index seek on Author and an ordered scan on PublishedYear without a separate sort operation. Including the remaining columns (CopiesAvailable, Title, ISBN) as non-key columns makes the index covering, eliminating the need for key lookups to the clustered index.
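
The effect of key order on sorting can be observed locally. The sketch below uses Python's sqlite3 module as a stand-in, so it is an assumption-laden illustration rather than Azure SQL Database behavior: SQLite has no INCLUDE clause, so only the key-column part of option C is shown, and the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Books (BookID INTEGER PRIMARY KEY, Title TEXT, Author TEXT, "
    "ISBN TEXT, PublishedYear INTEGER, CopiesAvailable INTEGER)"
)
QUERY = ("SELECT Title, PublishedYear FROM Books "
         "WHERE Author = ? ORDER BY PublishedYear DESC")

def plan_text():
    """Return SQLite's query plan for QUERY as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Some Author",)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the plan detail

# Key on Author only: the seek works, but the rows still need a sort step.
conn.execute("CREATE INDEX ix_author ON Books (Author)")
plan_author_only = plan_text()

# Key on (Author, PublishedYear DESC): rows are read in the required order.
conn.execute("DROP INDEX ix_author")
conn.execute("CREATE INDEX ix_author_year ON Books (Author, PublishedYear DESC)")
plan_composite = plan_text()
```

In SQLite, the first plan reports a temporary B-tree for the ORDER BY, and the second does not. The same reasoning about key order applies to the nonclustered index in option C, where the INCLUDE columns additionally remove key lookups.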

Exam trap

The trap here is that candidates often choose Option A because they think listing PublishedYear as an included column is sufficient for sorting, but they miss that the sort column must be part of the index key, in an order that matches the ORDER BY clause, to avoid an explicit sort operation.

How to eliminate wrong answers

Option A is wrong because it includes PublishedYear only as an included column, not as a key column, so the database cannot use the index to satisfy the ORDER BY PublishedYear DESC clause without performing an explicit sort after the seek on Author. Option B is wrong because it places PublishedYear as the leading key column, which does not support efficient filtering on Author; the query would require a full index scan or a separate lookup for each Author value. Option D is wrong because a nonclustered columnstore index is optimized for large-scale analytical aggregations and scans, not for point lookups or ordered retrieval of a small subset of rows; it would introduce unnecessary overhead for this transactional query pattern.

193

MCQmedium

A social networking application needs to store and query relationships between users, such as 'friends of friends' to recommend new connections. The application must traverse these relationships efficiently. Which Azure NoSQL data store and API should they choose?

A.Azure Cosmos DB with MongoDB API

B.Azure Cosmos DB with Gremlin API

C.Azure Table Storage

D.Azure Cosmos DB with SQL API

AnswerB

Correct. The Gremlin API is designed for graph data models and supports traversal queries using the Gremlin graph query language.

Why this answer

Azure Cosmos DB with Gremlin API is correct because it provides a graph database model specifically designed for storing and querying highly connected data, such as user relationships. The Gremlin API supports graph traversal queries (e.g., 'friends of friends') natively using the Apache TinkerPop graph traversal language, enabling efficient navigation of edges and vertices without expensive join operations.
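
A 'friends of friends' traversal is a two-hop walk over edges. The sketch below shows the idea on an in-memory adjacency map in plain Python; it is not Gremlin syntax, and the user names and the `friends_of_friends` helper are hypothetical.

```python
# Each user (vertex) maps to the set of users they are connected to (edges).
friends = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "dee"},
    "cy": {"ana", "dee", "eli"},
    "dee": {"ben", "cy"},
    "eli": {"cy"},
}

def friends_of_friends(user):
    """Return users exactly two hops away who are not already direct friends."""
    direct = friends[user]
    two_hop = set()
    for friend in direct:
        two_hop |= friends[friend]   # follow the second hop
    return two_hop - direct - {user}

suggestions = friends_of_friends("ana")
```

For "ana" this yields "dee" and "eli". A graph database performs this kind of hop-by-hop navigation natively, which is why the question points to the Gremlin API.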

Exam trap

The trap here is that candidates often confuse document databases (like MongoDB API or SQL API) with graph databases, assuming any NoSQL store can handle relationships efficiently, but only a dedicated graph database like Gremlin API provides native traversal operators for multi-hop queries.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB with MongoDB API is a document-oriented store that lacks native graph traversal capabilities; modeling 'friends of friends' would require multiple queries or application-level joins, which is inefficient. Option C is wrong because Azure Table Storage is a key-value store with no support for relationships or graph traversal; it cannot efficiently query multi-hop connections like 'friends of friends'. Option D is wrong because Azure Cosmos DB with SQL API is a document store using SQL-like queries, which does not provide native graph traversal operators; traversing relationships would require recursive queries or client-side logic, leading to poor performance.

194

MCQhard

Refer to the exhibit. You have an Azure Data Factory pipeline definition as shown. The pipeline fails with a 'Source not found' error. The BlobInputDataset points to a container that exists. What is the most likely cause?

A.The Azure Blob Storage container is empty.

B.The Azure Data Factory managed identity does not have access to the storage account.

C.The SQL sink database does not exist.

D.The dataset's file path is incorrect or no files match the pattern.

AnswerD

Source not found typically means the file path is invalid or no files exist at that location.

Why this answer

The 'Source not found' error in Azure Data Factory indicates that the source dataset cannot locate the specified file or blob. Since the container exists, the most likely cause is that the file path defined in the dataset is incorrect or that no files match the specified pattern (e.g., wildcard or prefix). This is a common configuration issue when the dataset's folder path or file name does not correspond to the actual blob location.

Exam trap

The trap here is that candidates often confuse a missing file or incorrect path with an empty container or permission issues, but the specific 'Source not found' error points directly to the dataset's file path or pattern mismatch.

How to eliminate wrong answers

Option A is wrong because an empty container would not cause a 'Source not found' error; instead, a copy activity would succeed with zero rows copied, or a lookup activity would return an empty result. Option B is wrong because a managed identity access issue would result in an 'Authentication failed' or 'Authorization failed' error, not 'Source not found'. Option C is wrong because the SQL sink database not existing would cause a 'Sink not found' or connection error, not a source-related error.

195

MCQmedium

A logistics company tracks package deliveries. When a package is scanned at a distribution center, the system immediately updates the delivery status in a database so customers can see the live tracking information. At the end of each day, the company runs a job that aggregates all delivery status changes into a report for operational analysis. Which of the following best describes these two data processing workloads?

A.Both are batch processing workloads.

B.The status update is a real-time workload, and the daily report is a batch workload.

C.Both are real-time processing workloads.

D.The status update is a batch workload, and the daily report is a real-time workload.

AnswerB

Correct. The status update is processed instantly (real-time), while the daily job processes data in batches (batch).

Why this answer

Option B is correct because the immediate status update upon scanning is a real-time workload, as it processes data instantly for live customer visibility. The end-of-day aggregation job is a batch workload, as it processes accumulated data in a scheduled, non-real-time manner for operational reporting.
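
The two processing patterns can be contrasted in a few lines of Python. This is a simplified sketch with hypothetical names (`on_scan`, `nightly_report`, the package IDs); it only illustrates per-event processing versus scheduled processing over accumulated data.

```python
from collections import Counter

current_status = {}   # live view, updated on every scan
change_log = []       # accumulated changes, read later by the batch job

def on_scan(package_id, status):
    """Real-time path: runs once per event, as soon as the scan happens."""
    current_status[package_id] = status
    change_log.append((package_id, status))

def nightly_report():
    """Batch path: runs on a schedule over everything accumulated so far."""
    return Counter(status for _, status in change_log)

on_scan("PKG-1", "arrived")
on_scan("PKG-2", "arrived")
on_scan("PKG-1", "out for delivery")

live_view = current_status["PKG-1"]   # visible immediately after the scan
report = nightly_report()             # produced once, at the end of the day
```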

Exam trap

The trap here is confusing the speed of the underlying database update with the processing pattern, leading candidates to assume that any database write is batch, or that any scheduled job is real-time, when the key distinction is whether the processing is triggered by each event or runs on a schedule.

How to eliminate wrong answers

Option A is wrong because it incorrectly labels both workloads as batch, ignoring the immediate, low-latency nature of the status update. Option C is wrong because it incorrectly labels both workloads as real-time, ignoring the scheduled, non-continuous nature of the daily aggregation report. Option D is wrong because it reverses the definitions, treating the immediate update as batch and the daily report as real-time, which contradicts the fundamental latency and processing patterns of each workload.

196

MCQmedium

A mobile gaming company stores player activity logs as JSON documents. Each document has a unique ActivityID, a PlayerID, a timestamp, and a variable set of attributes depending on the game event (e.g., level started, item purchased). The application requires low-latency point reads by ActivityID and needs to query logs by PlayerID for a given time range. Schema flexibility is critical because new game events are added frequently. Which Azure Cosmos DB API should they choose?

A.NoSQL API (formerly SQL API)

B.MongoDB API

C.Cassandra API

D.Gremlin API

AnswerA

Correct. The NoSQL API provides native JSON support, flexible schema, automatic indexing, and SQL-like querying for efficient point reads and range queries.

Why this answer

The NoSQL API (formerly SQL API) is the correct choice because it natively supports JSON documents with flexible schemas, enabling the variable attributes required for new game events. It provides low-latency point reads when ActivityID serves as the item ID and is supplied together with the partition key value, and it supports queries by PlayerID within a time range using composite indexes or filtered cross-partition queries. This API is optimized for schema-agnostic, document-based workloads and offers the richest query capabilities for JSON data in Azure Cosmos DB.

Exam trap

The trap here is that candidates often choose the MongoDB API assuming it is the only option for JSON documents, but they overlook that the NoSQL API provides superior query flexibility and indexing for time-range queries, and that all Cosmos DB APIs support JSON documents but with different query capabilities.

How to eliminate wrong answers

Option B (MongoDB API) is wrong because while it supports JSON-like documents with flexible schemas, its query language is limited to MongoDB's aggregation pipeline and does not natively support the same level of SQL-like querying for time-range filtering across partitions without additional indexing complexity; the NoSQL API provides more straightforward querying for this use case. Option C (Cassandra API) is wrong because it uses a wide-column store model with a fixed schema defined by CQL tables, which cannot accommodate the variable set of attributes in JSON documents without schema changes, and it lacks native support for JSON document storage and querying. Option D (Gremlin API) is wrong because it is designed for graph data models and traversals, not for document storage or point reads by ActivityID, and it cannot efficiently handle the flexible schema and time-range queries required for player activity logs.

197

MCQmedium

You are designing a batch processing pipeline that runs nightly to transform CSV files from an FTP server into Parquet files in Azure Data Lake Storage. Which Azure service should you use to orchestrate the pipeline?

A.Azure Functions

B.Azure Data Factory

C.Azure Logic Apps

D.Azure Batch

AnswerB

ADF provides orchestration for batch data pipelines with transformations.

Why this answer

Azure Data Factory (ADF) is the correct choice because it is a cloud-based ETL and data integration service designed specifically for orchestrating and automating data pipelines. It supports scheduled triggers (e.g., nightly runs), native connectors for FTP and Azure Data Lake Storage, and built-in data transformation activities like Copy Data and Mapping Data Flows to convert CSV to Parquet. ADF's control flow and dependency management make it ideal for batch processing pipelines.

Exam trap

The trap here is that candidates confuse Azure Data Factory with Azure Logic Apps or Azure Functions, assuming any 'automation' or 'serverless' service can orchestrate a batch ETL pipeline, but only ADF provides the native data movement, transformation, and scheduling capabilities required for this specific scenario.

How to eliminate wrong answers

Option A is wrong because Azure Functions is a serverless compute service for event-driven, short-running code, not designed for orchestrating complex, scheduled batch pipelines with dependencies and data movement across heterogeneous sources. Option C is wrong because Azure Logic Apps is a low-code workflow automation service primarily for integrating SaaS applications and APIs, lacking native data transformation capabilities like CSV-to-Parquet conversion and optimized data movement for large-scale batch processing. Option D is wrong because Azure Batch is a job scheduling and compute management service for running large-scale parallel and high-performance computing (HPC) workloads, not a data orchestration tool with built-in connectors for FTP and Data Lake Storage.

198

MCQeasy

A company needs to store semi-structured data from IoT devices, including temperature readings and device status. The data will be queried by time range and device ID. Which Azure data service is most cost-effective for this use case?

A.Azure Blob Storage

B.Azure Cosmos DB

C.Azure SQL Database

D.Azure Table Storage

AnswerD

Table Storage is a low-cost NoSQL store ideal for IoT telemetry.

Why this answer

Azure Table Storage is a NoSQL key-value store that is optimized for storing large amounts of semi-structured data without requiring a fixed schema. It supports efficient queries by partition key (device ID) and row key (timestamp), making it ideal for time-series IoT data at a lower cost than other Azure data services.
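
The partition key and row key layout can be modeled in plain Python. This is a toy in-memory sketch, not the Table Storage SDK: the `insert` and `query` helpers, the device IDs, and the readings are hypothetical, and ISO 8601 timestamps are assumed as row keys because they sort lexicographically in time order.

```python
from bisect import bisect_left, bisect_right

table = {}  # partition key (device ID) -> list of (row key, entity), kept sorted

def insert(device_id, timestamp, entity):
    """Store an entity under its partition key, ordered by row key."""
    rows = table.setdefault(device_id, [])
    rows.append((timestamp, entity))
    rows.sort(key=lambda row: row[0])

def query(device_id, start, end):
    """Return entities for one device whose row key lies in [start, end]."""
    rows = table.get(device_id, [])
    keys = [row_key for row_key, _ in rows]
    lo = bisect_left(keys, start)
    hi = bisect_right(keys, end)
    return [entity for _, entity in rows[lo:hi]]

insert("dev-1", "2024-05-01T10:00:00Z", {"temp": 21.5, "status": "ok"})
insert("dev-1", "2024-05-01T11:00:00Z", {"temp": 23.1})            # no status field
insert("dev-1", "2024-05-01T10:05:00Z", {"temp": 22.0, "status": "ok"})
insert("dev-2", "2024-05-01T10:02:00Z", {"temp": 19.0})

ten_oclock_hour = query("dev-1", "2024-05-01T10:00:00Z", "2024-05-01T10:59:59Z")
```

The query touches a single partition and a contiguous range of row keys, which is the access pattern the explanation describes. Entities in the same partition can carry different properties, reflecting the semi-structured nature of the data.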

Exam trap

The trap here is that candidates often choose Azure Cosmos DB for its NoSQL capabilities, overlooking the fact that Table Storage provides the same key-value functionality at a fraction of the cost for simple IoT workloads.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage is designed for unstructured binary or text data (e.g., images, logs, backups) and does not natively support indexed queries by device ID and time range without additional indexing or compute layers. Option B is wrong because Azure Cosmos DB, while capable of handling semi-structured data and time-series queries, is significantly more expensive than Table Storage for high-volume IoT data due to its provisioned throughput and multi-model capabilities. Option C is wrong because Azure SQL Database is a relational database that requires a fixed schema and is over-provisioned for simple key-value lookups, leading to higher cost and complexity for semi-structured IoT data.

199

MCQhard

You are designing a solution to store and analyze large volumes of streaming data from social media feeds. The data is semi-structured (JSON) and will be used for real-time dashboards. You need to choose a storage solution that can handle high-ingestion throughput and support querying with Azure Synapse Serverless SQL. Which storage option should you choose?

A.Azure Table Storage

B.Azure Data Lake Storage Gen2

C.Azure Cosmos DB

D.Azure Cache for Redis

AnswerB

Optimized for big data analytics and compatible with Synapse Serverless SQL.

Why this answer

Option B is correct because Azure Data Lake Storage Gen2 (ADLS Gen2) is built on Blob Storage, supports high-throughput ingestion, and can be queried by Azure Synapse Serverless SQL. Option A is wrong because Azure Table Storage is not designed for analytics. Option C is wrong because Azure Cosmos DB is optimized for transactional workloads, not for direct analytical queries from Synapse Serverless SQL (though Synapse Link exists).

Option D is wrong because Azure Cache for Redis is an in-memory cache.

200

Multi-Selecthard

Which THREE factors should be considered when designing a relational database in Azure to minimize latency for globally distributed users?

Select 3 answers

A.Implement horizontal partitioning (sharding)

B.Use columnstore indexes

C.Configure read replicas

D.Use active geo-replication

E.Choose a service tier that provides higher IOPS

AnswersC, D, E

Read replicas offload read traffic and serve data closer to users, reducing latency.

Why this answer

Active geo-replication (D) provides readable secondary databases in multiple regions. Read replicas (C) offload read traffic from the primary. Selecting a service tier with higher IOPS (E) ensures sufficient throughput.

Horizontal partitioning (A) adds complexity and may increase latency for cross-shard queries. Columnstore indexes (B) are for analytics, not latency reduction.

201

MCQeasy

A data engineer needs to transform and clean data from multiple sources before loading it into Azure Synapse Analytics. Which Azure service should they use for this ETL process?

A.Azure Analysis Services

B.Azure Databricks

C.Azure Data Factory

D.Azure Stream Analytics

AnswerC

Azure Data Factory provides visual data flows and orchestration for ETL workloads.

Why this answer

Azure Data Factory is the correct service because it is a cloud-based ETL (Extract, Transform, Load) service designed specifically to orchestrate and automate data movement and transformation from multiple sources. It provides built-in connectors for various data stores and supports data flows for cleaning and transforming data before loading it into Azure Synapse Analytics.

Exam trap

The trap here is that candidates often confuse Azure Databricks (a Spark-based analytics platform) with Azure Data Factory, but Data Factory is the dedicated ETL orchestration service, while Databricks is more for data engineering and machine learning workloads.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is an analytical engine used for creating semantic models and performing business intelligence (BI) queries, not for ETL processes. Option B is wrong because Azure Databricks is a big data analytics platform based on Apache Spark, which can perform transformations but is not primarily an ETL orchestration service; it is more suited for advanced analytics and machine learning workloads. Option D is wrong because Azure Stream Analytics is a real-time event processing engine for streaming data, not designed for batch ETL from multiple static sources.

202

MCQeasy

A company stores customer orders in a relational database. The database enforces rules that every order must have a unique order number and must be linked to an existing customer record. This enforcement of rules to ensure accuracy and consistency is an example of which data concept?

A.Data schema

B.Data integrity

C.Data redundancy

D.Data latency

AnswerB

Data integrity is maintained through constraints like primary keys and foreign keys, which enforce rules to keep data accurate and consistent.

Why this answer

Data integrity refers to the enforcement of rules that ensure the accuracy, consistency, and reliability of data throughout its lifecycle. In this scenario, the relational database enforces entity integrity (unique order numbers) and referential integrity (linking orders to existing customer records), which are core mechanisms for maintaining data correctness.
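
Both kinds of integrity can be demonstrated with a small local example. The sketch uses Python's sqlite3 module as a stand-in for a relational database; the table names, column names, and the `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    " order_number TEXT PRIMARY KEY,"                                   # entity integrity
    " customer_id INTEGER NOT NULL REFERENCES customers(customer_id))"  # referential integrity
)
conn.execute("INSERT INTO customers VALUES (1)")
conn.commit()

def try_insert(order_number, customer_id):
    """Attempt to insert an order; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO orders VALUES (?, ?)", (order_number, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert("SO-1", 1),    # valid order for an existing customer
    try_insert("SO-1", 1),    # duplicate order number
    try_insert("SO-2", 99),   # customer 99 does not exist
]
```

Only the first insert is accepted. The database rejects the other two itself, without any validation code in the application.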

Exam trap

The trap here is that candidates often confuse 'data schema' (the structural definition) with 'data integrity' (the enforcement of rules), mistakenly thinking that simply having a schema guarantees data accuracy and consistency.

How to eliminate wrong answers

Option A is wrong because a data schema defines the structure of the database (tables, columns, relationships) but does not itself enforce rules like uniqueness or referential constraints; it is the blueprint, not the enforcement mechanism. Option C is wrong because data redundancy refers to the unnecessary duplication of data, which can lead to inconsistencies, not the enforcement of rules to ensure accuracy and consistency. Option D is wrong because data latency measures the delay between data creation and its availability for use, which is unrelated to rule enforcement for accuracy and consistency.

203

MCQhard

A logistics company uses Azure Synapse Analytics dedicated SQL pool to analyze billions of shipment records. The table 'Shipments' is 10 TB and hash-distributed on 'ShipmentID'. Analysts frequently run queries that filter on 'WarehouseID' and aggregate by 'Region'. These queries are slow because they cause data movement (shuffle) across distributions. Which table design change will most improve query performance for these analytical workloads?

A.Change distribution to replicated table

B.Change distribution to round-robin

C.Create a columnstore index

D.Change distribution to hash on 'WarehouseID'

AnswerD

Hash-distributing on WarehouseID places all rows for a specific warehouse on the same compute node. Queries that filter by WarehouseID become single-distribution queries, eliminating expensive data shuffles. This directly addresses the observed performance bottleneck.

Why this answer

D is correct because hash-distributing the 'Shipments' table on 'WarehouseID' ensures that all rows for a given warehouse are co-located on the same distribution node. This eliminates the need for data movement (shuffle) when queries filter on 'WarehouseID' and aggregate by 'Region', as the aggregation can be performed locally on each distribution without redistributing data across nodes.
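
Co-location under a hash distribution key can be simulated in Python. This is a rough sketch under stated assumptions: `zlib.crc32` stands in for the engine's internal hash function, which is not the real one, and the sample rows and helper names are hypothetical. The count of 60 distributions matches how a dedicated SQL pool spreads table data.

```python
import zlib

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions

def distribution_for(key):
    """Deterministic stand-in for the engine's internal hash function."""
    return zlib.crc32(str(key).encode("utf-8")) % DISTRIBUTIONS

# Hypothetical sample: 1,000 shipments spread over 10 warehouses.
shipments = [{"ShipmentID": i, "WarehouseID": f"W{i % 10}"} for i in range(1000)]

def distributions_touched(rows, column, warehouse):
    """Distributions that hold rows for one warehouse, given the distribution column."""
    return {
        distribution_for(row[column])
        for row in rows
        if row["WarehouseID"] == warehouse
    }

by_shipment_id = distributions_touched(shipments, "ShipmentID", "W7")
by_warehouse_id = distributions_touched(shipments, "WarehouseID", "W7")
```

With ShipmentID as the key, the rows for warehouse W7 are scattered over many distributions; with WarehouseID as the key, they all land in one. A design consideration the sketch does not model is skew: a column with few distinct values can leave some distributions much fuller than others.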

Exam trap

The trap here is that candidates often confuse indexing (columnstore) with distribution design, assuming that a better index alone can fix shuffle-related performance issues, when in fact the distribution key is the primary factor determining data movement in a massively parallel processing (MPP) architecture.

How to eliminate wrong answers

Option A is wrong because replicated tables are suitable for small dimension tables (typically < 2 GB) and not for a 10 TB fact table like 'Shipments'; replicating such a large table would cause excessive storage overhead and degrade performance. Option B is wrong because round-robin distribution distributes data evenly without any logical grouping, so queries filtering on 'WarehouseID' would still require a full data shuffle to bring related rows together for aggregation. Option C is wrong because columnstore indexes are already the default for dedicated SQL pool tables and are designed for compression and scan performance, but they do not address the root cause of data movement across distributions caused by an inappropriate distribution key.

204

MCQmedium

A manufacturing company needs to build an analytics solution for IoT sensor data. Thousands of devices send real-time temperature and vibration readings. The solution must: (1) ingest the streaming data reliably, (2) perform real-time aggregations (e.g., average temperature per device every minute), and (3) store the aggregated results in Azure Synapse Analytics for historical reporting and dashboards. Which combination of Azure services should be used?

A.Azure Event Hubs -> Azure Stream Analytics -> Azure Synapse Analytics

B.Azure IoT Hub -> Azure Data Factory -> Azure Cosmos DB

C.Azure Blob Storage -> Azure Databricks -> Azure SQL Database

D.Azure Service Bus -> Azure Functions -> Azure Table Storage

AnswerA

Correct. Event Hubs ingests the streaming data, Stream Analytics performs real-time transformations and aggregations, and Synapse Analytics stores the results for historical analysis and BI reporting.

Why this answer

Azure Event Hubs is designed for high-throughput, reliable ingestion of streaming data from millions of IoT devices. Azure Stream Analytics can then perform real-time aggregations (like average temperature per device per minute) using a SQL-like query language. Finally, Azure Synapse Analytics provides a dedicated SQL pool or serverless SQL endpoint for storing and querying the aggregated results, enabling historical reporting and dashboards.
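
The per-minute aggregation step can be sketched as a tumbling window in plain Python. This is not Stream Analytics query syntax: the `tumbling_average` function, the tuple layout, and the sample readings are hypothetical, and timestamps are assumed to be whole seconds.

```python
from collections import defaultdict

def tumbling_average(readings, window_seconds=60):
    """Average temperature per (device, window start) over fixed, non-overlapping windows."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for device_id, timestamp, temperature in readings:
        window_start = timestamp - timestamp % window_seconds
        bucket = totals[(device_id, window_start)]
        bucket[0] += temperature
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# (device ID, seconds since start, temperature)
readings = [
    ("d1", 0, 20.0),
    ("d1", 30, 22.0),
    ("d1", 61, 30.0),   # falls into the second one-minute window
    ("d2", 10, 18.0),
]
averages = tumbling_average(readings)
```

Each reading belongs to exactly one window, so the result has one row per device per minute, which is the shape of data that would then be written to the warehouse for historical reporting.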

Exam trap

The trap here is that candidates often confuse Azure IoT Hub with Azure Event Hubs, thinking IoT Hub is required for all IoT scenarios, but Event Hubs is the correct choice for pure telemetry ingestion without device management needs.

How to eliminate wrong answers

Option B is wrong because Azure IoT Hub is primarily for device management and bi-directional communication, not optimized for high-scale streaming ingestion, and Azure Data Factory is a batch ETL tool, not a real-time stream processor; Azure Cosmos DB is a NoSQL database, not a data warehouse for historical reporting. Option C is wrong because Azure Blob Storage is for static file storage, not real-time streaming ingestion, and Azure Databricks is a big data analytics platform that can process streams but is not the simplest or most cost-effective choice for simple real-time aggregations; Azure SQL Database is a transactional database, not a large-scale analytics warehouse. Option D is wrong because Azure Service Bus is a message broker for enterprise messaging, not designed for high-throughput IoT telemetry, and Azure Functions is a serverless compute service that would require custom code for stream processing, lacking the built-in windowing and aggregation capabilities of Stream Analytics; Azure Table Storage is a NoSQL key-value store, not suitable for complex analytical queries.

205

MCQeasy

A financial services company needs to store transaction data in a relational database that must comply with strict regulatory requirements. The data must be encrypted at rest and in transit, and the database must support automatic failover to a secondary region. Which Azure solution meets these requirements?

A.Azure Database for MySQL

B.Azure Cosmos DB

C.Azure SQL Database with Active Geo-Replication

D.SQL Server on Azure Virtual Machines

AnswerC

Provides TDE, SSL, and automatic geo-failover.

Why this answer

Option C is correct because Azure SQL Database with Active Geo-Replication provides encryption at rest (TDE), encryption in transit (TLS/SSL), and failover to a secondary region (automatic failover is configured through a failover group, which builds on geo-replication). Option B (Azure Cosmos DB) is a NoSQL service, not a relational database. Option A (Azure Database for MySQL) supports encryption but not automatic geo-failover by default.

Option D (SQL Server on Azure Virtual Machines) requires manual configuration for all these features.

206

MCQmedium

A mobile gaming company stores player data in Azure Cosmos DB using the Core (SQL) API. Each document contains fields: playerId, nickname, score, level, and an inventory array of item objects (each with name and type). The company wants to query all players whose score is above 5000 and who have a specific item (e.g., a sword) in their inventory. Which query clause should they use?

A.WHERE c.score > 5000 AND c.inventory.some(item => item.name == 'sword')

B.WHERE c.score > 5000 AND ARRAY_CONTAINS(c.inventory, {name: 'sword'}, true)

C.WHERE c.score > 5000 AND c.inventory.name == 'sword'

D.WHERE c.score > 5000 AND 'sword' IN c.inventory

AnswerB

ARRAY_CONTAINS checks if the array contains an object that matches the specified properties. The third parameter 'true' enables partial matching, so it finds items where name equals 'sword' regardless of other fields.

Why this answer

Option B is correct because ARRAY_CONTAINS with the third parameter set to 'true' performs a partial match, checking if any element in the inventory array has a 'name' property equal to 'sword'. This is the standard way to query for an item within an array of objects in Azure Cosmos DB's SQL API, as it correctly handles the nested structure without requiring a JOIN or subquery.
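The difference between full and partial matching can be sketched in plain Python. This is a simplified emulation for illustration only, not Cosmos DB's implementation: the `array_contains` helper and the sample player documents are hypothetical, and it compares top-level properties only.

```python
def array_contains(array, candidate, partial=False):
    """Simplified stand-in for ARRAY_CONTAINS(array, candidate, partial)."""
    for element in array:
        if partial and isinstance(element, dict) and isinstance(candidate, dict):
            # Partial match: every property in the candidate must match,
            # but the element may carry extra properties.
            if all(k in element and element[k] == v for k, v in candidate.items()):
                return True
        elif element == candidate:
            # Full match: the element must equal the candidate exactly.
            return True
    return False


# Hypothetical player documents
players = [
    {"playerId": "p1", "score": 7200, "inventory": [{"name": "sword", "type": "weapon"}]},
    {"playerId": "p2", "score": 9100, "inventory": [{"name": "shield", "type": "armor"}]},
    {"playerId": "p3", "score": 3000, "inventory": [{"name": "sword", "type": "weapon"}]},
]

matches = [
    p["playerId"]
    for p in players
    if p["score"] > 5000 and array_contains(p["inventory"], {"name": "sword"}, True)
]
full_match = array_contains(players[0]["inventory"], {"name": "sword"}, False)
```

Only `p1` satisfies both conditions. Without the partial flag the same lookup fails, because each inventory item also has a `type` property and so never equals `{"name": "sword"}` exactly.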

Exam trap

The trap here is that candidates often confuse SQL array syntax (like IN or direct property access) with the specialized ARRAY_CONTAINS function required for querying arrays of objects in Cosmos DB, or they mistakenly apply JavaScript array methods that are not supported in the SQL API.

How to eliminate wrong answers

Option A is wrong because 'c.inventory.some(item => item.name == 'sword')' uses JavaScript-like syntax that is not valid in Azure Cosmos DB SQL API queries; the SQL API does not support arrow functions or the .some() method. Option C is wrong because 'c.inventory.name == 'sword'' assumes that 'name' is a direct property of the array, but arrays do not have a 'name' property; this would either return undefined or cause a query error. Option D is wrong because ''sword' IN c.inventory' checks if the string 'sword' is an exact element of the array, but the inventory array contains objects, not strings, so this condition will never be true.

207

MCQeasy

A company stores customer order data in a relational database table with columns like OrderID, CustomerID, and OrderDate. They also store product images as JPEG files, and customer feedback as JSON documents with varying fields. Which of the following correctly orders these data types from most structured to least structured?

A.JSON documents, Relational table, JPEG files

B.Relational table, JSON documents, JPEG files

C.JPEG files, Relational table, JSON documents

D.Relational table, JPEG files, JSON documents

AnswerB

The relational table is structured, JSON is semi-structured, and JPEG is unstructured, so this is the correct descending order of structure.

Why this answer

Relational tables enforce a fixed schema with rows and columns, making them the most structured. JSON documents have a flexible schema with varying fields, placing them in the middle. JPEG files are binary blobs with no inherent structure for querying, making them the least structured.

Option B correctly orders these from most structured (relational table) to least structured (JPEG files).

Exam trap

The trap here is that candidates often confuse semi-structured data (JSON) with unstructured data (JPEG), incorrectly ranking JSON as less structured than binary files, or they forget that relational tables are the most structured due to their rigid schema enforcement.

How to eliminate wrong answers

Option A is wrong because it places JSON documents as more structured than relational tables, but JSON's flexible schema (allowing varying fields) is less structured than a fixed relational schema. Option C is wrong because it lists JPEG files as more structured than both relational tables and JSON documents, but JPEGs are unstructured binary data with no queryable schema. Option D is wrong because it places JPEG files as more structured than JSON documents, but JSON documents have a semi-structured format with key-value pairs and nesting, while JPEGs are entirely unstructured.

208

MCQhard

A company stores user profile data in Azure Blob Storage as JSON files. Each file represents one user. They need to provide real-time search capabilities on user attributes like name, email, and location. The search must support partial matches and return results within 500 ms. The data volume is 10 TB and grows by 1 GB daily. They have a limited budget and want to minimize operational overhead. Which Azure solution should they choose?

A.Import the JSON files into Azure Cosmos DB and use the SQL API to search.

B.Use Azure Data Lake Analytics with U-SQL to query the files.

C.Use Azure Cognitive Search to index the JSON files and provide search capabilities.

D.Load the data into Azure SQL Database and create full-text indexes.

AnswerC

Cognitive Search is designed for full-text search, supports partial matches, and integrates with Blob Storage.

Why this answer

Azure Cognitive Search (Option C) is a managed search service that indexes JSON data from Blob Storage and supports partial matches with low latency. Option A (Azure Cosmos DB) requires data migration and may not support partial matching efficiently. Option D (Azure SQL Database) requires importing the data and defining a schema.

Option B (Azure Data Lake Analytics) is for batch processing, not real-time search.

209

MCQmedium

A company has an on-premises SQL Server database that they want to migrate to Azure with minimal changes to the application. They need to use the least amount of administrative effort for patching and backups. Which Azure service should they choose?

A.Azure SQL Managed Instance

B.Azure SQL Database

C.SQL Server on Azure Virtual Machines

D.Azure Database for MySQL

AnswerB

Azure SQL Database is PaaS, fully managed with automated patching and backups.

Why this answer

Option B is correct because Azure SQL Database is a fully managed PaaS service that automates patching and backups. Option C is wrong because SQL Server on Azure Virtual Machines leaves patching and backup management to the customer. Option A is wrong because Azure SQL Managed Instance also automates patching but is more complex than SQL Database.

Option D is wrong because Azure Database for MySQL is a different database engine.

210

MCQmedium

A company stores user profiles as JSON documents. Each profile includes standard fields (userId, name, email) and optional fields (preferences, history). The application needs fast key lookups by userId and SQL-like queries on optional fields. Which Azure Cosmos DB API should they choose?

A.SQL (Core) API

B.MongoDB API

C.Cassandra API

D.Table API

AnswerA

This API natively stores JSON documents, supports flexible schema, and allows SQL-like queries on any field, meeting both fast key lookups and ad-hoc query needs.

Why this answer

The SQL (Core) API is the correct choice because it natively supports JSON documents with flexible schemas, enabling fast key-value lookups on the `userId` field (via automatic indexing) and rich SQL-like querying (e.g., `SELECT * FROM c WHERE c.preferences.theme = 'dark'`) on optional fields. It is the only Azure Cosmos DB API that provides a SQL query syntax directly over JSON, making it ideal for mixed workloads of point reads and ad-hoc queries on nested or optional properties.

Exam trap

The trap here is that candidates confuse the MongoDB API's support for JSON documents with the ability to run SQL queries, when in fact MongoDB uses its own query language and does not support SQL syntax, leading them to incorrectly choose MongoDB over the SQL (Core) API.

How to eliminate wrong answers

Option B (MongoDB API) is wrong because, while it supports JSON-like documents and key lookups, its query language is MongoDB's query syntax (e.g., `find({preferences: {theme: 'dark'}})`) rather than SQL-like queries, and it does not offer native SQL syntax for the described requirement. Option C (Cassandra API) is wrong because it uses the Cassandra Query Language (CQL) and a wide-column storage model, which is not optimized for JSON documents with optional fields or SQL-like queries on nested properties. Option D (Table API) is wrong because it is designed for key-attribute store with a flat schema, not for JSON documents with nested optional fields, and its query capabilities are limited to OData filters and partition key lookups, not SQL-like queries on arbitrary JSON paths.

211

MCQmedium

A data engineer needs to build an analytics solution to transform large volumes of streaming data from IoT devices. The transformations involve complex Python and Spark code, and the results will be stored in Azure Data Lake Storage Gen2 for further analysis. Which Azure service is best suited for executing these transformations?

A.Azure Data Factory

B.Azure Synapse Pipelines

C.Azure Databricks

D.Azure Analysis Services

AnswerC

Correct. Azure Databricks is a Spark-based analytics platform that allows complex transformations using Python, Scala, or SQL, and integrates well with Data Lake Storage.

Why this answer

Azure Databricks is best suited because it provides an Apache Spark-based analytics platform that can execute complex Python and Spark code on large-scale streaming data. It integrates natively with Azure Data Lake Storage Gen2 for reading streaming IoT data and writing transformed results, offering optimized performance for big data transformations.

Exam trap

The trap here is that candidates confuse Azure Data Factory or Synapse Pipelines with compute engines for code-based transformations, when those services are primarily for orchestration and integration, not for executing complex Python/Spark code on streaming data.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is primarily an orchestration and ETL/ELT service that uses code-free pipelines or SQL-based transformations, not designed for executing complex Python and Spark code on streaming data. Option B is wrong because Azure Synapse Pipelines (now part of Synapse Analytics) focuses on data integration and orchestration with T-SQL or Spark notebooks, but it lacks the dedicated streaming and collaborative notebook environment that Azure Databricks provides for complex Spark transformations. Option D is wrong because Azure Analysis Services is a semantic modeling and business intelligence service for creating tabular models, not a compute engine for running Python or Spark code on streaming data.

212

MCQmedium

A bank processes a fund transfer that involves deducting money from one account and crediting it to another. The system ensures that both operations succeed together or, if any part fails, the entire transaction is rolled back, leaving both accounts unchanged. Which ACID property does this scenario primarily guarantee?

A.Consistency

B.Isolation

C.Durability

D.Atomicity

AnswerD

Atomicity ensures that a transaction is an indivisible unit of work. If any part fails, the entire transaction is rolled back, leaving the data unchanged, perfectly matching the described scenario.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. In this fund transfer scenario, both the debit and credit operations must complete successfully, or the entire transaction is rolled back, leaving the accounts unchanged. This all-or-nothing behavior is the defining characteristic of atomicity in ACID transactions.
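The all-or-nothing behavior can be demonstrated with a small sketch using Python's built-in `sqlite3` module. SQLite stands in for a bank's database here purely for illustration; the table layout, account names, and amounts are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Credit the target and debit the source as one atomic unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, target)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would overdraw the source, so the earlier credit is undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates succeed
failed = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK constraint
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The second transfer credits bob before the debit fails, yet bob's balance is unchanged afterwards because the whole transaction is rolled back.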

Exam trap

The trap here is that candidates often confuse atomicity with consistency, mistakenly thinking that maintaining the total balance (consistency) is the same as the all-or-nothing execution of the transaction, but atomicity specifically focuses on the indivisibility of the transaction steps.

How to eliminate wrong answers

Option A is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving data integrity rules (e.g., total balance remains constant), but it does not guarantee the all-or-nothing execution of the individual operations. Option B is wrong because isolation ensures that concurrent transactions do not interfere with each other, preventing dirty reads or lost updates, but it does not address the rollback of a failed multi-step operation. Option C is wrong because durability guarantees that once a transaction is committed, its changes persist even in the event of a system failure, but it does not apply to the rollback behavior described in the scenario.

213

MCQhard

A company uses Azure SQL Database and wants to implement row-level security so that sales managers can only see data for their own region. Which feature should they use?

A.Dynamic Data Masking

B.Row-level security (RLS)

C.Transparent Data Encryption (TDE)

D.Microsoft Purview

AnswerB

RLS restricts which rows users can see based on group membership or context.

Why this answer

Row-level security (RLS) is the correct feature because it allows you to control access to rows in a database table based on the characteristics of the user executing a query. In this scenario, RLS can be implemented using a security policy and a predicate function that filters rows based on the sales manager's region, ensuring they only see data for their own region.
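The idea of a filter predicate can be sketched in Python. This is a conceptual illustration only: in Azure SQL Database the predicate is a T-SQL function bound to the table by a security policy and applied by the engine, and the names and sample rows below are hypothetical.

```python
# Hypothetical sales rows; in a database these would live in a table.
SALES = [
    {"order_id": 1, "region": "West", "amount": 250},
    {"order_id": 2, "region": "East", "amount": 400},
    {"order_id": 3, "region": "West", "amount": 125},
]


def region_predicate(row, user_context):
    """Return True only for rows the current user is allowed to see."""
    return row["region"] == user_context["region"]


def query_sales(user_context):
    """Every read goes through the predicate, so callers cannot bypass the filter."""
    return [row for row in SALES if region_predicate(row, user_context)]


west_manager = {"name": "manager_west", "region": "West"}
visible_orders = [row["order_id"] for row in query_sales(west_manager)]
```

The manager sees whole rows for their own region and nothing else, which is the distinction from Dynamic Data Masking: masking would return every row with selected column values obscured.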

Exam trap

The trap here is that candidates often confuse Dynamic Data Masking (which hides data in results) with Row-level security (which filters rows), leading them to choose option A when the requirement is about restricting row visibility, not masking column values.

How to eliminate wrong answers

Option A is wrong because Dynamic Data Masking obfuscates data in query results (e.g., hiding parts of a credit card number) but does not restrict which rows are visible; it masks column values rather than filtering rows. Option C is wrong because Transparent Data Encryption (TDE) encrypts the database at rest only (data in transit is protected separately by TLS) and provides no row-level filtering or access control based on user identity. Option D is wrong because Microsoft Purview is a data governance and cataloging service for discovering and managing data assets, not a database-level security feature for filtering rows in queries.

214

MCQmedium

A company needs to store relational data that requires frequent updates and supports complex joins. They also need to scale out reads by using read replicas. Which Azure service should they choose?

A.Azure Database for MySQL

B.Azure Cosmos DB

C.Azure SQL Database

D.Azure Table Storage

AnswerA

Azure Database for MySQL is relational and supports read replicas.

Why this answer

Azure Database for MySQL is a fully managed relational database service that supports frequent updates and complex joins via SQL. It also offers read replicas, which allow scaling out read-heavy workloads by asynchronously replicating data from the primary server to up to five read-only replicas within the same region or cross-region.

Exam trap

The trap here is that candidates often confuse Azure SQL Database's geo-replication (which is for disaster recovery, not read scaling) with read replicas, or they assume Cosmos DB supports relational joins because of its SQL API, overlooking its fundamental NoSQL architecture.

How to eliminate wrong answers

Option B (Azure Cosmos DB) is wrong because it is a NoSQL multi-model database that does not support complex SQL joins natively and uses a different consistency model; it is designed for globally distributed, schema-less data, not relational data with frequent updates and joins. Option C (Azure SQL Database) is the closest distractor: it supports relational data and complex joins, and some service tiers offer read scale-out or readable geo-secondaries, but "read replicas" as a named feature belongs to Azure Database for MySQL, which is the answer the question expects. Option D (Azure Table Storage) is wrong because it is a NoSQL key-value store that does not support relational schemas, complex joins, or read replicas; it is designed for semi-structured data at massive scale.

215

MCQmedium

You need to store semi-structured JSON data from a web application and query it using SQL-like syntax. The solution must support high throughput with low latency. Which Azure data store should you use?

A.Azure Blob Storage

B.Azure Cosmos DB

C.Azure SQL Database

D.Azure Table Storage

AnswerB

Cosmos DB natively supports JSON documents and SQL-like queries.

Why this answer

Azure Cosmos DB is the correct choice because it natively supports semi-structured JSON documents and offers SQL-like querying via its core (SQL) API. It is designed for high throughput and low latency with guaranteed single-digit millisecond response times at the 99th percentile, making it ideal for web applications with demanding performance requirements.

Exam trap

The trap here is that candidates often confuse Azure Blob Storage's ability to store JSON files with the ability to query them using SQL syntax, overlooking that Blob Storage lacks a native query engine for semi-structured data.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage stores unstructured binary or text data and does not support SQL-like querying of JSON content without additional services like Azure Data Lake or serverless SQL pools. Option C is wrong because Azure SQL Database is a relational database that requires a fixed schema and is not optimized for semi-structured JSON data with high throughput and low latency at Cosmos DB's scale. Option D is wrong because Azure Table Storage is a NoSQL key-value store that does not support SQL-like query syntax and is designed for simple, schema-less data with lower throughput and higher latency compared to Cosmos DB.

216

Multi-Selectmedium

Which TWO scenarios are appropriate for using Azure Blob Storage? (Choose two.)

Select 2 answers

A.Storing key-value pairs with partition and row keys.

B.Running SQL queries on structured data.

C.Storing JavaScript functions for server-side logic.

D.Storing backup files and archival data.

E.Storing images and videos for a website.

AnswersD, E

Blob Storage is commonly used for backups and archives.

Why this answer

Azure Blob Storage is designed for storing large amounts of unstructured data, such as binary files and text. Backup files and archival data are ideal use cases because Blob Storage offers hot, cool, cold, and archive access tiers, so rarely accessed data can be retained long term at lower cost. Storing images and videos for a website is the second fit: Blob Storage can serve static assets directly over HTTP/HTTPS and can be paired with a CDN for fast global delivery.

Exam trap

The trap here is that candidates confuse Azure Blob Storage with other Azure services that handle structured data (like Table Storage or SQL Database) or compute (like Azure Functions), leading them to select options that describe those services instead of focusing on unstructured data storage scenarios.

217

MCQhard

A company runs a critical application on Azure SQL Database. They need to ensure that in the event of a regional disaster, the database can be failed over to a secondary region with minimal data loss. The solution should also allow the secondary region to be used for read-only queries. What should they configure?

A.Configure Active Geo-Replication and manually failover when needed.

B.Use Azure Front Door to route traffic to the secondary region.

C.Create a failover group with a readable secondary in the secondary region.

D.Migrate to Azure Cosmos DB for multi-region writes.

AnswerC

Failover groups provide automatic failover; readable secondary allows read queries.

Why this answer

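Option C is correct because a failover group with a readable secondary provides automatic failover and read access in the secondary region. Option A (Active Geo-Replication) requires manual failover. Option B (Azure Front Door) routes web traffic and does not replicate or fail over a database.

Option D (Azure Cosmos DB) is a different, NoSQL service and would require migrating the application away from Azure SQL Database.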

218

MCQmedium

A bank processes individual customer transactions in real-time to update account balances and also runs a nightly job that aggregates all daily transactions into summary reports for management. Which of the following best describes these two processing workloads?

A.OLTP for real-time transactions, OLAP for nightly reports

B.Batch processing for transactions, Stream processing for reports

C.OLAP for transactions, OLTP for reports

D.ETL for transactions, ELT for reports

AnswerA

Correct. OLTP is designed for high-volume transactional updates (real-time balance changes), while OLAP is designed for complex queries and aggregation (historical reports).

Why this answer

Option A is correct because real-time individual transaction processing is the hallmark of Online Transaction Processing (OLTP), which focuses on high-volume, low-latency inserts and updates to maintain current account balances. The nightly aggregation of daily transactions into summary reports is a classic Online Analytical Processing (OLAP) workload, which involves complex queries over large historical datasets for business intelligence. These two workloads have fundamentally different performance and design requirements, making OLTP and OLAP the appropriate classifications.
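The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. A single SQLite database is used for both only to keep the example small; real systems typically separate them. The table names and sample transactions are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE transactions (account TEXT, day TEXT, amount INTEGER)")
conn.executemany("INSERT INTO balances VALUES (?, ?)", [("acc-1", 0), ("acc-2", 0)])


def record_transaction(account, day, amount):
    """OLTP-style work: a small, short transaction touching a few rows."""
    with conn:
        conn.execute("INSERT INTO transactions VALUES (?, ?, ?)", (account, day, amount))
        conn.execute(
            "UPDATE balances SET balance = balance + ? WHERE account = ?", (amount, account)
        )


for account, amount in [("acc-1", 100), ("acc-2", 40), ("acc-1", -30)]:
    record_transaction(account, "2024-01-01", amount)

# OLAP-style work: one read-only query that scans and aggregates many rows.
daily_summary = conn.execute(
    "SELECT account, COUNT(*), SUM(amount) FROM transactions "
    "WHERE day = ? GROUP BY account ORDER BY account",
    ("2024-01-01",),
).fetchall()
current_balance = conn.execute(
    "SELECT balance FROM balances WHERE account = 'acc-1'"
).fetchone()[0]
```

Each call to `record_transaction` is a quick write that keeps the balance current, while the nightly summary reads the accumulated history in bulk and changes nothing.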

Exam trap

The trap here is that candidates confuse the terms 'batch' and 'stream' with OLTP and OLAP, or incorrectly assume that any nightly job is 'batch processing' and any real-time task is 'stream processing,' when the exam specifically tests the distinction between transactional and analytical workloads.

How to eliminate wrong answers

Option B is wrong because it reverses the definitions: real-time transactions are stream/OLTP processing, not batch, and nightly summary reports are batch/OLAP processing, not stream. Option C is wrong because it swaps the roles: OLAP is designed for analytical queries on aggregated data, not for high-frequency transactional updates, and OLTP is not suited for large-scale summary report generation. Option D is wrong because ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are data integration patterns used to move data between systems, not classifications of processing workloads; they describe how data is prepared, not the nature of the workload itself.

219

MCQmedium

Your organization uses Azure Purview to scan data sources. You need to set up a scan rule set that automatically classifies credit card numbers in Azure SQL Database. Which built-in classification rule should you enable?

A.Use a regular expression pattern matching.

B.Create a custom classification rule.

C.Enable the 'Personally Identifiable Information (PII)' classification.

D.Enable the 'Credit Card Number' classification.

AnswerD

Built-in rule detects credit card patterns.

Why this answer

Option D is correct because Azure Purview includes a built-in 'Credit Card Number' classification rule that uses a predefined regular expression pattern to detect credit card numbers in data sources like Azure SQL Database. Enabling this rule automatically classifies the data without requiring custom development, aligning with the requirement to use a built-in classification.

Exam trap

The trap here is that candidates may confuse the method (regular expression pattern matching) with the specific built-in rule, or incorrectly assume that enabling a broader PII classification is sufficient when the question requires a targeted credit card number classification.

How to eliminate wrong answers

Option A is wrong because 'Use a regular expression pattern matching' is not a built-in classification rule in Azure Purview; it describes a method for creating custom rules, not a specific rule to enable. Option B is wrong because 'Create a custom classification rule' is unnecessary when a built-in rule for credit card numbers exists, and the question explicitly asks for a built-in rule to enable. Option C is wrong because 'Enable the 'Personally Identifiable Information (PII)' classification' is a broader category that may include credit card numbers but does not specifically target them; enabling it would classify all PII types, not just credit card numbers, which is not the precise requirement.

220

MCQmedium

A startup is developing a web application that requires a relational database with PostgreSQL compatibility. They want a fully managed service that automatically handles backups, patching, and provides high availability with a 99.99% SLA. Which Azure service should they choose?

A.Azure Database for PostgreSQL

B.Azure SQL Database

C.Azure Database for MySQL

D.Azure Cosmos DB for PostgreSQL

AnswerA

Azure Database for PostgreSQL (Flexible Server) is a fully managed PostgreSQL service with automatic backups, patching, and zone-redundant high availability offering a 99.99% SLA. It is the ideal choice for a PostgreSQL-compatible relational database.

Why this answer

Azure Database for PostgreSQL is the correct choice because it is a fully managed relational database service that offers PostgreSQL compatibility, automatic backups, patching, and high availability with a 99.99% SLA. This meets the startup's requirements for a managed PostgreSQL solution without the need for manual administration.

Exam trap

The trap here is that candidates may confuse Azure Cosmos DB for PostgreSQL with Azure Database for PostgreSQL, not realizing that Cosmos DB for PostgreSQL is a distributed, multi-node service aimed at large-scale workloads, whereas the question describes a standard managed relational database.

How to eliminate wrong answers

Option B (Azure SQL Database) is wrong because it is based on Microsoft SQL Server, not PostgreSQL, and lacks PostgreSQL compatibility. Option C (Azure Database for MySQL) is wrong because it provides MySQL compatibility, not PostgreSQL, and does not support PostgreSQL-specific features or protocols. Option D (Azure Cosmos DB for PostgreSQL) is wrong because it is a distributed PostgreSQL service (built on the Citus extension) that shards data across multiple nodes; it targets scale-out workloads rather than the standard relational workload described here, for which Azure Database for PostgreSQL is the direct match.

221

Multi-Selectmedium

Which TWO Azure services are appropriate for real-time data ingestion from IoT devices?

Select 2 answers

A.Azure IoT Hub

B.Azure Data Factory

C.Azure SQL Database

D.Azure Blob Storage

E.Azure Event Hubs

AnswersA, E

Specifically built for IoT device connectivity and ingestion.

Why this answer

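Azure IoT Hub is designed specifically for bidirectional communication with IoT devices, supporting protocols like MQTT, AMQP, and HTTPS for real-time data ingestion. It provides per-device authentication, device management, and built-in message routing to downstream services, making it ideal for ingesting telemetry data from millions of devices in real time. Azure Event Hubs, the second correct answer, is a high-throughput event ingestion service for streaming telemetry; it suits scenarios that need ingestion only, without device management.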

Exam trap

The trap here is that candidates often confuse Azure IoT Hub (which provides device identity and management) with Azure Event Hubs (which is a generic event ingestion service), or mistakenly think that Azure Data Factory or Blob Storage can handle real-time IoT ingestion, when they are designed for batch or storage workloads respectively.

222

MCQmedium

A company is migrating a 500 GB on-premises SQL Server database to Azure. The database uses SQL Server Agent for scheduled maintenance jobs and requires the ability to run cross-database queries within the same logical server. The company wants a PaaS service that minimizes management overhead for patching and backups while preserving these SQL Server features. Which Azure SQL service should they choose?

A.Azure SQL Database

B.Azure SQL Managed Instance

C.SQL Server on Azure Virtual Machine

D.Azure Database for MySQL

AnswerB

Azure SQL Managed Instance provides full SQL Server compatibility including SQL Server Agent and cross-database queries, with managed patching and backups.

Why this answer

Azure SQL Managed Instance is the correct choice because it provides near 100% compatibility with on-premises SQL Server, including SQL Server Agent and cross-database queries within the same instance. As a PaaS service, it handles patching and backups automatically, minimizing management overhead while preserving these required features.

Exam trap

The trap here is that candidates often confuse Azure SQL Database's 'logical server' with a true SQL Server instance, assuming it supports SQL Server Agent and cross-database queries, when in fact it does not.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a single database PaaS service that does not support SQL Server Agent or cross-database queries within the same logical server (it uses elastic queries with limitations). Option C is wrong because SQL Server on Azure Virtual Machine is an IaaS service that requires the customer to manage patching and backups, increasing management overhead. Option D is wrong because Azure Database for MySQL is a different database engine that does not support SQL Server Agent or cross-database queries in the same way, and it is not compatible with SQL Server features.

223

MCQeasy

A database administrator is explaining to a colleague that a database transaction must ensure that either all operations within it succeed or none of them take effect. Which ACID property is being described?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Atomicity ensures all-or-nothing execution of a transaction.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work: either all operations within it are committed successfully, or none are applied. This is the property that guarantees the 'all-or-nothing' behavior described in the question. In Azure SQL Database or SQL Server, atomicity is enforced through the transaction log and the write-ahead logging (WAL) protocol, which records each change in the log before the modified data pages are written to disk, so an incomplete transaction can be rolled back.

Exam trap

The trap here is that candidates often confuse Atomicity with Consistency, because both involve 'correctness' — but Atomicity is about the transaction's execution as a whole, while Consistency is about the database's adherence to rules after the transaction completes.

How to eliminate wrong answers

Option B is wrong because Consistency ensures that a transaction brings the database from one valid state to another, preserving all defined rules (e.g., constraints, triggers, cascades), but it does not guarantee the all-or-nothing outcome. Option C is wrong because Isolation controls how concurrent transactions are visible to each other (e.g., through locking or snapshot isolation), not whether a transaction's operations are applied as a unit. Option D is wrong because Durability guarantees that once a transaction is committed, its changes persist even after a system failure (e.g., via the transaction log being flushed to disk), not the atomic execution of the transaction's operations.

224

MCQeasy

A research team needs to store thousands of PDF reports that vary in length and structure. The storage solution must allow flexible schema and support access from multiple programming languages via HTTP. Which data storage category best describes these reports?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Transactional data

AnswerC

Unstructured data has no predefined structure and is stored as files (e.g., PDFs, images). Azure Blob Storage is a common choice for such data.

Why this answer

Option C is correct because PDF reports of varying length and structure are binary files that do not conform to a predefined data model or schema, which is the definition of unstructured data. Object stores such as Azure Blob Storage (or, outside Azure, Amazon S3) typically hold this kind of data and expose it through HTTP REST APIs that any programming language can call.

Exam trap

The trap here is that candidates confuse 'semi-structured' with 'unstructured' because PDFs can contain text and metadata, but the exam expects you to recognize that the file itself is a binary blob with no schema enforced by the storage system.

How to eliminate wrong answers

Option A is wrong because structured data requires a rigid schema (e.g., tables with rows and columns in a relational database), but PDFs have no fixed schema. Option B is wrong because semi-structured data (e.g., JSON, XML) has tags or key-value pairs that provide some organizational metadata, whereas PDFs are binary blobs without such inherent structure. Option D is wrong because transactional data refers to records of business transactions (e.g., sales orders) that are typically structured and require ACID compliance, not binary documents.
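The difference between semi-structured and unstructured data can be made concrete with a toy check. The sketch below is only a rough heuristic, not an official classification method: the classify function and both sample payloads are hypothetical, and JSON stands in for semi-structured formats in general.

```python
import json


def classify(payload: bytes) -> str:
    """Toy heuristic: label a payload by whether it exposes self-describing keys."""
    try:
        parsed = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Not parseable as JSON text: treat it as an opaque blob.
        return "unstructured"
    if isinstance(parsed, (dict, list)):
        # Keys and nesting describe the data, but no fixed schema is enforced.
        return "semi-structured"
    return "unstructured"


# A JSON document carries its own field names.
json_doc = b'{"title": "Q3 report", "pages": 12}'

# The opening bytes of a typical PDF file: a header followed by binary data.
pdf_like = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"

print(classify(json_doc))
print(classify(pdf_like))
```

The storage system sees the PDF only as a sequence of bytes, which is why it is treated as unstructured even though a PDF reader can extract text from it.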

225

MCQmedium

A hospital collects patient data from multiple sources. Source A stores patient vitals as a continuous stream of readings from wearable devices. Source B stores historical medical records in a relational database with fixed columns (PatientID, Diagnosis, AdmissionDate). Source C stores doctor's notes as unstructured text files. Which statement correctly describes the structure of data from these sources?

A.Source A is semi-structured, Source B is structured, Source C is unstructured.

B.Source A is structured, Source B is structured, Source C is unstructured.

C.Source A is structured, Source B is unstructured, Source C is semi-structured.

D.Source A is semi-structured, Source B is semi-structured, Source C is unstructured.

AnswerB

Both Source A (vitals readings with a fixed schema) and Source B (relational database) are structured data. Source C (unstructured text) is unstructured.

Why this answer

Source A stores patient vitals as a continuous stream from wearable devices, which is structured data because it typically consists of time-stamped numeric readings with a fixed schema (e.g., timestamp, heart rate, blood pressure). Source B uses a relational database with fixed columns (PatientID, Diagnosis, AdmissionDate), which is classic structured data. Source C contains unstructured text files (doctor's notes) with no predefined schema.

Therefore, Option B correctly identifies all three sources.
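To illustrate why a stream of sensor readings can count as structured, here is a minimal sketch that assumes each reading arrives as a comma-separated line with five fixed fields. The VitalsReading type, its field names, and the sample values are hypothetical; the question does not specify a wire format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VitalsReading:
    """One reading; every record in the stream has exactly these fields."""
    patient_id: int
    recorded_at: datetime
    heart_rate_bpm: int
    systolic_mmhg: int
    diastolic_mmhg: int


def parse_reading(line: str) -> VitalsReading:
    """Parse one reading; raises ValueError if the line breaks the schema."""
    patient_id, recorded_at, heart_rate, systolic, diastolic = line.split(",")
    return VitalsReading(
        patient_id=int(patient_id),
        recorded_at=datetime.fromisoformat(recorded_at),
        heart_rate_bpm=int(heart_rate),
        systolic_mmhg=int(systolic),
        diastolic_mmhg=int(diastolic),
    )


# Readings arrive continuously, but each one fits the same fixed schema.
stream = [
    "42,2024-01-15T08:30:00,72,118,76",
    "42,2024-01-15T08:30:05,74,120,78",
]
readings = [parse_reading(line) for line in stream]
print(readings[0].heart_rate_bpm, readings[1].recorded_at.isoformat())
```

Because every record maps onto the same columns, the readings could be loaded into a relational table without any per-record interpretation, which is what makes them structured regardless of how they are delivered.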

Exam trap

The trap here is that candidates often mistake a continuous data stream (Source A) for semi-structured data. In DP-900, a stream of fixed-format sensor readings counts as structured because every reading follows the same schema (e.g., a timestamp and numeric values); arriving in real time does not change how the data is classified.

How to eliminate wrong answers

Option A is wrong because it labels Source A as semi-structured, but a continuous stream of numeric vitals from wearable devices is structured (a fixed schema of timestamp and numeric values), not semi-structured (which would involve tags or markers, as in JSON or XML). Option C is wrong because it calls Source B unstructured, but a relational database with fixed columns is the definition of structured data. Option D is wrong because it labels both Source A and Source B as semi-structured when both are structured; only its classification of Source C as unstructured is correct.
