Knowledge + Practice

Microsoft Azure Data Fundamentals (DP-900): Questions 1–75

982 questions total · 14 pages · All types, answers revealed

1

MCQmedium

A company uses Azure Stream Analytics to process IoT data from thousands of devices. They need to store the results in a way that supports fast querying for historical analysis. Which output sink should they use?

A.Azure Table Storage

B.Azure Blob Storage

C.Azure Data Lake Storage Gen2

D.Azure Event Hubs

AnswerC

Designed for big data analytics and fast querying.

Why this answer

Azure Data Lake Storage Gen2 (ADLS Gen2) is the correct output sink because it combines a hierarchical namespace with Azure Blob Storage's scalable object storage, enabling fast querying for historical analysis via tools like Azure Synapse Analytics, PolyBase, or Apache Spark. ADLS Gen2 supports high-throughput writes from Stream Analytics and allows efficient directory-level operations and fine-grained access control, which are critical for large-scale IoT data analytics.

Exam trap

The trap here is that candidates often confuse Azure Blob Storage with ADLS Gen2, assuming both are equivalent for analytics, but the key differentiator is the hierarchical namespace and native integration with big data analytics engines that ADLS Gen2 provides.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a NoSQL key-value store optimized for fast point lookups and small data volumes, not for complex historical queries or large-scale analytical workloads. Option B is wrong because Azure Blob Storage lacks a hierarchical namespace, making directory-level operations and fast querying for historical analysis less efficient compared to ADLS Gen2, and it does not natively support the same level of integration with analytics engines. Option D is wrong because Azure Event Hubs is a real-time data ingestion service, not a storage sink for historical analysis; it is designed for streaming data capture and event processing, not for long-term storage and querying.

2

Multi-Selectmedium

Which TWO Azure services can be used to build a real-time analytics solution that ingests streaming data and provides dashboards with low latency? (Choose two.)

Select 2 answers

A.Power BI

B.Azure Data Factory

C.Azure Data Lake Storage Gen2

D.Azure HDInsight with Spark

E.Azure Stream Analytics

AnswersA, E

Provides real-time dashboards via DirectQuery or streaming datasets.

Why this answer

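Power BI is correct because it can connect to real-time data sources like Azure Stream Analytics or Event Hubs to create streaming datasets and dashboards that update automatically with low latency. It supports push datasets and real-time tiles, enabling live visualization of streaming data without batch processing delays. Azure Stream Analytics is correct because it ingests streaming data from sources such as Event Hubs or IoT Hub, processes it continuously, and can write its results directly to a Power BI output.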

Exam trap

The trap here is that candidates often confuse batch-oriented services like Azure Data Factory or storage-only services like Data Lake Storage Gen2 with real-time analytics capabilities, or they assume HDInsight with Spark alone provides built-in dashboards, when in fact it requires a separate visualization layer.

3

MCQeasy

A company needs to ensure that their Azure SQL Database is accessible only from a specific virtual network and deny access from public endpoints. Which feature should they configure?

A.Disable public network access

B.Virtual network service endpoints and firewall rules

C.IP firewall rules

D.Azure Private Link

AnswerB

Service endpoints allow VNet-specific access while denying public endpoints.

Why this answer

Option B is correct because virtual network service endpoints combined with virtual network firewall rules restrict access to a specific VNet. Option A is wrong because Azure SQL Database has a public endpoint by default; disabling public network access without Private Link would block all access. Option C is wrong because IP firewall rules allow public IP addresses and are not VNet-specific. Option D is wrong because Private Link provides private connectivity but is more complex than this scenario requires.

4

MCQeasy

A retail company stores three types of customer data: (1) a table with columns for CustomerID, Name, and Email; (2) product reviews as JSON documents with varying fields such as rating and comment; (3) product demonstration videos stored in MP4 format. Which of the following correctly classifies these data types in order from first to third?

A.Structured, unstructured, semi-structured

B.Semi-structured, structured, unstructured

C.Structured, semi-structured, unstructured

D.Unstructured, semi-structured, structured

AnswerC

Correct. The table is structured (fixed schema), JSON documents are semi-structured (flexible schema), and videos are unstructured (no schema).

Why this answer

Option C is correct because the customer table with fixed columns (CustomerID, Name, Email) is structured data, product reviews as JSON documents with varying fields are semi-structured data (schema-on-read, flexible fields), and MP4 video files are unstructured data (no schema, binary format). This ordering matches the standard classification in Azure Data Fundamentals: structured → semi-structured → unstructured.

Exam trap

The trap here is that candidates often confuse semi-structured data (like JSON) with unstructured data, or assume all non-tabular data is unstructured, when in fact JSON's key-value pairs with varying fields make it semi-structured.

How to eliminate wrong answers

Option A is wrong because it incorrectly places unstructured before semi-structured; product reviews as JSON are semi-structured, not unstructured, and MP4 videos are unstructured, not semi-structured. Option B is wrong because it starts with semi-structured for the customer table, which is clearly structured with a fixed schema; it also misorders the remaining types. Option D is wrong because it begins with unstructured for the customer table, which is structured, and then places semi-structured before structured, reversing the correct order.

5

MCQhard

A company uses Azure Data Lake Storage Gen2 as a data lake. They need to enforce row-level security for sensitive data so that sales representatives can only see rows for their assigned region. Which approach should they use?

A.Apply sensitivity labels in Microsoft Purview

B.Load data into Azure Synapse Analytics dedicated SQL pool and implement row-level security (RLS)

C.Use Azure RBAC roles on the storage account

D.Use Azure Data Lake Storage Gen2 access control lists (ACLs) on folders per region

AnswerB

Synapse SQL pool supports RLS to restrict row-level access based on user attributes.

Why this answer

Row-level security (RLS) in Azure Synapse Analytics dedicated SQL pool allows you to restrict data access at the row level based on a user's identity or group membership. By loading the data into a dedicated SQL pool and defining a security policy with a predicate function that filters rows by region, you can ensure sales representatives only see rows for their assigned region. This is the correct approach because RLS is designed specifically for this purpose and integrates with Azure Active Directory for user authentication.
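
The filtering idea behind RLS can be sketched in plain Python. This is only an illustration of a predicate that hides rows by region; it is not how a dedicated SQL pool implements RLS, and the user names, the `USER_REGION` mapping, and the `SaleRow` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaleRow:
    order_id: int
    region: str
    amount: float


# Hypothetical mapping of signed-in user to assigned sales region.
USER_REGION = {
    "alice@contoso.com": "West",
    "bob@contoso.com": "East",
}


def region_predicate(user: str, row: SaleRow) -> bool:
    """Return True only when the row belongs to the user's assigned region."""
    return USER_REGION.get(user) == row.region


def visible_rows(user: str, rows: list[SaleRow]) -> list[SaleRow]:
    """Apply the predicate to every row, as a filter predicate would."""
    return [row for row in rows if region_predicate(user, row)]


SALES = [
    SaleRow(1, "West", 120.0),
    SaleRow(2, "East", 75.5),
    SaleRow(3, "West", 42.0),
]

for row in visible_rows("alice@contoso.com", SALES):
    print(row)
```

A user with no entry in the mapping sees no rows at all, which mirrors the deny-by-default behavior you would normally want from a row filter.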

Exam trap

The trap here is that candidates confuse storage-level access controls (ACLs, RBAC) with data-level security (RLS), assuming that folder-per-region ACLs can achieve row-level filtering, but ACLs cannot filter rows within a file.

How to eliminate wrong answers

Option A is wrong because sensitivity labels in Microsoft Purview classify and protect data at the file or column level, but they do not enforce row-level filtering based on user identity. Option C is wrong because Azure RBAC roles control access to the storage account itself (e.g., read/write permissions), not row-level visibility within a dataset. Option D is wrong because Azure Data Lake Storage Gen2 ACLs provide file- or folder-level permissions, not row-level filtering; they cannot restrict which rows a user sees within a file.

6

MCQmedium

A company runs a SQL Server database on an Azure virtual machine. They need to offload reporting queries to a read-only copy without modifying the application. Which Azure service should they use?

A.Azure Analysis Services

B.Azure SQL Managed Instance

C.Azure SQL Database with read scale-out

D.Azure Synapse Analytics dedicated SQL pool

AnswerC

Azure SQL Database's read scale-out feature allows directing reporting queries to a read-only replica automatically.

Why this answer

Azure SQL Database with read scale-out (C) is correct because it provides a read-only replica of the database that can handle reporting queries without changes to application code. The only change is adding `ApplicationIntent=ReadOnly` to the connection string used for reporting; the Azure gateway then routes those connections to the read-only replica, offloading the primary from reporting workloads.
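
As a minimal sketch of the connection-string change, the helper below builds an ADO.NET-style string with and without the read-only intent. The server and database names are placeholders, authentication settings are omitted, and `build_connection_string` is a hypothetical helper rather than part of any SDK.

```python
def build_connection_string(server: str, database: str, read_only: bool = False) -> str:
    """Build an ADO.NET-style connection string (authentication omitted)."""
    parts = {
        "Server": f"tcp:{server},1433",
        "Database": database,
        "Encrypt": "True",
    }
    if read_only:
        # This keyword marks the connection as read-only intent.
        parts["ApplicationIntent"] = "ReadOnly"
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Placeholder names, for illustration only.
primary = build_connection_string("example-server.database.windows.net", "SalesDb")
reporting = build_connection_string(
    "example-server.database.windows.net", "SalesDb", read_only=True
)

print(primary)
print(reporting)
```

The two strings differ only in the `ApplicationIntent` keyword, which is why reporting tools can be redirected without touching query code.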

Exam trap

The trap here is that candidates confuse read scale-out with Azure SQL Managed Instance or Azure Analysis Services, assuming any read-only copy or analytics service can serve as a transparent offload, but only Azure SQL Database with read scale-out provides automatic, connection-string-based routing without application changes.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is a semantic modeling and analytics engine that requires data to be imported or queried via DAX/MDX, not a read-only copy of a SQL Server database, and it cannot be used as a transparent read-only replica for existing SQL queries. Option B is wrong because Azure SQL Managed Instance is a fully managed SQL Server instance that does not support read scale-out; it offers read-only replicas only via failover groups, which require manual redirection and are not transparent to the application. Option D is wrong because Azure Synapse Analytics dedicated SQL pool is a massively parallel processing (MPP) data warehouse designed for large-scale analytics, not a read-only copy of a SQL Server database, and it cannot be used to offload reporting queries without modifying the application or data pipeline.

7

MCQhard

Refer to the exhibit. You create an Azure Policy to deny virtual machines that are not using approved SKUs. After assigning the policy to a subscription, a user tries to create a VM with SKU 'Standard_DS2_v2' and the creation is allowed. What is the most likely reason?

A.The policy is not assigned to the resource group where the VM is created.

B.The field type 'Microsoft.Compute/virtualMachines' is incorrect.

C.The alias 'Microsoft.Compute/virtualMachines/sku.name' is incorrect; the correct alias is 'Microsoft.Compute/virtualMachines/sku'.

D.The policy rule does not specify a deny effect.

AnswerC

The alias in the exhibit does not match the property the policy needs to evaluate, so the condition never matches and the deny effect never triggers.

Why this answer

Option C is correct because the alias 'Microsoft.Compute/virtualMachines/sku.name' is invalid; the correct alias for the SKU property of a virtual machine is 'Microsoft.Compute/virtualMachines/sku'. Azure Policy aliases must exactly match the ARM resource property path. Using an incorrect alias means the policy rule never evaluates the intended property, so the deny effect never triggers, allowing any SKU to be created.

Exam trap

The trap here is that candidates assume the alias must include the property's child field (like 'name') because they think of the SKU as an object with sub-properties, but Azure Policy aliases for simple string values do not include child fields.

How to eliminate wrong answers

Option A is wrong because the policy is assigned to the subscription, which covers all resource groups within that subscription by default; the VM creation is allowed due to a policy rule issue, not a scope issue. Option B is wrong because 'Microsoft.Compute/virtualMachines' is the correct resource type for Azure virtual machines; the field type is not the cause of the policy not enforcing. Option D is wrong because the question states the policy is created to 'deny' VMs, and the exhibit would show a deny effect; if the effect were missing, the policy would not deny, but the core problem is the incorrect alias preventing the condition from matching.

8

MCQeasy

A logistics company collects data from fleet sensors. Each sensor sends a JSON message containing the vehicle ID, timestamp, and a variable set of measurements such as engine temperature, tire pressure, and fuel level. The structure of the JSON message differs between sensor types and sometimes includes optional fields. How should this data be classified?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Relational data

AnswerB

Correct. JSON data with optional fields and varying structure is a classic example of semi-structured data, which uses tags or keys to organize data without a rigid schema.

Why this answer

The data is semi-structured because it conforms to a schema (JSON format with fields like vehicle ID and timestamp) but allows variability in structure, such as optional fields and different sets of measurements per sensor type. This flexibility is a hallmark of semi-structured data, which does not require a rigid tabular schema like structured data but still contains tags or markers to separate data elements.
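
To make the idea concrete, here is a small sketch that parses two sensor messages with different measurement fields. The field names and values are invented for illustration and are not taken from any real device schema.

```python
import json

# Two hypothetical sensor messages: same core keys, different measurements.
messages = [
    '{"vehicleId": "V-100", "timestamp": "2024-01-01T08:00:00Z", '
    '"engineTemp": 92.5, "fuelLevel": 0.61}',
    '{"vehicleId": "V-200", "timestamp": "2024-01-01T08:00:05Z", '
    '"tirePressure": [32, 32, 31, 33]}',
]

COMMON_KEYS = {"vehicleId", "timestamp"}


def split_message(raw: str) -> tuple[dict, dict]:
    """Separate the keys every message shares from the variable measurements."""
    doc = json.loads(raw)
    common = {key: doc[key] for key in COMMON_KEYS}
    measurements = {key: value for key, value in doc.items() if key not in COMMON_KEYS}
    return common, measurements


for raw in messages:
    common, measurements = split_message(raw)
    print(common["vehicleId"], sorted(measurements))
```

Each message is self-describing through its keys, yet no single fixed set of columns fits both, which is what separates semi-structured from structured data.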

Exam trap

The trap here is that candidates see 'JSON' and assume it is structured data because JSON has keys and values, but they miss that the variable and optional fields make it semi-structured, not strictly structured.

How to eliminate wrong answers

Option A is wrong because structured data requires a fixed schema with consistent fields and data types, typically stored in relational tables, whereas the JSON messages here have variable and optional fields. Option C is wrong because unstructured data has no predefined structure or schema, such as raw video or text files, but these JSON messages have a defined format with key-value pairs. Option D is wrong because relational data is a subset of structured data that is organized into tables with rows and columns and enforces relationships via foreign keys, which does not apply to the flexible JSON messages.

9

MCQmedium

A company has an on-premises SQL Server database with a 1 TB 'Sales' table containing historical data. They want to move this table to Azure SQL Database with minimal downtime. The table is actively written to during business hours. Which approach should they use?

A.Use Azure Data Migration Service with continuous sync from on-premises to Azure SQL Database, then cut over

B.Use the Azure SQL Database migration wizard to perform an offline migration over the weekend

C.Export the table as a .bacpac file and import it into Azure SQL Database during off-hours

D.Use SQL Server Management Studio's 'Deploy Database to Azure SQL Database' wizard

AnswerA

Correct. This online approach replicates changes continuously, allowing a short cutover window with minimal downtime.

Why this answer

Azure Data Migration Service (DMS) with continuous sync is the correct approach because it supports online migration with minimal downtime. It uses transactional replication to keep the on-premises SQL Server database synchronized with Azure SQL Database while the source remains fully operational, allowing a controlled cutover with only seconds of downtime.

Exam trap

The trap here is that candidates often assume offline methods (bacpac, wizard) are sufficient for large tables, underestimating the downtime required for a 1 TB dataset, and fail to recognize that 'minimal downtime' explicitly requires an online migration with continuous sync.

How to eliminate wrong answers

Option B is wrong because the Azure SQL Database migration wizard performs an offline migration, which requires the source database to be quiesced (no active writes) during the entire migration process, causing significant downtime. Option C is wrong because exporting a 1 TB table as a .bacpac file and importing it is an offline operation that locks the table, and the file transfer and import can take hours, resulting in extended downtime. Option D is wrong because the 'Deploy Database to Azure SQL Database' wizard in SSMS is designed for small databases (typically under 1 GB) and performs an offline migration, making it unsuitable for a 1 TB table with active writes.

10

MCQmedium

A logistics company uses Azure SQL Database to store millions of shipment records. The table has columns: ShipmentID (primary key), CustomerID, ShipDate, and Destination. Queries frequently filter by CustomerID and ShipDate to retrieve shipments for a specific customer over a date range. Which indexing strategy will most improve query performance?

A.Create a nonclustered index on CustomerID and ShipDate

B.Create a clustered index on ShipmentID

C.Partition the table by ShipmentID

D.Create a full-text index on Destination

AnswerA

This composite index covers both filter columns, enabling efficient seek operations for the WHERE clause conditions.

Why this answer

A nonclustered index on CustomerID and ShipDate is the best choice because it directly supports the frequent query pattern filtering by both columns. This composite index allows SQL Database to perform an index seek rather than a full table scan, drastically reducing I/O for selective queries over millions of rows.
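
The effect of a composite index can be observed with a rough, self-contained sketch. It uses SQLite from Python's standard library as a stand-in, so the planner output and index internals differ from Azure SQL Database; the index name and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Shipments (
        ShipmentID  INTEGER PRIMARY KEY,
        CustomerID  INTEGER NOT NULL,
        ShipDate    TEXT NOT NULL,
        Destination TEXT NOT NULL
    )
    """
)
# Composite index on the two columns used in the WHERE clause.
conn.execute(
    "CREATE INDEX ix_shipments_customer_date ON Shipments (CustomerID, ShipDate)"
)
conn.executemany(
    "INSERT INTO Shipments (CustomerID, ShipDate, Destination) VALUES (?, ?, ?)",
    [(42, "2024-01-10", "Oslo"), (42, "2024-03-01", "Rome"), (7, "2024-01-15", "Lima")],
)

query = (
    "SELECT ShipmentID, ShipDate FROM Shipments "
    "WHERE CustomerID = ? AND ShipDate BETWEEN ? AND ?"
)
params = (42, "2024-01-01", "2024-01-31")

# Ask the planner how it would run the query; the detail text is column 3.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
plan_text = " ".join(row[3] for row in plan)
rows = conn.execute(query, params).fetchall()

print(plan_text)
print(rows)
```

Putting the equality column (`CustomerID`) first and the range column (`ShipDate`) second lets the engine seek to one customer and then read a contiguous date range.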

Exam trap

The trap here is that candidates often assume a clustered index on the primary key is always optimal, but for queries that filter on non-key columns, a covering nonclustered index is far more effective.

How to eliminate wrong answers

Option B is wrong because a clustered index on ShipmentID (the primary key) is already the default and does not help queries filtering by CustomerID and ShipDate; it would still require a full scan or key lookup. Option C is wrong because partitioning by ShipmentID does not improve performance for range queries on ShipDate and CustomerID; partitioning is primarily for data management and maintenance, not for accelerating selective queries. Option D is wrong because a full-text index on Destination is designed for text search and linguistic queries, not for equality or range filtering on CustomerID and ShipDate.

11

MCQhard

A company uses Azure Data Lake Storage Gen2 for a data lake. They implement a folder structure with access control lists (ACLs). A new data scientist needs to read data from a specific folder but not write to it. Which ACL permission should be assigned?

A.Execute

B.Modify

C.Write

D.Read

AnswerA

Execute on a folder allows traversing; combined with Read on files allows reading data.

Why this answer

Execute (X) permission on a folder in Azure Data Lake Storage Gen2 is required to traverse the folder and access its contents. Without Execute, a user cannot list or read files inside the folder, even if Read permission is granted. Since the data scientist only needs to read data (not write), assigning Execute on the folder and Read on the files allows traversal and read access without write capability.
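
The interaction between folder and file permissions can be sketched as a simplified check. It assumes POSIX-style `rwx` triplets and deliberately ignores owner/group/other classes, masks, Azure RBAC, and the chain of parent folders; both function names are hypothetical.

```python
def can_read_file(folder_perms: str, file_perms: str) -> bool:
    """Reading a file needs Execute on the folder and Read on the file."""
    return "x" in folder_perms and "r" in file_perms


def can_list_folder(folder_perms: str) -> bool:
    """Listing a folder's contents needs both Read and Execute on the folder."""
    return "r" in folder_perms and "x" in folder_perms


# Execute on the folder plus Read on the file: the file can be read.
print(can_read_file("--x", "r--"))
# Read on the folder without Execute: the file cannot be reached.
print(can_read_file("r--", "r--"))
# Execute alone allows traversal but not listing.
print(can_list_folder("--x"))
```

None of these combinations includes `w`, so the data scientist can read without being able to write.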

Exam trap

The trap here is that candidates often assume Read permission on a folder is sufficient to read its contents, but without Execute permission, the folder cannot be traversed, making the data inaccessible.

How to eliminate wrong answers

Option B (Modify) is wrong because Modify includes Write and Delete permissions, which would allow the data scientist to create, update, or delete files in the folder, violating the requirement to prevent writes. Option C (Write) is wrong because Write permission allows creating and modifying files in the folder, which is explicitly not allowed. Option D (Read) is wrong because Read on a folder alone does not grant the ability to traverse the folder hierarchy; without Execute, the data scientist cannot list or access files within the folder, making Read ineffective for reading data.

12

Drag & Dropmedium

A data engineering team wants to build a batch analytics pipeline. The raw data is stored in Azure Data Lake Storage Gen2 (ADLS Gen2). The final output will be a set of tables in Azure Synapse Analytics (dedicated SQL pool) that will be used to create reports in Power BI. Arrange the following steps in the correct order for a typical ETL process.

Why this order

ETL follows the order: extract raw data to ADLS, transform it, load into Synapse staging, then transform within Synapse to final tables, and finally use Power BI for reporting.

13

Multi-Selecthard

Which THREE of the following are features of Azure SQL Database that help ensure high availability and disaster recovery?

Select 3 answers

A.Query Store

B.Zone-redundant databases

C.Active geo-replication

D.Read scale-out

E.Automatic failover groups

AnswersB, C, E

Replicates databases across availability zones within a region.

Why this answer

Options B, C, and E are correct. Zone-redundant databases provide resilience within a region. Active geo-replication provides disaster recovery to a secondary region. Automatic failover groups enable automatic failover to a secondary. Option A is wrong because Query Store is for performance tuning. Option D is wrong because read scale-out is for performance, not HA/DR.

14

MCQeasy

A company stores customer information in a SQL database with fixed columns (CustomerID, Name, Email). They also store scanned PDF contracts and product images in a file storage system. Which statement correctly describes the types of data mentioned?

A.Both the customer information and the files are structured data.

B.The customer information is semi-structured, and the files are unstructured.

C.The customer information is structured, and the files are unstructured.

D.Both the customer information and the files are unstructured.

AnswerC

Correct. Customer information in a SQL table with a fixed schema is structured data. PDFs and images lack a predefined schema, making them unstructured.

Why this answer

Customer information stored in fixed columns (CustomerID, Name, Email) follows a strict schema with defined data types and relationships, making it structured data. Scanned PDF contracts and product images are binary files with no inherent schema or organization, fitting the definition of unstructured data. Option C correctly pairs these classifications.

Exam trap

The trap here is that candidates confuse 'semi-structured' (e.g., JSON with flexible fields) with structured data (fixed schema), or assume all digital files are structured because they have metadata, ignoring the lack of a predefined schema in the content itself.

How to eliminate wrong answers

Option A is wrong because it incorrectly classifies the files as structured data; scanned PDFs and images are binary blobs without a predefined schema. Option B is wrong because it mislabels the customer information as semi-structured; fixed-column SQL tables with strict schemas are structured, not semi-structured (which would use flexible formats like JSON or XML). Option D is wrong because it incorrectly classifies the customer information as unstructured; the fixed-column SQL database enforces a rigid schema, making it structured data.

15

MCQhard

Your organization uses Azure Data Lake Storage Gen2 as a data lake. You need to enforce data retention policies automatically, such as deleting files older than 90 days. Which Azure feature should you use?

A.Azure Policy

B.Azure Blob Storage lifecycle management

C.Azure Data Factory

D.Azure RBAC

AnswerB

Lifecycle management policies automate deletion or tiering of blobs based on age.

Why this answer

Azure Blob Storage lifecycle management allows you to define rules that automatically delete or tier blobs based on age. Since Azure Data Lake Storage Gen2 is built on top of Azure Blob Storage, you can use lifecycle management policies to delete files older than 90 days by setting a 'Delete blob' action with a 'daysAfterModificationGreaterThan' filter of 90.
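
A lifecycle rule of this kind is expressed as a JSON policy document. The sketch below builds one in Python and prints it; the rule name and the `raw/` prefix are placeholders, and the policy would still have to be applied to the storage account through the portal, CLI, or an SDK, which is not shown here.

```python
import json

# Hypothetical rule: delete block blobs under "raw/" 90 days after last modification.
lifecycle_policy = {
    "rules": [
        {
            "enabled": True,
            "name": "delete-after-90-days",
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["raw/"],
                },
                "actions": {
                    "baseBlob": {
                        "delete": {"daysAfterModificationGreaterThan": 90}
                    }
                },
            },
        }
    ]
}

policy_json = json.dumps(lifecycle_policy, indent=2)
print(policy_json)
```

Keeping the retention period in one declarative rule means no pipeline or scheduled job has to run for the deletion to happen.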

Exam trap

The trap here is that candidates may confuse Azure Policy (which enforces rules on resource configurations) with data lifecycle management (which manages data within storage), or think Azure Data Factory is needed for scheduled deletion, when Azure Blob Storage lifecycle management is the native, policy-driven solution.

How to eliminate wrong answers

Option A is wrong because Azure Policy is used to enforce organizational standards and compliance by evaluating resource configurations (e.g., requiring encryption), not to manage data retention or automate deletion of files based on age. Option C is wrong because Azure Data Factory is an ETL and data orchestration service that can move or transform data, but it is not designed for automated, policy-based lifecycle management like deleting old files; you would need custom pipelines and triggers to mimic this, which is less efficient and not the intended use. Option D is wrong because Azure RBAC controls access permissions to resources (who can read/write/delete), not automated data retention or deletion based on time.

16

MCQeasy

A developer is designing a new application that requires a relational database. The database must support complex queries and stored procedures. Which Azure data service should they choose?

A.Azure SQL Database

B.Azure Table Storage

C.Azure Cosmos DB

D.Azure Blob Storage

AnswerA

Azure SQL Database is a relational database that supports complex queries and stored procedures.

Why this answer

Option A is correct because Azure SQL Database is a fully managed relational database that supports complex queries and stored procedures. Option B is wrong because Azure Table Storage is a NoSQL key-value store. Option C is wrong because Azure Cosmos DB is a NoSQL database. Option D is wrong because Azure Blob Storage is object storage.

17

MCQeasy

A retail company captures real-time clickstream data from its website. They need to store this data for immediate analysis using KQL. Which Azure service should they use?

A.Azure Stream Analytics

B.Azure Cosmos DB

C.Azure Data Explorer

D.Azure SQL Database

AnswerC

Azure Data Explorer is designed for real-time analytics on streaming data with KQL support.

Why this answer

Azure Data Explorer (ADX) is optimized for interactive analytics on large volumes of streaming and high-velocity data, supporting Kusto Query Language (KQL) for real-time queries. It ingests clickstream data with low latency and provides immediate analysis capabilities, making it the correct choice for this scenario.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics (a processing service) with Azure Data Explorer (a storage and query service), but the question specifically requires storing data for immediate KQL analysis, which Stream Analytics cannot do natively.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing engine that outputs to sinks like Azure Data Explorer or Power BI, but it does not natively support KQL for querying stored data. Option B is wrong because Azure Cosmos DB is a NoSQL database designed for transactional workloads with low-latency reads/writes, not for ad-hoc analytical queries using KQL. Option D is wrong because Azure SQL Database is a relational database optimized for OLTP and structured queries with T-SQL, not for high-velocity streaming data analysis with KQL.

18

MCQeasy

A startup is migrating its on-premises SQL Server database to Azure. They want the least administrative overhead for patching and backups. Which Azure data service should they choose?

A.Azure Database for PostgreSQL

B.SQL Server on Azure Virtual Machines

C.Azure SQL Database

D.Azure Cosmos DB

AnswerC

PaaS provides automated patching, backups, and built-in high availability.

Why this answer

Option C is correct because Azure SQL Database is a fully managed platform-as-a-service (PaaS) that automates patching, backups, and high availability. Option A is wrong because Azure Database for PostgreSQL is a different database engine, not SQL Server. Option B is wrong because SQL Server on Azure Virtual Machines (IaaS) requires manual patching and backup management. Option D is wrong because Azure Cosmos DB is a NoSQL database, not relational.

19

MCQeasy

A retail company processes historical sales data in a nightly batch job that loads aggregated reports into a data warehouse. Additionally, the company analyzes live customer interactions from their website to provide real-time product recommendations. Which pair of terms correctly describes these two data processing approaches?

A.OLTP and OLAP

B.Batch processing and streaming processing

C.Structured data and unstructured data

D.Relational and NoSQL

AnswerB

Batch processing handles data in fixed, scheduled intervals (nightly reports). Streaming processing handles data continuously as it arrives (real-time recommendations). This pair correctly describes the two approaches.

Why this answer

The nightly batch job that loads aggregated reports into a data warehouse is a classic example of batch processing, where data is processed in large, scheduled chunks. The real-time analysis of live customer interactions for product recommendations is streaming processing, which handles data continuously as it arrives. Option B correctly pairs these two distinct processing paradigms.
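
The difference between the two paradigms can be shown with a toy sketch: the batch function waits for the full dataset, while the streaming function emits an updated result after every event. The sales amounts are invented, and real systems would use services such as those named in these questions rather than in-memory lists.

```python
from typing import Iterable, Iterator


def batch_total(sales: list[float]) -> float:
    """Batch: process the complete, already-collected dataset in one pass."""
    return sum(sales)


def streaming_totals(events: Iterable[float]) -> Iterator[float]:
    """Streaming: update and emit the result as each event arrives."""
    running = 0.0
    for amount in events:
        running += amount
        yield running


day_of_sales = [10.0, 20.0, 5.0]

print(batch_total(day_of_sales))
for total in streaming_totals(iter(day_of_sales)):
    print(total)
```

Both approaches reach the same final total; what differs is when results become available.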

Exam trap

The trap here is that candidates confuse OLTP/OLAP (which describe transactional vs. analytical workloads) with processing methods (batch vs. streaming), leading them to incorrectly select Option A.

How to eliminate wrong answers

Option A is wrong because OLTP (Online Transaction Processing) is designed for high-volume, low-latency transactional operations (e.g., order entry), not for nightly batch reporting, and OLAP (Online Analytical Processing) is a storage/query architecture for analytics, not a processing approach. Option C is wrong because structured data (e.g., tables) and unstructured data (e.g., text) describe data formats, not processing methods like batch or streaming. Option D is wrong because relational and NoSQL refer to database types (schema-based vs. flexible schema), not to how data is processed over time.

20

MCQmedium

A data analyst needs to create a Power BI report that combines sales data from Azure SQL Database and marketing data from a CSV file stored in Azure Blob Storage. The report should refresh automatically. What is the recommended approach?

A.Use Power Query in Power BI Desktop to combine the data and publish with scheduled refresh

B.Export the SQL data to Excel and combine with the CSV in Power BI

C.Use Azure Data Factory to merge the data into a single SQL table

D.Use DirectQuery mode from Power BI for both sources

AnswerA

Power Query can combine both sources and scheduled refresh updates the report.

Why this answer

Using Power Query in Power BI Desktop to combine both sources and then publishing to the Power BI service with scheduled refresh is the standard approach. Option B (exporting to Excel) is not recommended for automation. Option C (Azure Data Factory) is overkill for this simple combination. Option D (DirectQuery) cannot easily combine a CSV file.

21

MCQhard

An organization has a large dataset stored in Azure Blob Storage. They need to run complex analytics using SQL queries and also want to use the same data for machine learning models. Which Azure service provides both SQL-based analytics and native integration with ML frameworks?

A.Azure Data Factory

B.Azure Synapse Analytics

C.Azure Analysis Services

D.Azure SQL Database

AnswerB

Unified analytics platform with SQL and ML.

Why this answer

Azure Synapse Analytics is correct because it provides a unified analytics platform that combines enterprise data warehousing with big data analytics. It offers built-in SQL-based querying via dedicated SQL pools or serverless SQL pools, and it natively integrates with machine learning frameworks like Apache Spark and Azure Machine Learning for building and training models on the same data stored in Azure Blob Storage.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics with Azure SQL Database or Azure Data Factory, mistakenly thinking a traditional database or an ETL tool can handle both complex SQL analytics and native ML integration on large-scale Blob Storage data.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a data integration and orchestration service, not a SQL analytics engine; it cannot run SQL queries directly on data. Option C is wrong because Azure Analysis Services is an OLAP engine for semantic models and business intelligence, not designed for complex SQL analytics or native ML framework integration. Option D is wrong because Azure SQL Database is a relational database for transactional workloads, not optimized for large-scale analytics on Blob Storage data and lacks native integration with ML frameworks like Spark.

22

MCQmedium

A company uses Azure Synapse Analytics dedicated SQL pool to store sales data. They frequently run queries that aggregate sales by product and region over the past month. The queries are slow because they scan the entire table. Which index type should they implement on the fact table to improve query performance for these aggregations?

A.Clustered columnstore index

B.Clustered index on the primary key

C.Hash-distributed table on SalesID

D.Non-clustered index on (ProductID, Region)

AnswerA

Columnstore indexes are optimized for large scans and aggregations, typical in data warehouse workloads.

Why this answer

A clustered columnstore index is ideal for large fact tables in Azure Synapse Analytics dedicated SQL pool because it stores data column-wise, enabling high compression and eliminating the need to scan irrelevant columns. For aggregation queries that sum sales by product and region over the past month, the columnstore index significantly reduces I/O by reading only the necessary columns and applying batch-mode processing, which accelerates scan and aggregation operations.

Exam trap

The trap here is that candidates confuse indexing strategies for transactional OLTP workloads (where rowstore indexes like clustered or non-clustered are optimal) with analytical OLAP workloads, failing to recognize that columnstore indexes are specifically designed for large-scale aggregations and scans in dedicated SQL pools.

How to eliminate wrong answers

Option B is wrong because a clustered index on the primary key organizes data row-wise, which forces full table scans for aggregation queries and does not benefit from columnar compression or batch-mode processing. Option C is wrong because hash-distributing the table on SalesID improves data distribution and parallel processing but does not change the storage format; without a columnstore index, the table still scans all rows and columns for aggregations. Option D is wrong because a non-clustered index on (ProductID, Region) would require key lookups for additional columns and does not provide the columnar storage and compression benefits needed for efficient scan-heavy aggregation workloads.
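
The difference between row-wise and column-wise storage can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how a dedicated SQL pool is implemented: the sample rows, the `Amount` and `Notes` columns, and both function names are hypothetical, and real columnstore indexes add compression and batch-mode execution that the sketch does not model.

```python
# Toy illustration only: the data and column names below are made up.
from collections import defaultdict

# Row-wise layout: each record keeps all of its columns together.
rows = [
    {"SalesID": 1, "ProductID": 10, "Region": "East", "Amount": 50.0, "Notes": "gift wrap"},
    {"SalesID": 2, "ProductID": 10, "Region": "West", "Amount": 20.0, "Notes": ""},
    {"SalesID": 3, "ProductID": 11, "Region": "East", "Amount": 70.0, "Notes": "rush"},
    {"SalesID": 4, "ProductID": 10, "Region": "East", "Amount": 30.0, "Notes": ""},
]

# Column-wise layout: one list per column, aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def totals_from_rows(records):
    """Aggregate by walking whole records, unused columns included."""
    totals = defaultdict(float)
    for record in records:
        totals[(record["ProductID"], record["Region"])] += record["Amount"]
    return dict(totals)


def totals_from_columns(cols):
    """Aggregate by reading only the three columns the query needs."""
    totals = defaultdict(float)
    for product, region, amount in zip(cols["ProductID"], cols["Region"], cols["Amount"]):
        totals[(product, region)] += amount
    return dict(totals)


print(totals_from_columns(columns))
```

Both functions return the same totals; the column-wise version simply never touches `SalesID` or `Notes`, which is the I/O saving the explanation above describes.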

23

Matchingmedium

Match each Azure data tool to its purpose.

Concepts

Matches

Data integration and orchestration

Apache Spark-based analytics platform

Real-time stream processing

Distributed analytics (legacy)

Managed open-source analytics service

Why these pairings

These are key Azure data processing and analytics tools.

24

MCQeasy

A social media application allows users to post updates and like posts. After a user clicks the like button, the like count must update immediately and be exactly the same for all users viewing the post. Which data consistency model best fits this requirement?

A.Eventual consistency

B.Strong consistency

C.Session consistency

D.Bounded staleness consistency

AnswerB

Strong consistency guarantees that after a write, all reads return the most recent write. This is required for the like count to be immediately accurate for all viewers.

Why this answer

Strong consistency ensures that after a write operation (like clicking the like button) completes, any subsequent read operation returns the most recent write. This guarantees that all users viewing the post see the exact same, up-to-date like count immediately. This is required for the social media scenario where the like count must be identical for all viewers without any delay.

Exam trap

Microsoft often tests the misconception that 'eventual consistency' is acceptable for real-time updates, but the key differentiator here is the requirement for immediate and identical visibility for all users, which only strong consistency satisfies.

How to eliminate wrong answers

Option A is wrong because eventual consistency allows replicas to temporarily diverge, meaning some users might see an outdated like count for a period of time, which violates the requirement for immediate and identical updates. Option C is wrong because session consistency only guarantees monotonic reads and writes within a single user session; it does not ensure that all users across different sessions see the same updated count immediately. Option D is wrong because bounded staleness consistency permits a configurable time window or version lag before updates are visible to all readers, which would not meet the requirement for an instant, identical view for all users.

25

MCQeasy

Your company uses Azure Data Lake Storage Gen2 and wants to grant a data scientist read-only access to a specific container. Which built-in RBAC role should you assign?

A.Storage Account Contributor

B.Reader

C.Storage Blob Data Contributor

D.Storage Blob Data Reader

AnswerD

This role grants read-only access to blob containers and data.

Why this answer

Storage Blob Data Reader (D) is the correct built-in RBAC role because it grants read-only access to Azure Storage blob containers and data, including Data Lake Storage Gen2. This role provides the necessary permissions for a data scientist to read data from a specific container without allowing write or delete operations.

Exam trap

The trap here is that candidates often confuse the ARM-level Reader role (which only allows viewing storage account metadata) with data plane roles like Storage Blob Data Reader, mistakenly thinking Reader grants data access.

How to eliminate wrong answers

Option A is wrong because Storage Account Contributor grants full management access to the storage account, including the ability to change account configuration and delete the account, which far exceeds read-only container access. Option B is wrong because Reader provides read-only access to Azure resource management (ARM) plane operations, such as viewing storage account properties, but does not grant any permissions to read data within containers or blobs. Option C is wrong because Storage Blob Data Contributor allows read, write, and delete operations on blob containers and data, which is not read-only and would grant the data scientist excessive permissions.

26

MCQeasy

A marketing team needs to analyze customer sentiment from social media posts in real time. The solution must ingest a stream of tweets, perform sentiment analysis using a pre-built AI model, and store the results in a dashboard for immediate visualization. The team has limited coding experience and prefers a low-code/no-code approach. Which combination of Azure services should you recommend?

A.Azure Event Hubs, Azure Functions, and Azure Cosmos DB

B.Azure IoT Hub, Azure Data Factory, and Power BI

C.Azure Event Hubs, Azure Stream Analytics, and Power BI

D.Azure Event Hubs, Azure HDInsight, and Power BI

AnswerC

Low-code real-time analytics with built-in sentiment analysis and Power BI dashboard.

Why this answer

Option C is correct because Azure Event Hubs ingests the real-time tweet stream, Azure Stream Analytics performs sentiment analysis using its built-in machine learning functions (a low-code/no-code approach), and Power BI provides the dashboard for immediate visualization. This combination meets the real-time, low-code requirement without custom coding.

Exam trap

The trap here is that candidates may choose Azure Functions (Option A) thinking it's serverless and low-code, but it actually requires writing code for sentiment analysis, whereas Azure Stream Analytics provides a true low-code/no-code solution with built-in ML capabilities.

How to eliminate wrong answers

Option A is wrong because Azure Functions requires custom code to implement sentiment analysis, which violates the low-code/no-code preference, and Azure Cosmos DB is a NoSQL database not optimized for real-time dashboarding. Option B is wrong because Azure IoT Hub is designed for IoT device telemetry, not social media streams, and Azure Data Factory is a batch-oriented ETL service, not suitable for real-time stream processing. Option D is wrong because Azure HDInsight is a big data analytics service that requires coding (e.g., Spark, Hive) and is overkill for simple sentiment analysis, contradicting the low-code/no-code requirement.

27

MCQmedium

A company uses Azure Synapse Analytics for its data warehouse. They notice that queries against a large fact table are slow. The table is partitioned by month and uses clustered columnstore index. Which action would most likely improve query performance?

A.Rebuild the clustered columnstore index.

B.Change partitioning to daily partitions.

C.Convert the table to a clustered rowstore index.

D.Increase the data warehouse performance level (DWU).

AnswerA

Rebuilding reorganizes data and improves compression and query performance.

Why this answer

Option A is correct: rebuilding the clustered columnstore index reorganizes data and improves compression and query performance, especially if there have been many updates or deletes. Option B is wrong because partitioning by day may increase partition count and overhead. Option C is wrong because rowstore is slower for analytical queries. Option D is wrong because increasing DWU may help but is not a targeted fix.

28

MCQmedium

A manufacturing company installs temperature sensors in a factory. Sensor data is streamed to Azure Event Hubs. The company needs to detect when the average temperature of any sensor exceeds 100°F over a 5-minute sliding window and then send an alert. Which Azure service should be used for this real-time stream processing?

A.Azure Stream Analytics

B.Azure Functions

C.Azure SQL Database

D.Azure Logic Apps

AnswerA

Correct. Stream Analytics can query streaming data from Event Hubs using SQL-like syntax with support for time windows (e.g., Tumbling, Hopping, Sliding) to compute averages and trigger alerts.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time stream processing, including the ability to define a 5-minute sliding window over sensor data from Event Hubs, compute the average temperature per sensor, and trigger an alert when the threshold of 100°F is exceeded. It natively integrates with Event Hubs as an input and supports temporal window functions like TumblingWindow, HoppingWindow, and SlidingWindow for exactly this kind of time-based aggregation.

Exam trap

The trap here is that candidates often confuse Azure Functions with Stream Analytics because both can process Event Hubs data, but Functions lacks native windowing and stateful aggregation capabilities, making it unsuitable for sliding window calculations without significant custom code.

How to eliminate wrong answers

Option B (Azure Functions) is wrong because while it can process events from Event Hubs, it lacks native support for complex windowed aggregations like a 5-minute sliding window average; you would have to manually implement state management and windowing logic, which is inefficient and error-prone for real-time streaming. Option C (Azure SQL Database) is wrong because it is a relational database for storing and querying static data, not a real-time stream processing engine; it cannot natively consume Event Hubs streams or perform sliding window computations without additional ETL and custom code. Option D (Azure Logic Apps) is wrong because it is a workflow orchestration service for integrating applications and services, not a real-time analytics engine; it lacks the ability to perform continuous, low-latency stream processing with windowed aggregations over streaming data.
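
To make the windowing logic concrete, here is a minimal Python sketch of the state a trailing 5-minute average requires. It is not Stream Analytics code and only approximates SlidingWindow semantics (it evaluates the window when an event arrives, not when one expires). The function name `detect_alerts`, the tuple layout, and the assumption that events arrive in timestamp order are all hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60   # 5-minute window from the question
THRESHOLD_F = 100.0       # alert threshold from the question


def detect_alerts(events):
    """Yield (sensor_id, timestamp, average) whenever a sensor's average
    over the trailing 5 minutes exceeds the threshold.

    `events` is an iterable of (sensor_id, timestamp_seconds, temperature_f)
    tuples, assumed to arrive in timestamp order.
    """
    windows = defaultdict(deque)   # sensor_id -> deque of (timestamp, value)
    sums = defaultdict(float)      # sensor_id -> running sum of the window
    for sensor_id, ts, value in events:
        window = windows[sensor_id]
        window.append((ts, value))
        sums[sensor_id] += value
        # Drop readings that have fallen out of the trailing window.
        while window and window[0][0] <= ts - WINDOW_SECONDS:
            _, old_value = window.popleft()
            sums[sensor_id] -= old_value
        average = sums[sensor_id] / len(window)
        if average > THRESHOLD_F:
            yield sensor_id, ts, average


sample = [("s1", 0, 98.0), ("s1", 60, 104.0), ("s2", 90, 90.0), ("s1", 400, 99.0)]
print(list(detect_alerts(sample)))
```

The per-sensor deques and running sums are exactly the state that would have to be hand-managed in Azure Functions, and that a windowed query expresses declaratively in Stream Analytics.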

29

MCQmedium

Refer to the exhibit. An Azure Data Factory pipeline JSON is shown. What does this pipeline do?

A.Copies sales data from Azure SQL Database to on-premises

B.Copies all sales data from on-premises to Azure SQL Database

C.Copies filtered sales data from on-premises to Azure SQL Database

D.Copies sales data from on-premises to Azure Data Lake Storage

AnswerC

Correctly describes the source, sink, and filter.

Why this answer

The pipeline uses a Copy activity with a source dataset pointing to an on-premises SQL Server (via a self-hosted integration runtime) and a sink dataset pointing to an Azure SQL Database. The source query includes a WHERE clause filtering sales data by a date range, so only filtered data is copied. This matches option C.

Exam trap

The trap here is that candidates often overlook the WHERE clause in the source query and assume the pipeline copies all data, or they confuse the direction of data movement between on-premises and Azure.

How to eliminate wrong answers

Option A is wrong because the source is on-premises SQL Server and the sink is Azure SQL Database, not the reverse. Option B is wrong because the source query includes a WHERE clause that filters the data, so it does not copy all sales data. Option D is wrong because the sink is Azure SQL Database, not Azure Data Lake Storage.

30

MCQhard

A database administrator manages an Azure SQL Database with a table that has a clustered index on OrderID. Frequent queries filter on OrderDate and then sort the results by CustomerID. These queries perform poorly. Which indexing strategy will most improve performance for these specific queries?

A.Create a nonclustered index on (OrderDate, CustomerID)

B.Create a nonclustered index on (CustomerID, OrderDate)

C.Change the clustered index to (OrderDate)

D.Create a filtered index on OrderDate WHERE CustomerID IS NOT NULL

AnswerA

This covers both the filter (OrderDate) and the sort order (CustomerID), enabling index seek and avoiding a sort operation.

Why this answer

The query filters on OrderDate and then sorts by CustomerID. A nonclustered index on (OrderDate, CustomerID) supports both the WHERE clause (a seek on the leading key, OrderDate) and the ORDER BY clause: within each OrderDate value, index entries are already ordered by CustomerID, so for an equality filter the engine can return rows in index order without a separate sort operation. Because every nonclustered index also carries the clustered key (OrderID), the index covers queries that select only these columns. This directly addresses the performance bottleneck by eliminating the need to sort the results after filtering.

Exam trap

The trap here is that candidates often think a filtered index or changing the clustered index is the best solution, but they overlook that the query requires both filtering and sorting on different columns, and the leading key in a composite index must match the filter column to support both operations efficiently.

How to eliminate wrong answers

Option B is wrong because the index on (CustomerID, OrderDate) does not support the filter on OrderDate as the leading key; the query would require a full index scan or a separate sort, as the filter on OrderDate cannot use the index efficiently. Option C is wrong because changing the clustered index to OrderDate would reorder the entire table physically, which could improve range scans on OrderDate but would not directly optimize the sort by CustomerID; the query still needs to sort results by CustomerID, and the clustered index order does not help with that sort. Option D is wrong because a filtered index on OrderDate WHERE CustomerID IS NOT NULL does not include CustomerID as a key column, so it cannot help with the ORDER BY CustomerID clause; it also restricts the index to rows where CustomerID is not null, which may not cover all queries.
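
The effect of the key order can be observed with Python's built-in sqlite3 module. This is only an analogy: SQLite's planner and plan output differ from Azure SQL Database, and the index name `ix_date_cust` and the sample rows are hypothetical. The sketch filters on a single OrderDate value; with a date range, entries are ordered by date first, so a sort may still be required.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, OrderDate TEXT, CustomerID INTEGER)"
)
# Composite index: filter column first, sort column second.
conn.execute("CREATE INDEX ix_date_cust ON Orders (OrderDate, CustomerID)")
conn.executemany(
    "INSERT INTO Orders (OrderDate, CustomerID) VALUES (?, ?)",
    [("2024-01-01", 7), ("2024-01-01", 3), ("2024-01-02", 5), ("2024-01-01", 5)],
)

query = "SELECT CustomerID FROM Orders WHERE OrderDate = ? ORDER BY CustomerID"

# The last column of each EXPLAIN QUERY PLAN row is a readable plan step.
plan = " | ".join(
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("2024-01-01",))
)
result = [row[0] for row in conn.execute(query, ("2024-01-01",))]

print(plan)    # expected to mention ix_date_cust and no temporary sort step
print(result)
```

If the plan contained a step such as "USE TEMP B-TREE FOR ORDER BY", the engine would be sorting after filtering; with the filter column as the leading key, that step should be absent.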

31

MCQhard

An e-commerce company uses Azure Cosmos DB for its product catalog. They need to ensure that read requests are served from the nearest Azure region to reduce latency. Which feature should they use?

A.Azure Front Door

B.Cosmos DB multi-region writes

C.Azure Content Delivery Network

D.Microsoft Traffic Manager

AnswerB

Allows reads from the nearest region with automatic routing.

Why this answer

Cosmos DB multi-region writes is the only option that works at the database level. With the account replicated to multiple Azure regions, the SDK can route read requests to the region nearest the client, serving them from a local replica without a separate global load-balancing service. Strictly speaking, nearest-region reads depend on multi-region replication rather than on the write setting; enabling multi-region writes additionally makes every region writable.

Exam trap

The trap here is that candidates often confuse Azure Front Door or Traffic Manager as the solution for global read routing, but Cosmos DB's native multi-region read capability is the correct answer because it operates at the database SDK level with automatic region awareness and consistency support.

How to eliminate wrong answers

Option A is wrong because Azure Front Door is a global HTTP/HTTPS load balancer and application accelerator, not a database-level feature; it operates at the application layer and cannot directly serve Cosmos DB read requests from the nearest region without additional configuration. Option C is wrong because Azure Content Delivery Network (CDN) caches static content (e.g., images, videos) at edge nodes, not dynamic database queries; it cannot cache or serve Cosmos DB document reads. Option D is wrong because Microsoft Traffic Manager is a DNS-based traffic routing service that directs traffic at the domain level, but it does not integrate with Cosmos DB's SDK to provide automatic, region-aware read routing with session consistency.

32

MCQeasy

A company must archive customer correspondence PDFs that are rarely accessed but must be retained for seven years. The documents must be available for read within seconds if requested. Which Azure Blob Storage access tier should be used to minimize storage cost while meeting the availability requirement?

A.Hot

B.Cool

C.Archive

D.Premium

AnswerB

Cool tier is for infrequent access with immediate availability and lower storage cost than Hot.

Why this answer

The Cool tier is optimal because it balances low storage cost with high availability for data that is infrequently accessed but must be retrievable within seconds. It offers the same low-latency retrieval as the Hot tier (milliseconds) but at a lower storage price, making it ideal for archived correspondence that still requires immediate read access.

Exam trap

The trap here is that candidates see 'archived' and immediately choose the Archive tier, forgetting the 'within seconds' availability requirement that disqualifies it.

How to eliminate wrong answers

Option A is wrong because the Hot tier has the highest storage cost and is designed for frequently accessed data, not for rarely accessed archives. Option C is wrong because the Archive tier has the lowest storage cost but retrieval times can range from minutes to hours, failing the 'within seconds' requirement. Option D is wrong because the Premium tier is optimized for high transaction volumes and low latency on block blobs, not for cost-effective archiving of rarely accessed data.

33

MCQeasy

A company is designing a new application that will store customer orders in a relational database on Azure. The data includes order IDs, customer IDs, product IDs, quantities, and order dates. The application needs to support complex queries that join multiple tables and enforce referential integrity. Which Azure service should the company use?

A.Azure SQL Database

B.Azure Table Storage

C.Azure Cosmos DB

D.Azure Blob Storage

AnswerA

Azure SQL Database is a fully managed relational database with full SQL support, including joins and referential integrity constraints.

Why this answer

Option A is correct because Azure SQL Database is a managed relational database service that supports complex queries, including multi-table joins, and enforces referential integrity. Option B (Azure Table Storage) is a key-value store. Option C (Azure Cosmos DB) is a NoSQL database. Option D (Azure Blob Storage) is for unstructured data.

34

Multi-Selectmedium

Which THREE components are part of an end-to-end analytics solution on Azure?

Select 3 answers

A.Data ingestion (Azure Data Factory)

B.Azure Virtual Machines

C.Data visualization (Power BI)

D.Data storage (Azure Data Lake Storage)

E.Microsoft Entra ID

AnswersA, C, D

Ingests data from various sources.

Why this answer

Data ingestion using Azure Data Factory (ADF) is correct because ADF is the primary cloud-based ETL/ELT service for orchestrating and automating data movement from various sources into Azure storage or compute. It supports over 90 built-in connectors, scheduled triggers, and data flows, making it essential for the 'ingest' phase of an analytics pipeline.

Exam trap

The trap here is that candidates often confuse supporting services (like identity management or generic compute) with the core analytics pipeline components, leading them to select Azure Virtual Machines or Microsoft Entra ID as part of the end-to-end solution when they are not directly involved in the data flow.

35

MCQhard

A global e-commerce platform uses a distributed database for its shopping cart service. The platform must be highly available and continue to accept writes even if network partitions occur between data centers. The business accepts that during a partition, users might see slightly outdated inventory counts, but the service must remain operational. According to the CAP theorem, which two properties is this system prioritizing?

A.Consistency and Partition Tolerance

B.Availability and Partition Tolerance

C.Consistency and Availability

D.Durability and Partition Tolerance

AnswerB

The system is designed to remain available (accept writes) even during network partitions, sacrificing immediate consistency (stale data is acceptable). This is a classic 'AP' system under the CAP theorem.

Why this answer

The scenario describes a system that must remain operational and accept writes during network partitions, even if data becomes temporarily inconsistent (stale inventory counts). This prioritizes Availability (the service stays up and accepts writes) and Partition Tolerance (the system continues to function despite network splits). According to the CAP theorem, when a partition occurs, a distributed system must choose between Consistency and Availability; here, the business accepts eventual consistency, so Availability and Partition Tolerance are the chosen properties.

Exam trap

The trap here is that candidates often assume 'highly available' automatically means 'Consistency and Availability' (CA), forgetting that the CAP theorem states you cannot have all three during a partition, and the scenario explicitly allows stale data, which sacrifices Consistency for Availability.

36

MCQhard

A manufacturing company ingests real-time sensor data from factory equipment via Azure Event Hubs. The data is a continuous stream of measurements (sensorId, timestamp, value). Additionally, historical maintenance records are stored as CSV files in Azure Data Lake Storage Gen2. The operations team needs to join the streaming data with the historical records in near real-time to detect anomalies. They also need to run complex T-SQL queries on the combined dataset for ad-hoc analysis. Which Azure service should they use as the primary analytics platform?

A.Azure Stream Analytics

B.Azure Synapse Analytics

C.Azure Data Factory

D.Azure HDInsight with Spark

AnswerB

Synapse Analytics unifies big data and data warehousing, enabling T-SQL queries on both streaming and batch data from Event Hubs and Data Lake Storage.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified analytics platform that can ingest real-time streaming data from Azure Event Hubs via its built-in pipeline or Spark Structured Streaming, and simultaneously query historical CSV files in Azure Data Lake Storage Gen2 using serverless SQL or dedicated SQL pools. This allows the operations team to join streaming and batch data in near real-time for anomaly detection, and also run complex T-SQL queries for ad-hoc analysis, all within a single service.

Exam trap

The trap here is that candidates often choose Azure Stream Analytics because it handles streaming data, but they overlook the requirement for complex T-SQL ad-hoc queries, which Stream Analytics cannot support, while Azure Synapse Analytics provides both streaming ingestion and full T-SQL analytics in a single platform.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is designed for real-time stream processing and can join streaming data with reference data, but it cannot run complex T-SQL queries for ad-hoc analysis; it uses a SQL-like query language limited to streaming patterns and lacks full T-SQL support for interactive exploration. Option C is wrong because Azure Data Factory is an orchestration and data movement service, not an analytics platform; it cannot perform real-time stream joins or execute T-SQL queries directly. Option D is wrong because Azure HDInsight with Spark is a big data processing service that can handle streaming and batch workloads, but it does not natively support complex T-SQL queries; it requires additional tools like Spark SQL or Hive, and lacks the integrated SQL analytics capabilities of Synapse.

37

Multi-Selectmedium

A company is designing a solution to store time-series data from millions of IoT devices. Which TWO Azure services are most suitable for this scenario?

Select 2 answers

A.Azure Data Explorer

B.Azure Blob Storage

C.Azure Cosmos DB

D.Azure Redis Cache

E.Azure SQL Database

AnswersA, C

Built for time-series analytics and high-throughput ingestion.

Why this answer

Azure Data Explorer (option A) is optimized for time-series analytics and high-throughput ingestion. Azure Cosmos DB (option C) is suitable for time-series data with its flexible schema and low latency. Option B (Azure Blob Storage) is for unstructured data. Option D (Azure Redis Cache) is a cache, not primary storage. Option E (Azure SQL Database) is relational and less efficient for high-volume time-series.

38

MCQeasy

A company needs to store JSON documents that are frequently updated by multiple services. The solution must support indexing and querying by any property. Which Azure data service should they use?

A.Azure Blob Storage

B.Azure SQL Database

C.Azure Cosmos DB

D.Azure Table Storage

AnswerC

Azure Cosmos DB is a globally distributed NoSQL database that stores JSON documents, automatically indexes all properties, and supports SQL-like queries.

Why this answer

Azure Cosmos DB is a fully managed NoSQL database designed for JSON documents, offering native support for indexing every property automatically without requiring a predefined schema. Its multi-model API (including SQL API) allows querying by any property with low-latency reads and writes, making it ideal for services that frequently update JSON documents.

Exam trap

The trap here is that candidates confuse Azure Blob Storage's ability to store JSON files (as blobs) with the ability to query them by property, overlooking the lack of native indexing and querying capabilities.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage stores unstructured binary or text data as blobs, not queryable JSON documents, and lacks native indexing or querying by arbitrary properties. Option B is wrong because Azure SQL Database is a relational database that requires a fixed schema and does not natively store or index JSON documents without manual schema design and JSON functions. Option D is wrong because Azure Table Storage is a key-attribute store that only supports queries on partition key and row key, not arbitrary property indexing, and is not optimized for JSON document storage.

39

MCQmedium

A company has a table named 'Sales' in Azure SQL Database with columns: SaleID (int, primary key), ProductID (int), SaleDate (datetime), Quantity (int), UnitPrice (decimal), TotalAmount (computed column). Queries frequently run to retrieve the total Quantity and UnitPrice for a specific ProductID over a date range. The query filters on ProductID and SaleDate and selects only Quantity and UnitPrice. Which index would most improve query performance?

A.Nonclustered index on (ProductID, SaleDate) INCLUDE (Quantity, UnitPrice)

B.Nonclustered index on (SaleDate) INCLUDE (Quantity, UnitPrice)

C.Clustered index on (ProductID, SaleDate)

D.Nonclustered index on (ProductID) INCLUDE (Quantity, UnitPrice)

AnswerA

This covering index includes all columns needed by the query (Quantity, UnitPrice) as included columns, and the key columns (ProductID, SaleDate) support efficient filtering. The query can be satisfied entirely from the index without key lookups.

Why this answer

Option A is correct because it creates a covering nonclustered index that supports both the WHERE clause (ProductID, SaleDate) and the SELECT clause (Quantity, UnitPrice) without needing to access the base table. The index key order matches the query filter, and the included columns avoid key lookups, minimizing I/O for the frequent aggregation queries.

Exam trap

The trap here is that candidates often think a clustered index on the filter columns is always best, but they overlook that a nonclustered index with included columns can provide a covering index that avoids costly key lookups, especially when the SELECT list is a subset of columns.

How to eliminate wrong answers

Option B is wrong because indexing only on SaleDate does not support equality filtering on ProductID, so every product's rows in the date range are read and then filtered. Option C is wrong because a clustered index on (ProductID, SaleDate) would physically reorder the table by those columns, and since the clustered index stores every column of each row, the query reads wider rows than it would from a narrow covering index; moreover, a clustered index on non-unique keys can cause fragmentation and overhead. Option D is wrong because indexing only on ProductID without SaleDate forces a scan of all rows for that ProductID to filter by date range, missing the range-based optimization that a composite key provides.
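
The covering-index idea can be sketched with Python's built-in sqlite3 module. SQLite has no INCLUDE clause, so this sketch appends the selected columns to the index key as a stand-in, which is an assumption of the example and not the SQL Server syntax from Option A. The index names `ix_keys` and `ix_covering` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (SaleID INTEGER PRIMARY KEY, ProductID INTEGER, "
    "SaleDate TEXT, Quantity INTEGER, UnitPrice REAL)"
)
query = (
    "SELECT Quantity, UnitPrice FROM Sales "
    "WHERE ProductID = ? AND SaleDate BETWEEN ? AND ?"
)
params = (10, "2024-01-01", "2024-01-31")


def plan_for(connection):
    """Return the query plan steps for `query` as one readable string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + query, params)
    return " | ".join(row[-1] for row in rows)


# Key columns only: the index locates matching rows, but Quantity and
# UnitPrice must still be fetched from the table for each match.
conn.execute("CREATE INDEX ix_keys ON Sales (ProductID, SaleDate)")
plan_keys_only = plan_for(conn)

# Stand-in for INCLUDE: add the selected columns to the index so the
# query can be answered from the index alone.
conn.execute("DROP INDEX ix_keys")
conn.execute(
    "CREATE INDEX ix_covering ON Sales (ProductID, SaleDate, Quantity, UnitPrice)"
)
plan_covering = plan_for(conn)

print(plan_keys_only)
print(plan_covering)
```

The second plan should report a covering index, meaning no per-row visit to the base table, which corresponds to the key lookups that Option A's included columns avoid.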

40

MCQmedium

A company uses Azure SQL Database for an e-commerce application. The Orders table contains columns: OrderID (int, primary key), CustomerID (int), OrderDate (datetime), TotalAmount (decimal). Queries frequently filter by CustomerID and OrderDate to retrieve orders for a specific customer within a date range. Queries also need to retrieve a single order by OrderID quickly. Which indexing strategy will most improve the performance of these queries?

A.Keep the clustered index on OrderID and create a non-clustered index on (CustomerID, OrderDate)

B.Change the clustered index to (CustomerID, OrderDate) and create a non-clustered index on OrderID

C.Keep the clustered index on OrderID and create a non-clustered index on (OrderDate, CustomerID)

D.Keep the clustered index on OrderID and create two separate non-clustered indexes on CustomerID and OrderDate

AnswerA

The clustered index on OrderID ensures fast point lookups. The non-clustered index on (CustomerID, OrderDate) supports the range query, allowing SQL Server to perform an index seek instead of scanning the whole table.

Why this answer

Option A is correct because the existing clustered index on OrderID efficiently supports the single-order lookup by primary key. Adding a non-clustered index on (CustomerID, OrderDate) lets SQL Server seek on the CustomerID equality predicate and then range-scan OrderDate within that customer, so only matching index entries are read. Because OrderID is the clustering key, it is carried in the non-clustered index automatically; columns such as TotalAmount still need a key lookup unless they are added as included columns.

Exam trap

The trap here is that candidates often think separate indexes on each column are sufficient for composite queries, but they overlook that a single composite index with the correct column order (the equality column first, then the range column) avoids costly index intersection and provides direct seek capability.

How to eliminate wrong answers

Option B is wrong because changing the clustered index to (CustomerID, OrderDate) would make the table physically ordered by those columns, which slows down the frequent single-order lookup by OrderID (now a non-clustered seek followed by a key lookup) and increases index maintenance overhead. Option C is wrong because a non-clustered index on (OrderDate, CustomerID) leads with the range column: SQL Server can seek on the OrderDate range but must then examine every customer's entries in that range to find the matching CustomerID, which reads far more of the index than leading with the equality column. Option D is wrong because two separate non-clustered indexes on CustomerID and OrderDate do not support combined range queries as effectively; SQL Server would need to use index intersection (hash match) or use one index and filter the rest, which is less efficient than a single composite index on both columns.
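
The indexing idea can be tried locally with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in, which is an assumption made purely for illustration: SQLite's planner and plan text differ from SQL Server's, and the index name ix_customer_date is hypothetical. The point is only that a composite index led by the equality column is picked up for the range query, while the primary key serves the point lookup.

```python
import sqlite3

# In-memory stand-in for the Orders table (SQLite, not Azure SQL Database).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, "
    "OrderDate TEXT, TotalAmount REAL)"
)
# Composite index: equality column first, range column second.
conn.execute("CREATE INDEX ix_customer_date ON Orders (CustomerID, OrderDate)")


def plan(sql, params=()):
    """Return SQLite's query plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


range_plan = plan(
    "SELECT OrderID, TotalAmount FROM Orders "
    "WHERE CustomerID = ? AND OrderDate BETWEEN ? AND ?",
    (42, "2024-01-01", "2024-01-31"),
)
point_plan = plan("SELECT * FROM Orders WHERE OrderID = ?", (1001,))

print(range_plan)  # expected to mention ix_customer_date
print(point_plan)  # expected to mention the primary key
```

In SQL Server the equivalent check would be to compare the actual execution plans for an Index Seek versus an Index Scan.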

41

MCQeasy

A healthcare organization is planning a data analytics platform. They will ingest data from various sources: structured patient records from a relational database, semi-structured JSON logs from medical devices, and unstructured physician notes as plain text files. Which characteristic of big data describes the different formats of data being ingested?

A.Volume

B.Velocity

C.Variety

D.Veracity

AnswerC

Variety correctly describes the different data types (structured, semi-structured, unstructured) being ingested.

Why this answer

The question describes data in three distinct formats: structured (relational database), semi-structured (JSON logs), and unstructured (plain text). In big data terminology, 'Variety' specifically refers to the different types and formats of data being processed. This is a core concept in the 4 V's of big data, where Variety captures the heterogeneity of data sources and structures.

Exam trap

The trap here is that candidates often confuse 'Variety' with 'Volume' because they associate big data with large datasets, but the question explicitly asks about different formats, not size.

How to eliminate wrong answers

Option A (Volume) is wrong because Volume refers to the sheer quantity of data being generated, not the different formats. Option B (Velocity) is wrong because Velocity describes the speed at which data is generated and processed, such as real-time streaming from IoT devices. Option D (Veracity) is wrong because Veracity concerns the quality, accuracy, and trustworthiness of the data, not its format diversity.

42

Matchingmedium

Match each Azure data service to its primary purpose.

Concepts

Matches

Relational database as a service

NoSQL multi-model database

Big data and analytics

Unstructured object storage

Scalable data lake for analytics

Why these pairings

These are core Azure data services with distinct use cases.

43

MCQhard

An organization uses Azure Stream Analytics to process real-time IoT data from millions of devices. They need exactly-once delivery semantics for the output to a Power BI dataset. Which output configuration should they use?

A.Output to Azure Data Explorer and then use a Power BI dashboard.

B.Set the output to Power BI with 'At least once' delivery mode.

C.Output to Azure Event Hubs and then use Azure Functions to write to Power BI.

D.Configure the Power BI output with 'Exactly once' delivery mode.

AnswerD

Stream Analytics supports exactly once delivery to Power BI when configured.

Why this answer

Azure Stream Analytics supports exactly-once delivery semantics for Power BI output when the 'Exactly once' delivery mode is configured. This ensures that each event is processed and delivered precisely once, preventing duplicates in the Power BI dataset. The 'Exactly once' mode uses checkpointing and idempotent writes to guarantee consistency, which is critical for accurate real-time analytics from millions of IoT devices.

Exam trap

The trap here is that candidates may assume 'At least once' is the only option for Power BI output, or that an intermediate service like Event Hubs or Data Explorer inherently provides exactly-once guarantees, when in fact Stream Analytics directly supports 'Exactly once' delivery mode for Power BI.

How to eliminate wrong answers

Option A is wrong because routing through Azure Data Explorer introduces an intermediary that does not inherently provide exactly-once semantics to Power BI; it adds latency and complexity without solving the delivery guarantee. Option B is wrong because 'At least once' delivery mode allows duplicate events, which violates the requirement for exactly-once delivery. Option C is wrong because outputting to Event Hubs and then using Azure Functions does not guarantee exactly-once delivery to Power BI; Azure Functions may process events multiple times due to retries or lack of idempotency, and the architecture adds unnecessary overhead.
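
The difference between at-least-once delivery and an exactly-once result is easier to see in a small sketch. The following is a generic illustration of idempotent writes, not a description of how Stream Analytics or Power BI implement it; the IdempotentSink class and the event_id field are hypothetical names introduced for the example.

```python
class IdempotentSink:
    """Stores events keyed by a unique id, so a redelivered event overwrites itself."""

    def __init__(self):
        self.rows = {}

    def write(self, event):
        # Keyed upsert: writing the same event twice leaves a single row.
        self.rows[event["event_id"]] = event


events = [
    {"event_id": "e1", "device": "d1", "temperature": 21.5},
    {"event_id": "e2", "device": "d2", "temperature": 19.0},
]

# At-least-once delivery: a retry causes the first event to arrive twice.
delivered = events + [events[0]]

naive_sink = []            # append-only sink, keeps duplicates
idempotent_sink = IdempotentSink()
for event in delivered:
    naive_sink.append(event)
    idempotent_sink.write(event)

print(len(naive_sink))            # 3 rows, one of them a duplicate
print(len(idempotent_sink.rows))  # 2 rows, duplicates collapsed
```

The design choice here is that deduplication happens at the sink through a stable key, so retries upstream are harmless.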

44

MCQmedium

A company has a legacy application that requires SMB (Server Message Block) file shares to store and access configuration files. They want to migrate this data to Azure without modifying the application. Which Azure storage solution should they use?

A.Azure Blob Storage

B.Azure Files

C.Azure Queue Storage

D.Azure Disk Storage

AnswerB

Azure Files offers fully managed cloud file shares that support the SMB protocol. It can be mounted directly by on-premises or cloud applications, allowing the legacy application to access files without modification.

Why this answer

Azure Files provides fully managed SMB (Server Message Block) file shares in the cloud, supporting the SMB 3.0 protocol. This allows the legacy application to access configuration files over the network using standard file share paths without any code changes, making it the ideal migration target for lift-and-shift scenarios.

Exam trap

The trap here is that candidates often confuse Azure Blob Storage (object storage) with file shares, assuming it can serve SMB traffic, but Blob Storage does not natively support the SMB protocol and requires application modifications or third-party tools to emulate file shares.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage is an object store that uses REST APIs or SDKs, not SMB protocol, so the legacy application would require modification to use blob-based access. Option C is wrong because Azure Queue Storage is a messaging service for asynchronous communication between application components, not a file share for storing and accessing configuration files. Option D is wrong because Azure Disk Storage provides block-level storage volumes attached to virtual machines, but it does not expose an SMB file share endpoint; it requires mounting as a disk within a VM, which does not match the requirement of accessing configuration files via SMB shares.

45

MCQmedium

A company uses Azure SQL Database for an order management system. The Orders table has columns: OrderID (int, primary key), CustomerID (int), OrderDate (datetime), Status (varchar), TotalAmount (decimal). Queries frequently filter on CustomerID and OrderDate to find orders from a specific customer within a date range. Which index would most improve performance for these queries?

A.A clustered index on OrderID

B.A non-clustered index on Status

C.A non-clustered index on (CustomerID, OrderDate) INCLUDE (TotalAmount)

D.A non-clustered index on (OrderDate, TotalAmount)

AnswerC

This composite index directly supports the filter predicate (CustomerID and OrderDate) and includes TotalAmount to make it a covering index, reducing I/O.

Why this answer

The query filters on CustomerID and OrderDate, so a composite non-clustered index on (CustomerID, OrderDate) allows SQL Server to perform an index seek on both columns, drastically reducing the number of rows scanned. Including TotalAmount as a non-key column makes this a covering index, meaning all needed data (including TotalAmount) is in the index leaf pages, avoiding costly key lookups to the clustered index.

Exam trap

The trap here is that candidates often pick an index starting with OrderDate (Option D) because they think date-range filtering is the primary need, forgetting that the equality filter on CustomerID must be the leading column for an efficient seek.

How to eliminate wrong answers

Option A is wrong because a clustered index on OrderID (the primary key) already exists by default, and it does not help queries filtering on CustomerID and OrderDate; it would force a full clustered index scan. Option B is wrong because an index on Status alone is not selective for date-range queries and does not support the equality filter on CustomerID or the range filter on OrderDate, so it would likely be ignored or cause a scan. Option D is wrong because the index order starts with OrderDate instead of CustomerID; since the query filters on CustomerID first (equality), leading with OrderDate prevents an efficient seek on CustomerID, and including TotalAmount as a key column is unnecessary overhead.
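
A rough way to see what "covering" means is to compare query plans with and without the extra column in the index. The sketch below uses Python's sqlite3 module as a stand-in, which is an assumption: SQLite has no INCLUDE clause, so TotalAmount is appended as a trailing key column to approximate the effect, and the index name ix_demo is hypothetical.

```python
import sqlite3

QUERY = (
    "SELECT OrderDate, TotalAmount FROM Orders "
    "WHERE CustomerID = 42 "
    "AND OrderDate >= '2024-01-01' AND OrderDate < '2024-02-01'"
)


def plan_with_index(index_columns):
    """Build a fresh in-memory Orders table with one index and return the plan text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Orders ("
        "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, "
        "OrderDate TEXT, Status TEXT, TotalAmount REAL)"
    )
    conn.execute(f"CREATE INDEX ix_demo ON Orders ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    return " | ".join(row[3] for row in rows)


# Index on the filter columns only: the table must still be visited for TotalAmount.
narrow_plan = plan_with_index("CustomerID, OrderDate")
# Index that also carries TotalAmount: the query is answered from the index alone.
covering_plan = plan_with_index("CustomerID, OrderDate, TotalAmount")

print(narrow_plan)
print(covering_plan)
```

In SQL Server the INCLUDE clause stores TotalAmount only at the leaf level, which keeps the index key narrower than the trailing-key approximation used here.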

46

Drag & Drophard

A retail company needs to build an analytics pipeline on Azure. They ingest sales data from multiple store systems and an online e-commerce platform. The data must be cleaned, transformed, and loaded into a data warehouse for reporting. The company wants to use a modern ELT (Extract, Load, Transform) approach where raw data is stored first and then transformed. Order the following steps in the correct sequence for this pipeline. (Drag the steps into the correct order.)

Why this order

ELT approach first ingests raw data to storage, loads it into the warehouse, then transforms it before creating reporting structures.

47

MCQhard

You are a data architect for a logistics company. The company uses Azure Data Lake Storage Gen2 to store shipment tracking data. The data is ingested from IoT devices on trucks. Each record contains truck ID, timestamp, GPS coordinates, speed, and fuel level. The volume is 5 TB per day. The company wants to build a near-real-time dashboard to monitor truck locations and speeds. They also need to run daily batch analytics to compute fuel efficiency trends. You need to design a solution that minimizes latency for the dashboard and maximizes cost efficiency for batch processing. You plan to use Azure Event Hubs for ingestion. Which approach should you take?

A.Use Azure Stream Analytics to process the stream and output directly to Azure SQL Database. Use Power BI to query SQL Database for both real-time dashboard and historical analytics.

B.Use Azure Event Hubs Capture to store data in Azure Blob Storage, then use Azure Data Factory to transform and load into Azure Synapse Analytics for both dashboard and batch.

C.Use Azure Databricks with Structured Streaming to process the stream, write to Delta Lake, and use Delta Lake to serve both real-time and batch queries.

D.Use Azure Stream Analytics to process the stream, output to Power BI for real-time dashboard, and simultaneously output raw data to Azure Data Lake Storage. Use Azure Databricks to process the data lake for batch analytics.

AnswerD

Stream Analytics provides low latency for dashboard; Data Lake Storage is cost-effective for large volumes; Databricks handles batch efficiently.

Why this answer

Option D is correct because it separates the real-time and batch processing paths to minimize latency and maximize cost efficiency. Azure Stream Analytics outputs directly to Power BI for near-real-time dashboard updates, while simultaneously writing raw data to Azure Data Lake Storage for cost-effective storage. Azure Databricks then processes the data lake for daily batch analytics, avoiding expensive real-time compute for historical analysis.

Exam trap

The trap here is that candidates often assume a single technology (like Databricks or Synapse) can handle both real-time and batch workloads equally well, but the DP-900 exam tests the understanding that separating the streaming path (Stream Analytics to Power BI) from the batch path (Data Lake to Databricks) optimizes for both latency and cost.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is not optimized for high-velocity streaming ingestion and would introduce latency for the dashboard, plus it is costly for storing 5 TB/day of raw data. Option B is wrong because Event Hubs Capture to Blob Storage and then Azure Data Factory to Synapse Analytics introduces batch processing latency that cannot meet near-real-time dashboard requirements. Option C is wrong because while Delta Lake can serve both real-time and batch queries, using Databricks Structured Streaming for the dashboard adds unnecessary complexity and cost compared to a dedicated stream processing service like Stream Analytics.

48

MCQmedium

Refer to the exhibit. An administrator deploys this Azure Policy assignment. What is the most likely effect on storage account 'storage1'?

A.Public network access will be denied.

B.The storage account will be deleted.

C.Firewall rules will be added.

D.Soft Delete will be enabled.

AnswerA

The 'deny' effect denies non-compliant requests.

Why this answer

The Azure Policy assignment shown in the exhibit denies the creation or update of storage accounts that do not have public network access disabled. Since 'storage1' is subject to this policy, the policy will enforce the 'Deny' effect, preventing any configuration that allows public network access. If 'storage1' already exists and is compliant, it remains; if it is non-compliant, the policy will block changes that would enable public access, effectively denying public network access.

Exam trap

The trap here is that candidates confuse 'Deny' with 'DeployIfNotExists' or 'Modify' effects, assuming the policy will automatically change settings or delete resources, when in fact 'Deny' only blocks non-compliant requests.

How to eliminate wrong answers

Option B is wrong because Azure Policy with a 'Deny' effect does not delete resources; it only prevents non-compliant creation or updates. Option C is wrong because the policy specifically targets 'public network access' (a property of the storage account), not firewall rules—firewall rules are a separate configuration that can coexist with disabled public network access. Option D is wrong because the policy does not mention 'Soft Delete' or any blob-level data protection feature; it only evaluates the 'public network access' property.

49

MCQmedium

A company is designing a data solution for their e-commerce platform. They need to store product catalogs with varying attributes, support high-throughput read/write operations, and ensure low-latency access globally. Which Azure data store is most appropriate?

A.Azure Cosmos DB

B.Azure SQL Database

C.Azure Redis Cache

D.Azure Data Lake Storage

AnswerA

Cosmos DB is a NoSQL database with automatic indexing, multi-master replication, and low-latency global access.

Why this answer

Azure Cosmos DB is the most appropriate choice because it is a globally distributed, multi-model database service that supports schema-agnostic storage of product catalogs with varying attributes. It offers guaranteed single-digit-millisecond latency for reads and writes at any scale, and its turnkey global distribution enables low-latency access from multiple regions, meeting the e-commerce platform's high-throughput and global requirements.

Exam trap

The trap here is that candidates often confuse Azure SQL Database's JSON support with native schema flexibility, overlooking the fact that Cosmos DB is purpose-built for globally distributed, schema-agnostic workloads with guaranteed latency SLAs.

How to eliminate wrong answers

Option B is wrong because Azure SQL Database is a relational database with a fixed schema, which is not suitable for storing product catalogs with varying attributes without complex schema changes or using JSON columns that lack native indexing and global distribution capabilities. Option C is wrong because Azure Redis Cache is an in-memory data store primarily used for caching and session state, not for durable, persistent storage of product catalogs with high-throughput writes and global replication. Option D is wrong because Azure Data Lake Storage is designed for big data analytics and batch processing of large volumes of unstructured data, not for low-latency, high-throughput transactional read/write operations required by an e-commerce product catalog.

50

Multi-Selectmedium

Which TWO features are available in Azure SQL Database to improve performance without changing application code?

Select 2 answers

A.Automatic tuning

B.Columnstore indexes

C.Query Store

D.Read replicas

E.Elastic pools

AnswersA, C

Automatically creates/drops indexes and forces plan changes without code changes.

Why this answer

Options A and C are correct. Automatic tuning uses the database engine to implement index and query plan changes. Query Store tracks query performance and helps identify regressions.

Option B is wrong because columnstore indexes require table schema changes. Option D is wrong because read replicas require application changes to route read traffic. Option E is wrong because elastic pools are for managing multiple databases, not performance tuning.

51

MCQeasy

A transportation company collects real-time GPS data from thousands of delivery vehicles. They need to process this streaming data to detect delays and generate alerts when a vehicle is behind schedule. Which Azure service should they use for the stream processing?

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Data Lake Analytics

D.Azure Analysis Services

AnswerB

Azure Stream Analytics is a fully managed service for real-time stream processing that can process millions of events per second and trigger alerts based on conditions.

Why this answer

Azure Stream Analytics is the correct choice because it is a fully managed, real-time stream processing engine designed to handle high-velocity data from sources like IoT devices and GPS sensors. It can ingest streaming data from Azure Event Hubs or IoT Hub, apply SQL-based queries to detect patterns such as delays, and output alerts to sinks like Azure Functions or Power BI in near real-time.

Exam trap

The trap here is that candidates often confuse Azure Data Factory's batch orchestration capabilities with real-time processing, or mistakenly think Azure Data Lake Analytics can handle streaming data because of its 'analytics' name, but neither supports continuous, low-latency stream processing.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a cloud-based ETL and data integration service for orchestrating batch data movement and transformation, not designed for real-time stream processing. Option C is wrong because Azure Data Lake Analytics is a batch analytics service that processes large volumes of data stored in Data Lake Storage using U-SQL, not suited for low-latency streaming scenarios. Option D is wrong because Azure Analysis Services is an analytical engine for semantic modeling and business intelligence on pre-processed data, not capable of ingesting or processing real-time streaming data.

52

MCQhard

A social media application stores user posts in Azure Cosmos DB. Each document contains fields: PostID (unique), UserID, Timestamp, Content, LikesCount. The most common query retrieves all posts by a specific UserID ordered by Timestamp descending. Which partition key and indexing strategy minimizes Request Unit (RU) consumption?

A.Partition key: PostID; Index: range on Timestamp

B.Partition key: UserID; Index: range on Timestamp

C.Partition key: Timestamp; Index: range on UserID

D.Partition key: UserID; Index: composite on (UserID, PostID)

AnswerB

Correct - UserID as partition key keeps each user's posts together. A range index on Timestamp enables efficient in-partition sorting, resulting in low RU.

Why this answer

Option B is correct because the query filters on UserID, so setting UserID as the partition key ensures all posts for a user are in the same physical partition, avoiding cross-partition queries. Adding a range index on Timestamp allows efficient sorting without additional RU overhead, as Cosmos DB can use the index to return results in descending order directly.

Exam trap

The trap here is that candidates often choose a composite index (Option D) thinking it optimizes both filter and sort, but Cosmos DB's indexing engine can satisfy the ORDER BY with a simple range index on the sort column alone, and a composite index would only add unnecessary write RU cost.

How to eliminate wrong answers

Option A is wrong because PostID as partition key would scatter each user's posts across multiple partitions, forcing a fan-out query that scans all partitions and consumes more RUs. Option C is wrong because Timestamp as partition key would also scatter posts for the same user across partitions, and the range index on UserID does not help sort by Timestamp efficiently. Option D is wrong because while UserID partition key is correct, a composite index on (UserID, PostID) is unnecessary and adds write overhead; a simple range index on Timestamp is sufficient for the ORDER BY clause.
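
The effect of the partition key choice can be sketched with a toy model. This is not how Cosmos DB assigns physical partitions; the four-partition count, the SHA-256 hash, and the sample posts are all assumptions made for illustration. It only shows that a query filtered on UserID stays in one partition when UserID is the partition key.

```python
import hashlib

PARTITIONS = 4  # hypothetical number of physical partitions


def partition_for(value):
    """Map a partition key value to a partition with a deterministic hash."""
    digest = hashlib.sha256(str(value).encode()).hexdigest()
    return int(digest, 16) % PARTITIONS


# Thirty sample posts spread over three users.
posts = [
    {"PostID": f"p{i}", "UserID": f"u{i % 3}", "Timestamp": i}
    for i in range(30)
]


def partitions_touched(partition_key, user_id):
    """Partitions that hold at least one post by user_id under the given key."""
    return {
        partition_for(post[partition_key])
        for post in posts
        if post["UserID"] == user_id
    }


by_user = partitions_touched("UserID", "u1")  # always a single partition
by_post = partitions_touched("PostID", "u1")  # typically several partitions

# Within the single partition, results are returned newest first.
newest_first = sorted(
    (p for p in posts if p["UserID"] == "u1"),
    key=lambda p: p["Timestamp"],
    reverse=True,
)

print(len(by_user), len(by_post), newest_first[0]["PostID"])
```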

53

MCQmedium

A business analyst needs to query a large Azure SQL Database table that stores sales transactions. The table contains over 100 million rows. The analyst wants to retrieve aggregated sales per product category for the current month. The current query performs a full table scan and takes several minutes. Which indexing strategy will best improve the performance of this aggregation query?

A.Create a clustered index on the transaction date column

B.Create a nonclustered index on the product category column

C.Create a columnstore index on the table

D.Create a filtered index on transactions from the current month

AnswerC

A columnstore index is purpose-built for analytical queries that aggregate over large tables, using column-wise storage and advanced compression to reduce I/O.

Why this answer

A columnstore index stores data column-wise and uses batch processing, which dramatically accelerates aggregation queries (like SUM, COUNT, GROUP BY) over large tables. For a 100-million-row table, this reduces I/O and CPU by reading only the columns needed for the aggregation, making it the optimal choice for the analyst's current-month sales-per-category query.

Exam trap

The trap here is that candidates often choose a filtered or nonclustered index thinking they will reduce the scan scope, but they overlook that columnstore indexes are specifically designed for high-performance analytical aggregations on large tables, not just for filtering or single-column lookups.

How to eliminate wrong answers

Option A is wrong because a clustered index on transaction date would only speed up range scans or point lookups on that column, not aggregations by product category; the query would still need to scan all rows or perform a costly key lookup. Option B is wrong because a nonclustered index on product category would help with equality or range searches on that column, but for a full-table aggregation with GROUP BY, it would still require a full index scan and does not provide the columnar compression and batch processing benefits needed. Option D is wrong because a filtered index on transactions from the current month would only cover a subset of rows, but the query already filters to the current month; the real performance bottleneck is the aggregation over millions of rows, which a filtered index does not address as effectively as a columnstore index.
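
Why column-wise storage helps aggregations can be illustrated with plain Python lists. This sketch says nothing about SQL Server's actual columnstore format, compression, or batch mode; the table contents and column names are hypothetical, and it only counts how many values each layout has to read for a sum per category.

```python
from collections import defaultdict

# Row layout: every row carries all five fields.
rows = [
    {"id": 1, "category": "Bikes", "amount": 100.0, "customer": "a", "notes": "n1"},
    {"id": 2, "category": "Helmets", "amount": 40.0, "customer": "b", "notes": "n2"},
    {"id": 3, "category": "Bikes", "amount": 250.0, "customer": "c", "notes": "n3"},
]

# Column layout: one list per column, so a query can read only what it needs.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_by_category(categories, amounts):
    """Sum amounts per category from two parallel column lists."""
    totals = defaultdict(float)
    for category, amount in zip(categories, amounts):
        totals[category] += amount
    return dict(totals)


result = total_by_category(columns["category"], columns["amount"])

values_read_row_layout = len(rows) * len(rows[0])  # whole rows are read
values_read_column_layout = len(rows) * 2          # only two columns are read

print(result)
print(values_read_row_layout, values_read_column_layout)
```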

54

MCQmedium

Refer to the exhibit. An Azure Policy is defined as shown. A database administrator attempts to create an Azure SQL Database without enabling zone redundancy. What will happen?

A.The database will be created and an audit event will be logged

B.The database creation will be denied

C.The database will be created and zone redundancy will be automatically enabled

D.The database will be created with zone redundancy disabled

AnswerB

The policy rule denies creation if zoneRedundant is false.

Why this answer

Option B is correct because the policy denies creation if zone redundancy is not enabled. Option A is wrong because the effect is deny, not audit. Option C is wrong because the policy will deny the request, not allow it and change the setting.

Option D is wrong because the policy applies at creation time, so the non-compliant database is never created.

55

MCQmedium

A bank processes a fund transfer transaction. The system debits $100 from Account A and then credits $100 to Account B. If the system crashes after debiting Account A but before crediting Account B, the database automatically reverts the debit. Which ACID property ensures this behavior?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Correct - Atomicity guarantees that the transaction is all-or-nothing. The rollback of the debit upon crash is a direct result of atomicity enforcement.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. If any part of the transaction fails (e.g., a crash after debiting Account A but before crediting Account B), the entire transaction is rolled back, reverting any partial changes like the debit. This all-or-nothing behavior is the core of atomicity in database systems.

Exam trap

The trap here is that candidates often confuse atomicity with consistency, thinking that 'keeping the database in a valid state' is what triggers the rollback, but it is actually atomicity that enforces the all-or-nothing rule for the transaction itself.

How to eliminate wrong answers

Option B is wrong because Consistency ensures that a transaction transforms the database from one valid state to another, enforcing integrity constraints (e.g., total balance remains constant), but it does not handle rollback of partial changes after a crash. Option C is wrong because Isolation ensures that concurrent transactions do not interfere with each other (e.g., via locking or MVCC), but it does not address crash recovery or rollback of incomplete transactions. Option D is wrong because Durability guarantees that once a transaction is committed, its changes persist even after a system failure (e.g., via write-ahead logging), but it does not revert uncommitted changes; that is the role of atomicity.
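
The all-or-nothing behavior can be reproduced with a short sketch using Python's built-in sqlite3 module. SQLite stands in for the bank's database here, the starting balances are made up, and the crash is simulated by raising an exception between the debit and the credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()


def transfer(amount, crash_midway=False):
    """Move amount from A to B as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'A'",
                (amount,),
            )
            if crash_midway:
                raise RuntimeError("simulated crash before the credit")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'B'",
                (amount,),
            )
    except RuntimeError:
        pass  # the debit has already been rolled back at this point


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))


transfer(100, crash_midway=True)
after_crash = balances()    # debit reverted, nothing changed
transfer(100)
after_success = balances()  # both updates applied together

print(after_crash)
print(after_success)
```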

56

Multi-Selecthard

Which TWO Azure services are primarily used for batch processing of large volumes of data? (Choose two.)

Select 2 answers

A.Azure Synapse Analytics

B.Azure SQL Database

C.Azure Stream Analytics

D.Azure Databricks

E.Azure Data Lake Storage

AnswersA, D

Synapse provides SQL and Spark engines for batch processing.

Why this answer

Azure Synapse Analytics is correct because it provides a cloud-based data warehousing and analytics service that uses massively parallel processing (MPP) to run complex queries and batch processing jobs over large datasets, often using PolyBase or T-SQL to transform and load data in bulk. Azure Databricks is correct because it is an Apache Spark-based analytics platform optimized for batch processing, allowing users to run distributed data processing jobs (e.g., ETL, data transformation) across large volumes of data using DataFrames and RDDs in a cluster environment.

Exam trap

The trap here is that candidates often confuse Azure Data Lake Storage (a storage service) with a processing service, or mistakenly think Azure SQL Database can handle large-scale batch processing due to its ability to run bulk insert operations, but it lacks the distributed compute and parallel architecture required for true batch processing at scale.

57

MCQeasy

A business analyst needs to create interactive visualizations and share dashboards with colleagues using data stored in an Azure Synapse Analytics dedicated SQL pool. Which tool should the analyst use?

A.Azure Data Studio

B.Power BI Desktop

C.SQL Server Management Studio (SSMS)

D.Azure Machine Learning Studio

AnswerB

Power BI Desktop enables connecting to Synapse, building interactive dashboards, and publishing to the Power BI service for sharing.

Why this answer

Power BI Desktop is the correct tool because it is designed for creating interactive visualizations and dashboards, and it can connect directly to Azure Synapse Analytics dedicated SQL pools via the built-in Azure Synapse Analytics connector. This allows the business analyst to build reports and share them with colleagues through the Power BI service, meeting the requirement for interactive dashboards and collaboration.

Exam trap

The trap here is that candidates often confuse Azure Data Studio or SSMS as tools for visualization because they can run queries and view results, but they lack the interactive dashboard and sharing features that Power BI provides.

How to eliminate wrong answers

Option A is wrong because Azure Data Studio is a database management and query tool focused on SQL development and administration, not on creating interactive visualizations or sharing dashboards. Option C is wrong because SQL Server Management Studio (SSMS) is primarily for managing SQL Server and Azure SQL databases, including querying and administration, but it lacks the visualization and dashboard-sharing capabilities needed for business analytics. Option D is wrong because Azure Machine Learning Studio is a platform for building, training, and deploying machine learning models, not for creating interactive visualizations or dashboards from data in a dedicated SQL pool.

58

MCQmedium

A data engineer needs to query data stored in CSV files in Azure Data Lake Storage Gen2 using T-SQL in Azure Synapse Analytics, without loading the data into the database. Which feature should they use?

A.External tables

B.Materialized views

C.Stored procedures

D.Indexed views

AnswerA

Correct. External tables use PolyBase to query external data sources like Azure Data Lake Storage using T-SQL without loading data into Synapse.

Why this answer

External tables in Azure Synapse Analytics allow you to query data stored in files (such as CSV in Azure Data Lake Storage Gen2) using T-SQL without loading the data into the database. They use the PolyBase engine to read the files directly, enabling on-the-fly querying of external data sources.

Exam trap

The trap here is that candidates may confuse external tables with materialized views or indexed views, thinking any database object can query external files, but only external tables provide the PolyBase-based bridge to read data directly from storage without loading.

How to eliminate wrong answers

Option B is wrong because materialized views store pre-computed data physically in the database, requiring data to be loaded first, which contradicts the requirement to query without loading. Option C is wrong because stored procedures are a set of T-SQL statements executed on the database, but they do not provide a mechanism to directly query external files without loading data into tables. Option D is wrong because indexed views require data to be stored in the database and indexed, which again necessitates loading the data first.


59

MCQmedium

An e-commerce company has 20 SQL Server databases that each range from 10 GB to 50 GB and experience unpredictable usage patterns with occasional spikes in user activity. The company wants to migrate to Azure SQL Database to reduce management overhead and minimize costs by allowing databases to share resources. Which Azure SQL Database deployment option should they choose?

A.Single database with provisioned throughput

B.Elastic pool

C.Managed Instance

D.SQL Server on Azure VM

AnswerB

Elastic pools allow multiple databases to share resources, offering cost efficiency and handling unpredictable usage spikes without over-provisioning each database.

Why this answer

Elastic pools allow multiple SQL databases to share a fixed set of resources (eDTUs or vCores), which is ideal for databases with unpredictable usage patterns and occasional spikes. By pooling resources, the company can reduce management overhead and minimize costs compared to provisioning each database individually, as the pool's total resources are shared among all databases, smoothing out peak demands.

Exam trap

The trap here is that candidates may read 'reducing management overhead' as a cue for a more feature-rich option like Managed Instance and miss that the key requirement is 'sharing resources to minimize costs,' which elastic pools address and single databases or instance-level offerings do not.

How to eliminate wrong answers

Option A is wrong because a single database with provisioned throughput allocates dedicated resources per database, which would be cost-inefficient for 20 databases with sporadic spikes, as each would need to be sized for its peak load. Option C is wrong because Managed Instance is designed for lift-and-shift migrations requiring full SQL Server instance-level features and network isolation, not for sharing resources across multiple small databases to minimize cost. Option D is wrong because SQL Server on Azure VM requires ongoing management of the VM and SQL Server, contradicting the goal to reduce management overhead, and does not natively support resource sharing across databases without additional clustering or licensing costs.
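
The cost argument can be sketched with a little arithmetic. The example below compares sizing every database for its own peak with sizing one pool for the highest combined load. The load figures, the `hourly_load` helper, and the generic "resource units" are invented for illustration; they are not eDTU, vCore, or pricing values.

```python
# Hypothetical workload: 20 databases, each idle at 5 units but spiking
# to 50 units during one hour of the day. Spikes fall in different hours.
HOURS = 24
DATABASES = 20
BASELINE = 5
SPIKE = 50


def hourly_load(db_index: int) -> list[int]:
    """Return the per-hour load for one database (illustrative only)."""
    spike_hour = db_index % HOURS
    return [SPIKE if hour == spike_hour else BASELINE for hour in range(HOURS)]


loads = [hourly_load(i) for i in range(DATABASES)]

# Single databases: every database must be provisioned for its own peak.
per_database_capacity = sum(max(load) for load in loads)

# Elastic pool: the pool must cover the highest combined load in any hour.
pooled_capacity = max(sum(load[hour] for load in loads) for hour in range(HOURS))

print(per_database_capacity)
print(pooled_capacity)
```

Under these assumptions the pool needs far less total capacity because the spikes do not coincide. If every database peaked in the same hour, the advantage would disappear.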


60

MCQeasy

A small online retailer wants to migrate its single on-premises SQL Server database to Azure. They require a fully managed relational database service with built-in high availability, automated backups, and no need to manage virtual machines. They do not need features like multiple databases with cross-database queries or SQL Agent. Which Azure service should they choose?

A.Azure SQL Database

B.Azure SQL Managed Instance

C.SQL Server on Azure Virtual Machines

D.Azure Database for MySQL

AnswerA

Correct. Azure SQL Database is a PaaS offering with built-in high availability and automated backups, requiring no VM management. It is ideal for a single database migration.

Why this answer

Azure SQL Database is a fully managed Platform-as-a-Service (PaaS) relational database that provides built-in high availability (a 99.99% SLA, rising to 99.995% with a zone-redundant configuration in supported tiers), automated backups with point-in-time restore, and eliminates the need to manage virtual machines or operating system patches. It is the ideal choice for a single database migration when features like cross-database queries and SQL Agent are not required.

Exam trap

The trap here is that candidates often choose Azure SQL Managed Instance because it offers more SQL Server parity, but the question explicitly states the retailer does not need SQL Agent or cross-database queries, making the simpler and more cost-effective Azure SQL Database the correct choice.

How to eliminate wrong answers

Option B (Azure SQL Managed Instance) is wrong because it includes SQL Agent and cross-database query capabilities, which the retailer explicitly does not need, and it still requires managing a virtual network and instance-level configuration, adding unnecessary complexity. Option C (SQL Server on Azure Virtual Machines) is wrong because it is an Infrastructure-as-a-Service (IaaS) offering that requires the customer to manage the VM, apply OS and SQL Server patches, and configure high availability manually, contradicting the requirement for a fully managed service with no VM management. Option D (Azure Database for MySQL) is wrong because it is a different database engine (MySQL) and not a direct migration path for an existing SQL Server database; it would require schema and query changes, and it does not support SQL Server-specific features like T-SQL or CLR integration.


61

MCQeasy

A retail company operates an e-commerce website that processes customer orders (insert, update, delete) throughout the day. The same company also runs reports on sales trends at the end of each quarter. Which type of data processing workload does the order processing represent?

A.Batch processing

B.Transactional processing (OLTP)

C.Analytical processing (OLAP)

D.Stream processing

AnswerB

Correct. Order processing requires real-time handling of individual inserts, updates, and deletes, which is the definition of OLTP. OLTP systems are designed for high concurrency and low latency for transactional operations.

Why this answer

Order processing involves inserting, updating, and deleting individual customer orders in real time as they occur. This is the classic definition of an Online Transaction Processing (OLTP) workload, which is optimized for high-volume, low-latency transactions that maintain ACID (Atomicity, Consistency, Isolation, Durability) properties. The e-commerce website requires immediate data consistency for each order, which is the hallmark of transactional processing.

Exam trap

The trap here is that candidates confuse 'analytical processing' (OLAP) with 'transactional processing' (OLTP) because both involve databases, but OLAP is for read-heavy, aggregated queries on historical data, not for the write-heavy, individual row operations of order management.

How to eliminate wrong answers

Option A is wrong because batch processing handles large volumes of data in scheduled, non-real-time batches (e.g., end-of-day payroll runs), not the continuous, individual order operations described. Option C is wrong because analytical processing (OLAP) is designed for complex queries and aggregations over historical data (e.g., sales trend reports), not for the high-frequency inserts/updates/deletes of live orders. Option D is wrong because stream processing deals with continuous, unbounded data flows (e.g., real-time sensor data or clickstreams) using event-time windows, not the discrete, stateful transactions of an order system.
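
A minimal sketch of the transactional behaviour described above, using Python's built-in `sqlite3` module as a stand-in for an OLTP database. The `orders` table and its columns are hypothetical. The second transaction fails partway through and is rolled back as a whole, which is the atomicity part of ACID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, "
    "qty INTEGER CHECK (qty > 0))"
)

# A committed transaction: insert an order, then update it.
with conn:
    conn.execute("INSERT INTO orders (id, product, qty) VALUES (1, 'keyboard', 1)")
    conn.execute("UPDATE orders SET qty = 2 WHERE id = 1")

# A failed transaction: the UPDATE violates the CHECK constraint, so the
# INSERT that preceded it is rolled back too (atomicity).
try:
    with conn:
        conn.execute("INSERT INTO orders (id, product, qty) VALUES (2, 'mouse', 1)")
        conn.execute("UPDATE orders SET qty = 0 WHERE id = 2")
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT id, product, qty FROM orders ORDER BY id").fetchall()
print(rows)
```

Only the first order survives, because the connection's context manager commits on success and rolls back when an exception escapes the block.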


62

MCQhard

A marketing company stores years of historical campaign data in Azure Data Lake Storage Gen2 as Parquet files. Data analysts need to run complex SQL queries over this data to identify trends, and they want to visualize results in Power BI dashboards. The company wants to avoid moving data into a separate database to minimize duplication and latency. Which Azure service should they use to query the data directly in the data lake?

A.Azure Data Factory

B.Azure Synapse Serverless SQL pool

C.Azure Databricks

D.Azure HDInsight

AnswerB

Serverless SQL pool (part of Azure Synapse Analytics) enables querying data directly from Azure Data Lake Storage using standard T-SQL. It is serverless (no infrastructure to manage) and perfect for ad-hoc analytics integration with Power BI.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it allows you to run T-SQL queries directly over Parquet files in Azure Data Lake Storage Gen2 without moving or copying the data. It uses a pay-per-query model and supports standard SQL syntax, making it ideal for analysts who need to query historical campaign data and visualize results in Power BI with minimal latency.

Exam trap

The trap here is that candidates often confuse Azure Data Factory (an ETL tool) with a query service, or they assume Azure Databricks is the only option for big data SQL queries, overlooking the serverless SQL pool's ability to query data in place without cluster management.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is an ETL and data orchestration service, not a query engine; it cannot run complex SQL queries directly against data in the lake. Option C is wrong because Azure Databricks is a Spark-based analytics platform that requires cluster management and coding (Python, Scala, SQL) rather than providing a serverless SQL endpoint for ad-hoc queries. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark cluster service that requires provisioning and ongoing management, and it does not offer a serverless SQL interface for direct querying of Parquet files in the data lake.


63

MCQeasy

A manufacturing company stores two types of data: (1) real-time sensor readings from production machines used to monitor current machine status, and (2) historical daily production summaries used by managers to identify trends over months. Which statement accurately describes these workloads?

A.Sensor readings are an OLAP workload; daily summaries are an OLTP workload.

B.Sensor readings are an OLTP workload; daily summaries are an OLAP workload.

C.Sensor readings are a NoSQL workload; daily summaries are a relational workload.

D.Sensor readings are a batch workload; daily summaries are a real-time workload.

AnswerB

Sensor readings are frequent transactions (OLTP), daily summaries are analytical (OLAP).

Why this answer

Option B is correct because real-time sensor readings involve frequent, small inserts and point lookups (typical of an OLTP workload), while historical daily summaries are aggregated data used for trend analysis over months (typical of an OLAP workload). OLTP systems handle high-volume transactional operations, whereas OLAP systems support complex queries and aggregations on large historical datasets.

Exam trap

The trap here is that candidates confuse OLTP with real-time and OLAP with batch, but OLTP can be real-time (e.g., sensor inserts) and OLAP can be batch (e.g., daily summaries), so the key distinction is transactional vs. analytical processing, not timing.

How to eliminate wrong answers

Option A is wrong because it reverses the definitions: sensor readings are an OLTP workload (not OLAP), and daily summaries are an OLAP workload (not OLTP). Option C is wrong because the workload type (OLTP vs. OLAP) is independent of the data model (NoSQL vs. relational); sensor readings could be stored in a relational or NoSQL database, and daily summaries could also be in either. Option D is wrong because sensor readings are a real-time (streaming) workload, not batch; daily summaries are a batch workload (processed from historical data), not real-time.
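
The two access patterns can be contrasted in a short sketch, again using `sqlite3` purely as an illustration. The table names, columns, and sample values are assumptions. The first half performs small writes and a lookup for one machine; the second half aggregates historical rows into a monthly trend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (machine TEXT, ts TEXT, temp REAL)")
conn.execute("CREATE TABLE daily_summary (day TEXT, units INTEGER)")

# OLTP-style access: many small writes, then a lookup for one machine.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("m1", "2024-01-01T10:00", 70.5),
        ("m2", "2024-01-01T10:00", 68.0),
        ("m1", "2024-01-01T10:01", 71.0),
    ],
)
latest_m1 = conn.execute(
    "SELECT temp FROM readings WHERE machine = ? ORDER BY ts DESC LIMIT 1",
    ("m1",),
).fetchone()[0]

# OLAP-style access: aggregate many historical rows to see a trend.
conn.executemany(
    "INSERT INTO daily_summary VALUES (?, ?)",
    [("2024-01-15", 100), ("2024-01-16", 120), ("2024-02-15", 90)],
)
monthly = conn.execute(
    "SELECT substr(day, 1, 7) AS month, SUM(units) FROM daily_summary "
    "GROUP BY month ORDER BY month"
).fetchall()

print(latest_m1)
print(monthly)
```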


64

MCQmedium

A media company stores user profiles in Azure Cosmos DB using the Core (SQL) API. Each profile document contains a userId (unique), name, email, and a subscriptions array containing objects with a serviceName and startDate. The application needs to efficiently retrieve a single user by userId and also run a query to find all users who have a subscription to the service 'PremiumVideo'. Which partition key design is most appropriate for this workload?

A.Partition key on email

B.Partition key on userId

C.Partition key on serviceName (extracted from subscriptions array)

D.Partition key on a composite key combining userId and serviceName

AnswerB

Correct. Partitioning by userId ensures even distribution and efficient point reads. The subscription query can still run as a cross-partition query, which is acceptable for this workload.

Why this answer

Option B is correct because partitioning on userId ensures each document is evenly distributed across physical partitions, as userId is unique and used for point reads (the most efficient operation in Cosmos DB). The query for users with a 'PremiumVideo' subscription will be a cross-partition query regardless of partition key choice, but the primary workload—retrieving a single user by userId—is optimized with this design. Partitioning on userId also avoids hot partitions and adheres to the best practice of using a high-cardinality, frequently queried field as the partition key.

Exam trap

The trap here is that candidates assume partitioning on a frequently filtered field like serviceName will optimize the subscription query, but they overlook that a partition key path cannot point into an array, that a good partition key has high cardinality, and that point reads (by userId) are the most common and cost-sensitive operation in this workload.

How to eliminate wrong answers

Option A is wrong because email is not used in the primary point-read query (by userId), and partitioning on email would scatter documents across partitions for userId-based lookups, requiring inefficient cross-partition queries. Option C is wrong because serviceName sits inside the subscriptions array, and Cosmos DB cannot use an array element as a partition key; even if it were extracted into its own property, it would be a low-cardinality key (few distinct values like 'PremiumVideo'), leading to hot partitions and throttling. Option D is wrong because a composite key combining userId and serviceName is unnecessary and adds complexity; it does not improve point-read efficiency for userId alone (the primary workload) and would still require cross-partition queries for the subscription filter.
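
The cardinality point can be illustrated with a small hashing sketch. The `partition_for` function is a hypothetical stand-in (a SHA-256 hash taken modulo four partitions), not how Cosmos DB actually maps keys to physical partitions, and the user data is generated for the example.

```python
import hashlib
from collections import Counter

PARTITIONS = 4


def partition_for(key: str) -> int:
    """Map a partition key value to a partition (illustrative hashing only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PARTITIONS


# 1000 hypothetical users; most subscribe to the same service.
users = [
    {"userId": f"user-{i}", "serviceName": "PremiumVideo" if i % 10 else "BasicMusic"}
    for i in range(1000)
]

# High-cardinality key: documents spread across all partitions.
by_user_id = Counter(partition_for(u["userId"]) for u in users)

# Low-cardinality key: two distinct values can occupy at most two partitions.
by_service = Counter(partition_for(u["serviceName"]) for u in users)

print(sorted(by_user_id.values()))
print(sorted(by_service.values()))
```

With userId as the key the counts per partition come out roughly equal, while the serviceName key piles at least 900 of the 1000 documents into a single partition.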


65

MCQeasy

A company wants to store JSON documents from IoT devices with low latency and high availability. Which Azure data store should they use?

A.Azure Blob Storage

B.Azure Cosmos DB

C.Azure Table Storage

D.Azure SQL Database

AnswerB

Azure Cosmos DB is a globally distributed NoSQL database that natively supports JSON documents.

Why this answer

Azure Cosmos DB is the correct choice because it is a fully managed NoSQL database designed for low-latency, high-availability workloads, with native support for JSON documents. It offers single-digit millisecond read/write latencies at the 99th percentile, global distribution with multi-region writes, and multiple consistency models, making it ideal for IoT scenarios that require fast, always-on access to semi-structured data.

Exam trap

The trap here is that candidates often confuse Azure Blob Storage's ability to store JSON files with the need for a database that can natively query and index JSON documents, leading them to choose Blob Storage for its low cost rather than Cosmos DB for its low-latency querying capabilities.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage is an object store for unstructured binary data (blobs) and does not provide native JSON document querying or indexing; it would require additional compute to parse and query JSON files. Option C is wrong because Azure Table Storage is a key-value store that holds entities as flat sets of properties; it cannot store nested JSON documents natively and lacks the rich querying and indexing capabilities of a document database. Option D is wrong because Azure SQL Database is a relational database that requires a predefined schema and is not optimized for storing and querying flexible JSON documents with the same low-latency, high-throughput characteristics as Cosmos DB.


66

MCQhard

A financial application uses Azure SQL Database. The workload consists of a high volume of small, frequent insert operations (OLTP) and periodic complex analytical queries that scan large portions of the same table (OLAP). The table currently has a clustered columnstore index. The inserts are suffering from performance degradation. What should the company do to improve insert performance while still enabling efficient analytical queries?

A.Replace the clustered columnstore index with a clustered rowstore index and add a nonclustered columnstore index

B.Use memory-optimized tables for the entire table

C.Partition the table by date and move older partitions to columnstore

D.Keep the clustered columnstore index and use batch inserts

AnswerA

Correct. The rowstore index accelerates inserts, and the nonclustered columnstore index enables fast analytical queries.

Why this answer

Option A is correct because a clustered rowstore index is optimized for high-volume OLTP inserts, while adding a nonclustered columnstore index allows the same table to support efficient analytical queries by providing a separate columnar structure. This hybrid approach avoids the insert overhead of columnstore indexes, which are designed for bulk operations and can suffer from small, frequent insert performance degradation due to delta store management and tuple mover processes.

Exam trap

The trap here is that candidates assume columnstore indexes are always the best choice for mixed workloads, overlooking the fact that they are optimized for batch operations and can severely degrade under high-frequency singleton inserts, making a hybrid rowstore/columnstore approach the correct solution.

How to eliminate wrong answers

Option B is wrong because memory-optimized tables target extremely low-latency OLTP workloads; converting the entire table would require schema and application changes, is limited to service tiers that support In-Memory OLTP, and does not by itself address the analytical queries. Option C is wrong because partitioning by date and moving older partitions to columnstore still leaves the active partition with a columnstore index, which does not resolve the insert performance issue for the high-volume, small inserts hitting that partition. Option D is wrong because keeping the clustered columnstore index and using batch inserts only mitigates the problem partially; columnstore indexes are inherently inefficient for small, frequent singleton inserts due to delta store fragmentation and the overhead of compressing small row groups, so the degradation persists.
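
A toy model of the two storage layouts may help explain why they suit different workloads. It is a conceptual sketch only: the `row_store` and `column_store` structures are plain Python containers, and real columnstore indexes add compression, row groups, and a delta store that are not modelled here.

```python
# Row store: each row is kept together, so a single-row insert is one append.
row_store: list[tuple[int, str, float]] = []

# Column store: each column is kept together, so an aggregate reads one list.
column_store: dict[str, list] = {"id": [], "account": [], "amount": []}


def insert(row_id: int, account: str, amount: float) -> None:
    """Insert one transaction into both layouts."""
    row_store.append((row_id, account, amount))
    # The column layout has to touch every column for a single new row.
    column_store["id"].append(row_id)
    column_store["account"].append(account)
    column_store["amount"].append(amount)


for i, (account, amount) in enumerate([("A", 10.0), ("B", 25.5), ("A", 4.5)]):
    insert(i, account, amount)

# Analytical query: total amount. The row layout walks every row and picks
# out one field; the column layout reads only the "amount" column.
total_from_rows = sum(row[2] for row in row_store)
total_from_columns = sum(column_store["amount"])

print(total_from_rows, total_from_columns)
```

Both layouts return the same total. The difference is in how much data each one has to touch for a single-row write versus a full-column scan.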


67

MCQmedium

A mobile gaming company stores player session data as key-value pairs. Each player has a unique PlayerID, and the application needs to read/write the player's current level and score with very low latency. The data does not require complex queries, and the schema (attributes per player) can vary. The company wants a fully managed, globally distributed NoSQL database. Which Azure data store should they choose?

A.Azure Cosmos DB Table API

B.Azure SQL Database

C.Azure Blob Storage

D.Azure Cache for Redis

AnswerA

The Table API is a key-value store that provides low-latency access, global distribution, and a flexible schema. It is designed for simple lookups by partition key and row key, fitting this use case perfectly.

Why this answer

Azure Cosmos DB Table API is the correct choice because it provides a fully managed, globally distributed NoSQL database that supports key-value data with schema flexibility. It offers low-latency reads and writes (single-digit milliseconds at the 99th percentile) and automatic global distribution, making it ideal for storing player session data with varying attributes per player.

Exam trap

The trap here is that candidates may confuse Azure Cache for Redis as a durable database, but it is primarily an in-memory cache that requires additional configuration for persistence and global distribution, whereas Cosmos DB Table API is a fully managed, globally distributed NoSQL database with built-in durability and low latency.

How to eliminate wrong answers

Option B (Azure SQL Database) is wrong because it is a relational database built around a fixed schema and complex queries, which runs counter to the requirement for schema flexibility and key-value simplicity. Option C (Azure Blob Storage) is wrong because it is an object storage service for unstructured blobs (files, images, videos), not a low-latency key-value store for small data items like player level and score. Option D (Azure Cache for Redis) is wrong because it is an in-memory caching service, not a fully managed, globally distributed durable database; it would require additional persistence and replication setup to meet the durability and global distribution needs.


68

MCQhard

A large e-commerce company needs to build an analytics solution. They have streaming clickstream data from their website (JSON) and daily sales data from their transactional database (CSV). They need to perform real-time dashboards on clickstream for the current hour, and also run complex historical queries that join sales data with aggregated clickstream data over the past year. They want a single Azure service that can handle both stream processing and batch processing using a unified experience, without moving data between separate systems. Which Azure service should they use?

A.Azure Stream Analytics

B.Azure Data Factory

C.Azure Synapse Analytics

D.Azure HDInsight

AnswerC

Synapse Analytics provides a unified analytics experience with support for both stream processing (via Spark Structured Streaming in Apache Spark pools) and large-scale batch analytics using dedicated SQL pools or serverless SQL. It meets all requirements.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified experience for both stream processing (via Spark Structured Streaming in Apache Spark pools) and batch processing (via dedicated SQL pools or serverless SQL), enabling real-time dashboards on clickstream data and complex historical queries joining sales data with aggregated clickstream data without moving data between separate systems.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics (a pure stream processor) with a unified analytics service, overlooking that Synapse Analytics can handle both real-time and batch workloads in a single platform without requiring separate data movement or additional services.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing service that cannot natively handle complex batch queries over historical data or join with CSV-based sales data without additional storage and orchestration. Option B is wrong because Azure Data Factory is an ETL and orchestration service that does not provide real-time stream processing or native query capabilities for dashboards. Option D is wrong because Azure HDInsight requires managing separate clusters for stream (e.g., Kafka/Spark Streaming) and batch (e.g., Hive/Spark) processing, and does not offer a unified single-service experience without moving data between systems.


69

MCQmedium

A retail company collects streaming clickstream data from its website into Azure Event Hubs. They need to aggregate the data in real-time to count page views per product every minute and store the results in Azure SQL Database for a live dashboard. Which Azure service should they use to perform this real-time aggregation?

A.Azure Stream Analytics

B.Azure Data Factory

C.Azure Synapse Pipelines

D.Azure Databricks

AnswerA

Correct. Stream Analytics is the simplest and most cost-effective service for real-time stream processing with built-in windowing and output to SQL Database.

Why this answer

Azure Stream Analytics is purpose-built for real-time stream processing, allowing you to define a query that aggregates clickstream data from Event Hubs using a tumbling window of one minute to count page views per product. The result can be written directly to Azure SQL Database, enabling a live dashboard without additional orchestration.

Exam trap

The trap here is that candidates may confuse Azure Stream Analytics with Azure Data Factory or Synapse Pipelines because all three can move data, but only Stream Analytics provides native, low-latency stream processing with temporal windowing for real-time aggregation.

How to eliminate wrong answers

Option B (Azure Data Factory) is wrong because it is a data integration and orchestration service for batch and scheduled data movement, not a real-time stream processing engine. Option C (Azure Synapse Pipelines) is wrong because it is essentially the same as Data Factory within Synapse, designed for batch ETL and orchestration, not for continuous, low-latency aggregation of streaming data. Option D (Azure Databricks) is wrong because while it can process streaming data via Structured Streaming, it requires a cluster to be running and is overkill for a simple per-minute aggregation; it is not the simplest or most cost-effective service for this specific real-time aggregation task.
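
The one-minute tumbling window mentioned above can be sketched in plain Python. This is a conceptual illustration, not Stream Analytics query language: the sample events and the `window_start` helper are hypothetical, and window boundaries and labelling are simplified (each window is identified here by its start time).

```python
from collections import Counter
from datetime import datetime

# Hypothetical clickstream events: (ISO timestamp, product id).
events = [
    ("2024-05-01T12:00:05", "p1"),
    ("2024-05-01T12:00:40", "p1"),
    ("2024-05-01T12:00:59", "p2"),
    ("2024-05-01T12:01:00", "p1"),
    ("2024-05-01T12:01:30", "p2"),
]


def window_start(timestamp: str) -> str:
    """Truncate a timestamp to the start of its one-minute tumbling window."""
    moment = datetime.fromisoformat(timestamp)
    return moment.replace(second=0, microsecond=0).isoformat()


# Each event belongs to exactly one window, so windows never overlap.
page_views = Counter((window_start(ts), product) for ts, product in events)

for (start, product), count in sorted(page_views.items()):
    print(start, product, count)
```

The defining property of a tumbling window is that the windows are fixed-size, contiguous, and non-overlapping, so every event is counted once.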


70

MCQmedium

A smart building monitoring company ingests real-time sensor data (temperature, humidity, occupancy) from thousands of IoT devices into Azure Event Hubs. The company also stores historical building blueprints and maintenance records as CSV files in Azure Data Lake Storage Gen2. The engineering team needs to build a dashboard that displays live sensor readings overlaid on building floor plans, and also allows facility managers to run ad-hoc T-SQL queries that combine live sensor data with historical maintenance records. Which Azure service should they use as the primary analytics platform to meet both requirements?

A.Azure Stream Analytics

B.Azure Synapse Analytics

C.Azure Databricks

D.Azure Analysis Services

AnswerB

Azure Synapse Analytics unifies big data and data warehousing. It can ingest streaming data via Event Hubs, query both streaming and batch data using T-SQL across the data lake, and support dashboarding, making it the best fit for both real-time and ad-hoc requirements.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified analytics platform that combines big data and data warehousing. It can ingest real-time streaming data from Azure Event Hubs via its built-in Apache Spark pools (Structured Streaming), and also run ad-hoc T-SQL queries against both the live sensor data (staged in tables) and historical CSV files stored in Azure Data Lake Storage Gen2 using serverless SQL pools. This meets both the real-time dashboard and ad-hoc T-SQL query requirements.

Exam trap

The trap here is that candidates often choose Azure Stream Analytics because they focus only on the real-time dashboard requirement, overlooking the need for ad-hoc T-SQL queries against historical data, which Stream Analytics cannot fulfill.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing engine that outputs to sinks like Power BI or storage, but it does not support ad-hoc T-SQL queries against historical data or provide a unified analytics platform for combining live and historical data. Option C is wrong because Azure Databricks is an Apache Spark-based analytics platform that excels at batch and stream processing with Python, Scala, or SQL, but it does not natively support ad-hoc T-SQL queries (it uses Spark SQL, not T-SQL) and requires more manual setup for a traditional data warehouse experience. Option D is wrong because Azure Analysis Services is an OLAP engine for semantic modeling and fast interactive queries on pre-aggregated data, but it cannot directly ingest real-time streaming data from Event Hubs or run ad-hoc T-SQL queries against raw CSV files in Data Lake Storage.


71

MCQeasy

A company needs to analyze streaming data from IoT devices in real time. They want to identify anomalies and trigger alerts. Which Azure service should they use as the core processing engine?

A.Azure Stream Analytics

B.Azure Synapse Analytics

C.Azure Databricks

D.Azure Data Lake Storage

AnswerA

Stream Analytics is built for real-time data processing and alerting.

Why this answer

Azure Stream Analytics is purpose-built for real-time stream processing, allowing you to define SQL-like queries that run continuously against streaming data from sources like IoT Hub. It can detect anomalies and trigger alerts on the fly, making it the correct core processing engine for this IoT scenario.

Exam trap

The trap here is that candidates often confuse batch processing tools like Synapse Analytics or Databricks with real-time stream processing, overlooking that only Azure Stream Analytics is designed as a dedicated, low-latency stream processing engine for this exact pattern.

How to eliminate wrong answers

Option B is wrong because Azure Synapse Analytics is a unified analytics platform designed for batch and interactive analytics on large-scale data, not for real-time stream processing with sub-second latency. Option C is wrong because Azure Databricks is a big data and AI platform that can process streaming data via Structured Streaming, but it is not the simplest or most direct service for real-time anomaly detection and alerting; it requires more complex setup and is not the core processing engine for this specific use case. Option D is wrong because Azure Data Lake Storage is a scalable data lake for storing raw data, not a processing engine; it cannot analyze streaming data or trigger alerts in real time.
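
The idea of a query that runs continuously over a stream and raises alerts can be sketched as a Python generator. The rolling-mean rule, the `detect_anomalies` name, and the thresholds are assumptions chosen for illustration; this is not how Stream Analytics implements anomaly detection.

```python
from collections import deque
from typing import Iterable, Iterator


def detect_anomalies(
    readings: Iterable[float], window: int = 3, threshold: float = 10.0
) -> Iterator[tuple[int, float]]:
    """Yield (index, value) for readings far from the mean of the previous values."""
    recent: deque[float] = deque(maxlen=window)
    for index, value in enumerate(readings):
        # Only compare once a full window of earlier readings is available.
        if len(recent) == window:
            mean = sum(recent) / window
            if abs(value - mean) > threshold:
                yield index, value
        recent.append(value)


stream = [20.0, 21.0, 19.0, 20.0, 45.0, 21.0, 20.0]
alerts = list(detect_anomalies(stream))
print(alerts)
```

Each reading is evaluated as it arrives, using only a small amount of recent state, which is what distinguishes this style of processing from a batch job over stored data.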


72

Multi-Selecthard

Which THREE components are required to implement a real-time analytics solution using Azure Stream Analytics? (Choose three.)

Select 3 answers

A.Stream input (e.g., Azure Event Hubs)

B.Batch input (e.g., Azure Blob Storage)

C.Machine learning model

D.Stream Analytics query

E.Output sink (e.g., Azure Synapse Analytics)

AnswersA, D, E

A stream input supplies the real-time data, the query defines how that data is processed, and an output sink receives the results.

Why this answer

A stream input like Azure Event Hubs is required because Azure Stream Analytics processes data in real time from a streaming source. Event Hubs ingests millions of events per second, providing the low-latency, ordered event stream that Stream Analytics consumes via its input binding. Without a streaming source, the service cannot perform continuous, real-time analytics.

Exam trap

The trap here is that candidates often confuse optional components (like batch input for reference data or machine learning for advanced analytics) with mandatory ones, leading them to select B or C instead of recognizing that only stream input, the query, and an output sink form the core required triad.


73

MCQmedium

A data engineer is designing a data lake architecture in Azure. They plan to first ingest raw data from various sources into a landing zone in Azure Data Lake Storage Gen2. Then they will clean, validate, and deduplicate that data in a second zone. Finally, they will create aggregated, business-ready datasets in a third zone for analysts. This layered approach is known as which architecture?

A.Star schema

B.Snowflake schema

C.Medallion architecture

D.Lambda architecture

AnswerC

Correct. The Medallion architecture defines bronze, silver, and gold layers to progressively improve data quality and make data ready for analytics. It is a key concept in modern lakehouse architectures.

Why this answer

The medallion architecture (bronze, silver, gold) is a layered data lake design pattern where raw data lands in the bronze zone, is cleaned and deduplicated in the silver zone, and aggregated into business-ready datasets in the gold zone. This directly matches the described three-zone ingestion, transformation, and aggregation pipeline in Azure Data Lake Storage Gen2.

Exam trap

The trap here is that candidates confuse the medallion architecture's sequential data lake zones with Lambda architecture's parallel batch/stream processing layers, or incorrectly associate star/snowflake schemas with data lake layering instead of data warehouse modeling.

How to eliminate wrong answers

Option A is wrong because a star schema is a dimensional modeling technique for data warehouses (fact and dimension tables), not a data lake layering pattern. Option B is wrong because a snowflake schema is a normalized variant of star schema, also specific to data warehouse design, not a data lake architecture. Option D is wrong because Lambda architecture separates batch and streaming processing paths (speed layer, batch layer, serving layer), not a sequential three-zone data lake ingestion and transformation pipeline.
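
The bronze, silver, and gold zones described above can be sketched in a few lines of Python. This is a toy illustration only: the records, field names, and validation rule are hypothetical, and in practice each zone would be a folder or table in Azure Data Lake Storage Gen2 processed by an engine such as Spark.

```python
# Bronze: raw records exactly as ingested (hypothetical sample data).
bronze = [
    {"order_id": "1", "region": "east", "amount": "10.5"},
    {"order_id": "1", "region": "east", "amount": "10.5"},  # duplicate row
    {"order_id": "2", "region": "west", "amount": "oops"},  # invalid amount
    {"order_id": "3", "region": "east", "amount": "4.5"},
    {"order_id": "4", "region": "west", "amount": "7.0"},
]


def to_silver(rows):
    """Clean, validate, and deduplicate bronze rows."""
    seen = set()
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])  # validation: amount must be numeric
        except ValueError:
            continue  # drop rows that fail validation
        if row["order_id"] in seen:
            continue  # drop duplicates by order_id
        seen.add(row["order_id"])
        clean.append(
            {"order_id": int(row["order_id"]), "region": row["region"], "amount": amount}
        )
    return clean


def to_gold(rows):
    """Aggregate silver rows into a business-ready summary: total amount per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

Each stage reads only from the stage before it, which is what distinguishes this sequential pattern from the parallel batch and speed layers of a Lambda architecture.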

74

Multi-Selecthard

Which TWO options are valid ways to ingest data into Azure Data Lake Storage Gen2?

Select 2 answers

A.Power BI

B.Azure SQL Database

C.AzCopy

D.Azure DevOps

E.Azure Data Factory

AnswersC, E

A utility to copy data to/from ADLS Gen2.

Why this answer

AzCopy is a command-line utility for copying data to and from Azure Blob Storage and Azure Data Lake Storage Gen2 using REST APIs. It supports high-performance, parallel transfers and handles large datasets efficiently, making it a valid ingestion tool. Azure Data Factory is a data integration service whose pipelines can copy data from many sources into ADLS Gen2, which makes it the second valid option.

Exam trap

The trap here is that candidates may confuse tools that can read from or query Azure Data Lake Storage Gen2 (like Power BI or Azure SQL Database) with tools that can directly write or ingest data into it, overlooking the specific ingestion mechanisms required.
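
As a rough illustration of AzCopy-based ingestion, the Python sketch below only builds the command line for an `azcopy copy` upload; it does not run it. The storage account, container, and paths are hypothetical placeholders, and the sketch assumes authentication (for example, `azcopy login` or a SAS token) is handled separately.

```python
import shlex


def build_azcopy_command(local_dir: str, account: str, container: str, path: str) -> list:
    """Build an `azcopy copy` argument list for uploading a local folder to ADLS Gen2.

    The account, container, and path values are placeholders supplied by the caller.
    """
    destination = f"https://{account}.dfs.core.windows.net/{container}/{path}"
    # --recursive copies the folder and everything beneath it.
    return ["azcopy", "copy", local_dir, destination, "--recursive"]


# Hypothetical names used purely for illustration.
cmd = build_azcopy_command("./raw", "examplelake", "landing", "iot/2024")
command_line = shlex.join(cmd)
print(command_line)
```

The printed line could be pasted into a shell or passed to `subprocess.run(cmd, check=True)` on a machine where AzCopy is installed.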

75

MCQmedium

Your organization is migrating on-premises SQL Server databases to Azure. The databases include a mission-critical OLTP system that requires high availability with automatic failover and a reporting database that is used for read-only queries. You need to choose the appropriate Azure SQL deployment options for each workload. The OLTP system must have a recovery point objective (RPO) of less than 5 seconds and a recovery time objective (RTO) of less than 30 seconds. The reporting database should be cost-effective and can tolerate up to 5 minutes of data loss. What should you recommend?

A.Use SQL Server on Azure Virtual Machines with Always On Availability Groups for both workloads.

B.Use Azure SQL Database Hyperscale for OLTP and Azure SQL Database serverless for reporting.

C.Use Azure SQL Managed Instance with a failover group for OLTP, and use a read-only replica of the Managed Instance for reporting.

D.Use Azure SQL Database single database with active geo-replication for both workloads.

AnswerC

Failover group provides low RPO/RTO; read-only replica serves reporting.

Why this answer

Option C is correct because Azure SQL Managed Instance supports failover groups, which provide automatic failover across regions and, in this scenario, meet the OLTP targets of an RPO under 5 seconds and an RTO under 30 seconds. A read-only replica of the Managed Instance can serve reporting queries without affecting the primary OLTP workload, and it is cost-effective because it does not require a separate database instance.

Exam trap

The trap here is that candidates often confuse the high availability features of Azure SQL Database single database (active geo-replication) with the stricter RPO/RTO guarantees of Managed Instance failover groups, or they assume that SQL Server on Azure VMs with Always On Availability Groups is the only option for such requirements, overlooking the managed service benefits.

How to eliminate wrong answers

Option A is wrong because SQL Server on Azure Virtual Machines with Always On availability groups leaves you to configure and manage the VMs and availability groups yourself, so the failover behavior and the specified RPO/RTO are not delivered as a managed service; it also adds operational overhead and cost for both workloads. Option B is wrong because Azure SQL Database Hyperscale is designed for large databases with high scalability and fast backup/restore, not for guaranteeing an RPO under 5 seconds and an RTO under 30 seconds with automatic failover; the serverless tier is cost-effective for reporting, but it does not provide a read-only replica for reporting without additional cost. Option D is wrong because active geo-replication on a single database does not fail over automatically: the application or an administrator must initiate the failover, so on its own it cannot meet the automatic-failover and sub-30-second RTO requirements of the OLTP system; using it for both workloads would also be less cost-effective for the reporting database.
