Knowledge + Practice

CCNA Analytics Workload Azure Questions

75 of 262 questions · Page 1/4 · Analytics Workload Azure topic · Answers revealed

Practice these questions Exam hub All questions

1

MCQmedium

A company uses Azure Stream Analytics to process IoT data from thousands of devices. They need to store the results in a way that supports fast querying for historical analysis. Which output sink should they use?

A.Azure Table Storage

B.Azure Blob Storage

C.Azure Data Lake Storage Gen2

D.Azure Event Hubs

AnswerC

Designed for big data analytics and fast querying.

Why this answer

Azure Data Lake Storage Gen2 (ADLS Gen2) is the correct output sink because it combines a hierarchical namespace with Azure Blob Storage's scalable object storage, enabling fast querying for historical analysis via tools like Azure Synapse Analytics, PolyBase, or Apache Spark. ADLS Gen2 supports high-throughput writes from Stream Analytics and allows efficient directory-level operations and fine-grained access control, which are critical for large-scale IoT data analytics.

Exam trap

The trap here is that candidates often confuse Azure Blob Storage with ADLS Gen2, assuming both are equivalent for analytics, but the key differentiator is the hierarchical namespace and native integration with big data analytics engines that ADLS Gen2 provides.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a NoSQL key-value store optimized for fast point lookups and small data volumes, not for complex historical queries or large-scale analytical workloads. Option B is wrong because Azure Blob Storage lacks a hierarchical namespace, making directory-level operations and fast querying for historical analysis less efficient compared to ADLS Gen2, and it does not natively support the same level of integration with analytics engines. Option D is wrong because Azure Event Hubs is a real-time data ingestion service, not a storage sink for historical analysis; it is designed for streaming data capture and event processing, not for long-term storage and querying.

Practice this question →

2

Multi-Selectmedium

Which TWO Azure services can be used to build a real-time analytics solution that ingests streaming data and provides dashboards with low latency? (Choose two.)

Select 2 answers

A.Power BI

B.Azure Data Factory

C.Azure Data Lake Storage Gen2

D.Azure HDInsight with Spark

E.Azure Stream Analytics

AnswersA, E

Provides real-time dashboards via DirectQuery or streaming datasets.

Why this answer

Power BI is correct because it can connect to real-time data sources like Azure Stream Analytics or Event Hubs to create streaming datasets and dashboards that update automatically with low latency. It supports push datasets and real-time tiles, enabling live visualization of streaming data without batch processing delays.

Exam trap

The trap here is that candidates often confuse batch-oriented services like Azure Data Factory or storage-only services like Data Lake Storage Gen2 with real-time analytics capabilities, or they assume HDInsight with Spark alone provides built-in dashboards, when in fact it requires a separate visualization layer.

Practice this question →

3

MCQhard

A company uses Azure Data Lake Storage Gen2 as a data lake. They need to enforce row-level security for sensitive data so that sales representatives can only see rows for their assigned region. Which approach should they use?

A.Apply sensitivity labels in Microsoft Purview

B.Load data into Azure Synapse Analytics dedicated SQL pool and implement row-level security (RLS)

C.Use Azure RBAC roles on the storage account

D.Use Azure Data Lake Storage Gen2 access control lists (ACLs) on folders per region

AnswerB

Synapse SQL pool supports RLS to restrict row-level access based on user attributes.

Why this answer

Row-level security (RLS) in Azure Synapse Analytics dedicated SQL pool allows you to restrict data access at the row level based on a user's identity or group membership. By loading the data into a dedicated SQL pool and defining a security policy with a predicate function that filters rows by region, you can ensure sales representatives only see rows for their assigned region. This is the correct approach because RLS is designed specifically for this purpose and integrates with Azure Active Directory for user authentication.

Exam trap

The trap here is that candidates confuse storage-level access controls (ACLs, RBAC) with data-level security (RLS), assuming that folder-per-region ACLs can achieve row-level filtering, but ACLs cannot filter rows within a file.

How to eliminate wrong answers

Option A is wrong because sensitivity labels in Microsoft Purview classify and protect data at the file or column level, but they do not enforce row-level filtering based on user identity. Option C is wrong because Azure RBAC roles control access to the storage account itself (e.g., read/write permissions), not row-level visibility within a dataset. Option D is wrong because Azure Data Lake Storage Gen2 ACLs provide file- or folder-level permissions, not row-level filtering; they cannot restrict which rows a user sees within a file.

Practice this question →

4

MCQmedium

A company runs a SQL Server database on an Azure virtual machine. They need to offload reporting queries to a read-only copy without modifying the application. Which Azure service should they use?

A.Azure Analysis Services

B.Azure SQL Managed Instance

C.Azure SQL Database with read scale-out

D.Azure Synapse Analytics dedicated SQL pool

AnswerC

Azure SQL Database's read scale-out feature allows directing reporting queries to a read-only replica automatically.

Why this answer

Azure SQL Database with read scale-out (C) is correct because it creates a read-only replica of the database that can handle reporting queries without modifying the application. The application simply adds `ApplicationIntent=ReadOnly` to the connection string, and the Azure gateway automatically routes read-only queries to the secondary replica, offloading the primary from reporting workloads.

Exam trap

The trap here is that candidates confuse read scale-out with Azure SQL Managed Instance or Azure Analysis Services, assuming any read-only copy or analytics service can serve as a transparent offload, but only Azure SQL Database with read scale-out provides automatic, connection-string-based routing without application changes.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is a semantic modeling and analytics engine that requires data to be imported or queried via DAX/MDX, not a read-only copy of a SQL Server database, and it cannot be used as a transparent read-only replica for existing SQL queries. Option B is wrong because Azure SQL Managed Instance is a fully managed SQL Server instance that does not support read scale-out; it offers read-only replicas only via failover groups, which require manual redirection and are not transparent to the application. Option D is wrong because Azure Synapse Analytics dedicated SQL pool is a massively parallel processing (MPP) data warehouse designed for large-scale analytics, not a read-only copy of a SQL Server database, and it cannot be used to offload reporting queries without modifying the application or data pipeline.

Practice this question →

5

Drag & Dropmedium

A data engineering team wants to build a batch analytics pipeline. The raw data is stored in Azure Data Lake Storage Gen2 (ADLS Gen2). The final output will be a set of tables in Azure Synapse Analytics (dedicated SQL pool) that will be used to create reports in Power BI. Arrange the following steps in the correct order for a typical ETL process.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

1Step 1

2Step 2

3Step 3

4Step 4

Why this order

ETL follows the order: extract raw data to ADLS, transform it, load into Synapse staging, then transform within Synapse to final tables, and finally use Power BI for reporting.

Practice this question →

6

MCQmedium

A data analyst needs to create a Power BI report that combines sales data from Azure SQL Database and marketing data from a CSV file stored in Azure Blob Storage. The report should refresh automatically. What is the recommended approach?

A.Use Power Query in Power BI Desktop to combine the data and publish with scheduled refresh

B.Export the SQL data to Excel and combine with the CSV in Power BI

C.Use Azure Data Factory to merge the data into a single SQL table

D.Use DirectQuery mode from Power BI for both sources

AnswerA

Power Query can combine both sources and scheduled refresh updates the report.

Why this answer

Using Power Query in Power BI Desktop to combine both sources and then publishing to the Power BI service with scheduled refresh is the standard approach. Option A (Azure Data Factory) is overkill for this simple combination. Option B (DirectQuery) cannot combine a CSV file easily.

Option D (Excel) is not recommended for automation.

Practice this question →

7

MCQhard

An organization has a large dataset stored in Azure Blob Storage. They need to run complex analytics using SQL queries and also want to use the same data for machine learning models. Which Azure service provides both SQL-based analytics and native integration with ML frameworks?

A.Azure Data Factory

B.Azure Synapse Analytics

C.Azure Analysis Services

D.Azure SQL Database

AnswerB

Unified analytics platform with SQL and ML.

Why this answer

Azure Synapse Analytics is correct because it provides a unified analytics platform that combines enterprise data warehousing with big data analytics. It offers built-in SQL-based querying via dedicated SQL pools or serverless SQL pools, and it natively integrates with machine learning frameworks like Apache Spark and Azure Machine Learning for building and training models on the same data stored in Azure Blob Storage.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics with Azure SQL Database or Azure Data Factory, mistakenly thinking a traditional database or an ETL tool can handle both complex SQL analytics and native ML integration on large-scale Blob Storage data.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a data integration and orchestration service, not a SQL analytics engine; it cannot run SQL queries directly on data. Option C is wrong because Azure Analysis Services is an OLAP engine for semantic models and business intelligence, not designed for complex SQL analytics or native ML framework integration. Option D is wrong because Azure SQL Database is a relational database for transactional workloads, not optimized for large-scale analytics on Blob Storage data and lacks native integration with ML frameworks like Spark.

Practice this question →

8

MCQmedium

A company uses Azure Synapse Analytics dedicated SQL pool to store sales data. They frequently run queries that aggregate sales by product and region over the past month. The queries are slow because they scan the entire table. Which index type should they implement on the fact table to improve query performance for these aggregations?

A.Clustered columnstore index

B.Clustered index on the primary key

C.Hash-distributed table on SalesID

D.Non-clustered index on (ProductID, Region)

AnswerA

Columnstore indexes are optimized for large scans and aggregations, typical in data warehouse workloads.

Why this answer

A clustered columnstore index is ideal for large fact tables in Azure Synapse Analytics dedicated SQL pool because it stores data column-wise, enabling high compression and eliminating the need to scan irrelevant columns. For aggregation queries that sum sales by product and region over the past month, the columnstore index significantly reduces I/O by reading only the necessary columns and applying batch-mode processing, which accelerates scan and aggregation operations.

Exam trap

The trap here is that candidates confuse indexing strategies for transactional OLTP workloads (where rowstore indexes like clustered or non-clustered are optimal) with analytical OLAP workloads, failing to recognize that columnstore indexes are specifically designed for large-scale aggregations and scans in dedicated SQL pools.

How to eliminate wrong answers

Option B is wrong because a clustered index on the primary key organizes data row-wise, which forces full table scans for aggregation queries and does not benefit from columnar compression or batch-mode processing. Option C is wrong because hash-distributing the table on SalesID improves data distribution and parallel processing but does not change the storage format; without a columnstore index, the table still scans all rows and columns for aggregations. Option D is wrong because a non-clustered index on (ProductID, Region) would require key lookups for additional columns and does not provide the columnar storage and compression benefits needed for efficient scan-heavy aggregation workloads.

Practice this question →

9

Matchingmedium

Match each Azure data tool to its purpose.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Data integration and orchestration

Apache Spark-based analytics platform

Real-time stream processing

Distributed analytics (legacy)

Managed open-source analytics service

Why these pairings

These are key Azure data processing and analytics tools.

Practice this question →

10

MCQeasy

Your company uses Azure Data Lake Storage Gen2 and wants to grant a data scientist read-only access to a specific container. Which built-in RBAC role should you assign?

A.Storage Account Contributor

B.Reader

C.Storage Blob Data Contributor

D.Storage Blob Data Reader

AnswerD

This role grants read-only access to blob containers and data.

Why this answer

Storage Blob Data Reader (D) is the correct built-in RBAC role because it grants read-only access to Azure Storage blob containers and data, including Data Lake Storage Gen2. This role provides the necessary permissions for a data scientist to read data from a specific container without allowing write or delete operations.

Exam trap

The trap here is that candidates often confuse the ARM-level Reader role (which only allows viewing storage account metadata) with data plane roles like Storage Blob Data Reader, mistakenly thinking Reader grants data access.

How to eliminate wrong answers

Option A is wrong because Storage Account Contributor grants full management access to the storage account, including the ability to change account configuration and delete the account, which far exceeds read-only container access. Option B is wrong because Reader provides read-only access to Azure resource management (ARM) plane operations, such as viewing storage account properties, but does not grant any permissions to read data within containers or blobs. Option C is wrong because Storage Blob Data Contributor allows read, write, and delete operations on blob containers and data, which is not read-only and would grant the data scientist excessive permissions.

Practice this question →

11

MCQmedium

A company uses Azure Synapse Analytics for its data warehouse. They notice that queries against a large fact table are slow. The table is partitioned by month and uses clustered columnstore index. Which action would most likely improve query performance?

A.Rebuild the clustered columnstore index.

B.Change partitioning to daily partitions.

C.Convert the table to a clustered rowstore index.

D.Increase the data warehouse performance level (DWU).

AnswerA

Rebuilding reorganizes data and improves compression and query performance.

Why this answer

Option C is correct: Rebuilding the columnstore index reorganizes data and improves compression and query performance, especially if there have been many updates/deletes. Option A is wrong because increasing DWU may help but is not a targeted fix. Option B is wrong because rowstore is slower for analytical queries.

Option D is wrong because partitioning by day may increase partition count and overhead.

Practice this question →

12

MCQmedium

A manufacturing company installs temperature sensors in a factory. Sensor data is streamed to Azure Event Hubs. The company needs to detect when the average temperature of any sensor exceeds 100°F over a 5-minute sliding window and then send an alert. Which Azure service should be used for this real-time stream processing?

A.Azure Stream Analytics

B.Azure Functions

C.Azure SQL Database

D.Azure Logic Apps

AnswerA

Correct. Stream Analytics can query streaming data from Event Hubs using SQL-like syntax with support for time windows (e.g., Tumbling, Hopping, Sliding) to compute averages and trigger alerts.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time stream processing, including the ability to define a 5-minute sliding window over sensor data from Event Hubs, compute the average temperature per sensor, and trigger an alert when the threshold of 100°F is exceeded. It natively integrates with Event Hubs as an input and supports temporal window functions like TumblingWindow, HoppingWindow, and SlidingWindow for exactly this kind of time-based aggregation.

Exam trap

The trap here is that candidates often confuse Azure Functions with Stream Analytics because both can process Event Hubs data, but Functions lacks native windowing and stateful aggregation capabilities, making it unsuitable for sliding window calculations without significant custom code.

How to eliminate wrong answers

Option B (Azure Functions) is wrong because while it can process events from Event Hubs, it lacks native support for complex windowed aggregations like a 5-minute sliding window average; you would have to manually implement state management and windowing logic, which is inefficient and error-prone for real-time streaming. Option C (Azure SQL Database) is wrong because it is a relational database for storing and querying static data, not a real-time stream processing engine; it cannot natively consume Event Hubs streams or perform sliding window computations without additional ETL and custom code. Option D (Azure Logic Apps) is wrong because it is a workflow orchestration service for integrating applications and services, not a real-time analytics engine; it lacks the ability to perform continuous, low-latency stream processing with windowed aggregations over streaming data.

Practice this question →

13

MCQmedium

Refer to the exhibit. An Azure Data Factory pipeline JSON is shown. What does this pipeline do?

A.Copies sales data from Azure SQL Database to on-premises

B.Copies all sales data from on-premises to Azure SQL Database

C.Copies filtered sales data from on-premises to Azure SQL Database

D.Copies sales data from on-premises to Azure Data Lake Storage

AnswerC

Correctly describes the source, sink, and filter.

Why this answer

The pipeline uses a Copy activity with a source dataset pointing to an on-premises SQL Server (via a self-hosted integration runtime) and a sink dataset pointing to an Azure SQL Database. The source query includes a WHERE clause filtering sales data by a date range, so only filtered data is copied. This matches option C.

Exam trap

The trap here is that candidates often overlook the WHERE clause in the source query and assume the pipeline copies all data, or they confuse the direction of data movement between on-premises and Azure.

How to eliminate wrong answers

Option A is wrong because the source is on-premises SQL Server and the sink is Azure SQL Database, not the reverse. Option B is wrong because the source query includes a WHERE clause that filters the data, so it does not copy all sales data. Option D is wrong because the sink is Azure SQL Database, not Azure Data Lake Storage.

Practice this question →

14

Multi-Selectmedium

Which THREE components are part of an end-to-end analytics solution on Azure?

Select 3 answers

A.Data ingestion (Azure Data Factory)

B.Azure Virtual Machines

C.Data visualization (Power BI)

D.Data storage (Azure Data Lake Storage)

E.Microsoft Entra ID

AnswersA, C, D

Ingests data from various sources.

Why this answer

Data ingestion using Azure Data Factory (ADF) is correct because ADF is the primary cloud-based ETL/ELT service for orchestrating and automating data movement from various sources into Azure storage or compute. It supports over 90 built-in connectors, scheduled triggers, and data flows, making it essential for the 'ingest' phase of an analytics pipeline.

Exam trap

The trap here is that candidates often confuse supporting services (like identity management or generic compute) with the core analytics pipeline components, leading them to select Azure Virtual Machines or Microsoft Entra ID as part of the end-to-end solution when they are not directly involved in the data flow.

Practice this question →

15

MCQhard

A manufacturing company ingests real-time sensor data from factory equipment via Azure Event Hubs. The data is a continuous stream of measurements (sensorId, timestamp, value). Additionally, historical maintenance records are stored as CSV files in Azure Data Lake Storage Gen2. The operations team needs to join the streaming data with the historical records in near real-time to detect anomalies. They also need to run complex T-SQL queries on the combined dataset for ad-hoc analysis. Which Azure service should they use as the primary analytics platform?

A.Azure Stream Analytics

B.Azure Synapse Analytics

C.Azure Data Factory

D.Azure HDInsight with Spark

AnswerB

Synapse Analytics unifies big data and data warehousing, enabling T-SQL queries on both streaming and batch data from Event Hubs and Data Lake Storage.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified analytics platform that can ingest real-time streaming data from Azure Event Hubs via its built-in pipeline or Spark Structured Streaming, and simultaneously query historical CSV files in Azure Data Lake Storage Gen2 using serverless SQL or dedicated SQL pools. This allows the operations team to join streaming and batch data in near real-time for anomaly detection, and also run complex T-SQL queries for ad-hoc analysis, all within a single service.

Exam trap

The trap here is that candidates often choose Azure Stream Analytics because it handles streaming data, but they overlook the requirement for complex T-SQL ad-hoc queries, which Stream Analytics cannot support, while Azure Synapse Analytics provides both streaming ingestion and full T-SQL analytics in a single platform.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is designed for real-time stream processing and can join streaming data with reference data, but it cannot run complex T-SQL queries for ad-hoc analysis; it uses a SQL-like query language limited to streaming patterns and lacks full T-SQL support for interactive exploration. Option C is wrong because Azure Data Factory is an orchestration and data movement service, not an analytics platform; it cannot perform real-time stream joins or execute T-SQL queries directly. Option D is wrong because Azure HDInsight with Spark is a big data processing service that can handle streaming and batch workloads, but it does not natively support complex T-SQL queries; it requires additional tools like Spark SQL or Hive, and lacks the integrated SQL analytics capabilities of Synapse.

Practice this question →

16

MCQhard

An organization uses Azure Stream Analytics to process real-time IoT data from millions of devices. They need to ensure that the output is exactly once delivery semantics to a Power BI dataset. Which output configuration should they use?

A.Output to Azure Data Explorer and then use a Power BI dashboard.

B.Set the output to Power BI with 'At least once' delivery mode.

C.Output to Azure Event Hubs and then use Azure Functions to write to Power BI.

D.Configure the Power BI output with 'Exactly once' delivery mode.

AnswerD

Stream Analytics supports exactly once delivery to Power BI when configured.

Why this answer

Azure Stream Analytics supports exactly-once delivery semantics for Power BI output when the 'Exactly once' delivery mode is configured. This ensures that each event is processed and delivered precisely once, preventing duplicates in the Power BI dataset. The 'Exactly once' mode uses checkpointing and idempotent writes to guarantee consistency, which is critical for accurate real-time analytics from millions of IoT devices.

Exam trap

The trap here is that candidates may assume 'At least once' is the only option for Power BI output, or that an intermediate service like Event Hubs or Data Explorer inherently provides exactly-once guarantees, when in fact Stream Analytics directly supports 'Exactly once' delivery mode for Power BI.

How to eliminate wrong answers

Option A is wrong because routing through Azure Data Explorer introduces an intermediary that does not inherently provide exactly-once semantics to Power BI; it adds latency and complexity without solving the delivery guarantee. Option B is wrong because 'At least once' delivery mode allows duplicate events, which violates the requirement for exactly-once delivery. Option C is wrong because outputting to Event Hubs and then using Azure Functions does not guarantee exactly-once delivery to Power BI; Azure Functions may process events multiple times due to retries or lack of idempotency, and the architecture adds unnecessary overhead.

Practice this question →

17

Drag & Drophard

A retail company needs to build an analytics pipeline on Azure. They ingest sales data from multiple store systems and an online e-commerce platform. The data must be cleaned, transformed, and loaded into a data warehouse for reporting. The company wants to use a modern ELT (Extract, Load, Transform) approach where raw data is stored first and then transformed. Order the following steps in the correct sequence for this pipeline. (Drag the steps into the correct order.)

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

1Step 1

2Step 2

3Step 3

4Step 4

5Step 5

Why this order

ELT approach first ingests raw data to storage, loads it into the warehouse, then transforms it before creating reporting structures.

Practice this question →

18

MCQhard

You are a data architect for a logistics company. The company uses Azure Data Lake Storage Gen2 to store shipment tracking data. The data is ingested from IoT devices on trucks. Each record contains truck ID, timestamp, GPS coordinates, speed, and fuel level. The volume is 5 TB per day. The company wants to build a near-real-time dashboard to monitor truck locations and speeds. They also need to run daily batch analytics to compute fuel efficiency trends. You need to design a solution that minimizes latency for the dashboard and maximizes cost efficiency for batch processing. You plan to use Azure Event Hubs for ingestion. Which approach should you take?

A.Use Azure Stream Analytics to process the stream and output directly to Azure SQL Database. Use Power BI to query SQL Database for both real-time dashboard and historical analytics.

B.Use Azure Event Hubs Capture to store data in Azure Blob Storage, then use Azure Data Factory to transform and load into Azure Synapse Analytics for both dashboard and batch.

C.Use Azure Databricks with Structured Streaming to process the stream, write to Delta Lake, and use Delta Lake to serve both real-time and batch queries.

D.Use Azure Stream Analytics to process the stream, output to Power BI for real-time dashboard, and simultaneously output raw data to Azure Data Lake Storage. Use Azure Databricks to process the data lake for batch analytics.

AnswerD

Stream Analytics provides low latency for dashboard; Data Lake Storage is cost-effective for large volumes; Databricks handles batch efficiently.

Why this answer

Option D is correct because it separates the real-time and batch processing paths to minimize latency and maximize cost efficiency. Azure Stream Analytics outputs directly to Power BI for near-real-time dashboard updates, while simultaneously writing raw data to Azure Data Lake Storage for cost-effective storage. Azure Databricks then processes the data lake for daily batch analytics, avoiding expensive real-time compute for historical analysis.

Exam trap

The trap here is that candidates often assume a single technology (like Databricks or Synapse) can handle both real-time and batch workloads equally well, but the DP-900 exam tests the understanding that separating the streaming path (Stream Analytics to Power BI) from the batch path (Data Lake to Databricks) optimizes for both latency and cost.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is not optimized for high-velocity streaming ingestion and would introduce latency for the dashboard, plus it is costly for storing 5 TB/day of raw data. Option B is wrong because Event Hubs Capture to Blob Storage and then Azure Data Factory to Synapse Analytics introduces batch processing latency that cannot meet near-real-time dashboard requirements. Option C is wrong because while Delta Lake can serve both real-time and batch queries, using Databricks Structured Streaming for the dashboard adds unnecessary complexity and cost compared to a dedicated stream processing service like Stream Analytics.

Practice this question →

19

MCQeasy

A transportation company collects real-time GPS data from thousands of delivery vehicles. They need to process this streaming data to detect delays and generate alerts when a vehicle is behind schedule. Which Azure service should they use for the stream processing?

A.A. Azure Data Factory

B.B. Azure Stream Analytics

C.C. Azure Data Lake Analytics

D.D. Azure Analysis Services

AnswerB

Azure Stream Analytics is a fully managed service for real-time stream processing that can process millions of events per second and trigger alerts based on conditions.

Why this answer

Azure Stream Analytics is the correct choice because it is a fully managed, real-time stream processing engine designed to handle high-velocity data from sources like IoT devices and GPS sensors. It can ingest streaming data from Azure Event Hubs or IoT Hub, apply SQL-based queries to detect patterns such as delays, and output alerts to sinks like Azure Functions or Power BI in near real-time.

Exam trap

The trap here is that candidates often confuse Azure Data Factory's batch orchestration capabilities with real-time processing, or mistakenly think Azure Data Lake Analytics can handle streaming data because of its 'analytics' name, but neither supports continuous, low-latency stream processing.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a cloud-based ETL and data integration service for orchestrating batch data movement and transformation, not designed for real-time stream processing. Option C is wrong because Azure Data Lake Analytics is a batch analytics service that processes large volumes of data stored in Data Lake Storage using U-SQL, not suited for low-latency streaming scenarios. Option D is wrong because Azure Analysis Services is an analytical engine for semantic modeling and business intelligence on pre-processed data, not capable of ingesting or processing real-time streaming data.

Practice this question →

20

MCQeasy

A business analyst needs to create interactive visualizations and share dashboards with colleagues using data stored in an Azure Synapse Analytics dedicated SQL pool. Which tool should the analyst use?

A.Azure Data Studio

B.Power BI Desktop

C.SQL Server Management Studio (SSMS)

D.Azure Machine Learning Studio

AnswerB

Power BI Desktop enables connecting to Synapse, building interactive dashboards, and publishing to the Power BI service for sharing.

Why this answer

Power BI Desktop is the correct tool because it is designed for creating interactive visualizations and dashboards, and it can connect directly to Azure Synapse Analytics dedicated SQL pools via the built-in Azure Synapse Analytics connector. This allows the business analyst to build reports and share them with colleagues through the Power BI service, meeting the requirement for interactive dashboards and collaboration.

Exam trap

The trap here is that candidates often confuse Azure Data Studio or SSMS as tools for visualization because they can run queries and view results, but they lack the interactive dashboard and sharing features that Power BI provides.

How to eliminate wrong answers

Option A is wrong because Azure Data Studio is a database management and query tool focused on SQL development and administration, not on creating interactive visualizations or sharing dashboards. Option C is wrong because SQL Server Management Studio (SSMS) is primarily for managing SQL Server and Azure SQL databases, including querying and administration, but it lacks the visualization and dashboard-sharing capabilities needed for business analytics. Option D is wrong because Azure Machine Learning Studio is a platform for building, training, and deploying machine learning models, not for creating interactive visualizations or dashboards from data in a dedicated SQL pool.

Practice this question →

21

MCQmedium

A data engineer needs to query data stored in CSV files in Azure Data Lake Storage Gen2 using T-SQL in Azure Synapse Analytics, without loading the data into the database. Which feature should they use?

A.External tables

B.Materialized views

C.Stored procedures

D.Indexed views

AnswerA

Correct. External tables use PolyBase to query external data sources like Azure Data Lake Storage using T-SQL without loading data into Synapse.

Why this answer

External tables in Azure Synapse Analytics allow you to query data stored in files (such as CSV in Azure Data Lake Storage Gen2) using T-SQL without loading the data into the database. They use the PolyBase engine to read the files directly, enabling on-the-fly querying of external data sources.

Exam trap

The trap here is that candidates may confuse external tables with materialized views or indexed views, thinking any database object can query external files, but only external tables provide the PolyBase-based bridge to read data directly from storage without loading.

How to eliminate wrong answers

Option B is wrong because materialized views store pre-computed data physically in the database, requiring data to be loaded first, which contradicts the requirement to query without loading. Option C is wrong because stored procedures are a set of T-SQL statements executed on the database, but they do not provide a mechanism to directly query external files without loading data into tables. Option D is wrong because indexed views require data to be stored in the database and indexed, which again necessitates loading the data first.

Practice this question →

22

MCQhard

A marketing company stores years of historical campaign data in Azure Data Lake Storage Gen2 as Parquet files. Data analysts need to run complex SQL queries over this data to identify trends, and they want to visualize results in Power BI dashboards. The company wants to avoid moving data into a separate database to minimize duplication and latency. Which Azure service should they use to query the data directly in the data lake?

A.Azure Data Factory

B.Azure Synapse Serverless SQL pool

C.Azure Databricks

D.Azure HDInsight

AnswerB

Serverless SQL pool (part of Azure Synapse Analytics) enables querying data directly from Azure Data Lake Storage using standard T-SQL. It is serverless (no infrastructure to manage) and perfect for ad-hoc analytics integration with Power BI.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it allows you to run T-SQL queries directly over Parquet files in Azure Data Lake Storage Gen2 without moving or copying the data. It uses a pay-per-query model and supports standard SQL syntax, making it ideal for analysts who need to query historical campaign data and visualize results in Power BI with minimal latency.

Exam trap

The trap here is that candidates often confuse Azure Data Factory (an ETL tool) with a query service, or they assume Azure Databricks is the only option for big data SQL queries, overlooking the serverless SQL pool's ability to query data in place without cluster management.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is an ETL and data orchestration service, not a query engine; it cannot run complex SQL queries directly against data in the lake. Option C is wrong because Azure Databricks is a Spark-based analytics platform that requires cluster management and coding (Python, Scala, SQL) rather than providing a serverless SQL endpoint for ad-hoc queries. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark cluster service that requires provisioning and ongoing management, and it does not offer a serverless SQL interface for direct querying of Parquet files in the data lake.

Practice this question →

23

MCQhard

A large e-commerce company needs to build an analytics solution. They have streaming clickstream data from their website (JSON) and daily sales data from their transactional database (CSV). They need to perform real-time dashboards on clickstream for the current hour, and also run complex historical queries that join sales data with aggregated clickstream data over the past year. They want a single Azure service that can handle both stream processing and batch processing using a unified experience, without moving data between separate systems. Which Azure service should they use?

A.Azure Stream Analytics

B.Azure Data Factory

C.Azure Synapse Analytics

D.Azure HDInsight

AnswerC

Synapse Analytics provides a unified analytics experience with support for both real-time stream processing (via Synapse Pipelines and Spark structured streaming) and large-scale batch analytics using dedicated SQL pools or serverless SQL. It meets all requirements.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified experience for both stream processing (via Synapse Pipelines or Spark Structured Streaming) and batch processing (via dedicated SQL pools or serverless SQL), enabling real-time dashboards on clickstream data and complex historical queries joining sales data with aggregated clickstream data without moving data between separate systems.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics (a pure stream processor) with a unified analytics service, overlooking that Synapse Analytics can handle both real-time and batch workloads in a single platform without requiring separate data movement or additional services.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing service that cannot natively handle complex batch queries over historical data or join with CSV-based sales data without additional storage and orchestration. Option B is wrong because Azure Data Factory is an ETL and orchestration service that does not provide real-time stream processing or native query capabilities for dashboards. Option D is wrong because Azure HDInsight requires managing separate clusters for stream (e.g., Kafka/Spark Streaming) and batch (e.g., Hive/Spark) processing, and does not offer a unified single-service experience without moving data between systems.

Practice this question →

24

MCQmedium

A retail company collects streaming clickstream data from its website into Azure Event Hubs. They need to aggregate the data in real-time to count page views per product every minute and store the results in Azure SQL Database for a live dashboard. Which Azure service should they use to perform this real-time aggregation?

A.Azure Stream Analytics

B.Azure Data Factory

C.Azure Synapse Pipelines

D.Azure Databricks

AnswerA

Correct. Stream Analytics is the simplest and most cost-effective service for real-time stream processing with built-in windowing and output to SQL Database.

Why this answer

Azure Stream Analytics is purpose-built for real-time stream processing, allowing you to define a query that aggregates clickstream data from Event Hubs using a tumbling window of one minute to count page views per product. The result can be written directly to Azure SQL Database, enabling a live dashboard without additional orchestration.

Exam trap

The trap here is that candidates may confuse Azure Stream Analytics with Azure Data Factory or Synapse Pipelines because both can move data, but only Stream Analytics provides native, low-latency stream processing with temporal windowing for real-time aggregation.

How to eliminate wrong answers

Option B (Azure Data Factory) is wrong because it is a data integration and orchestration service for batch and scheduled data movement, not a real-time stream processing engine. Option C (Azure Synapse Pipelines) is wrong because it is essentially the same as Data Factory within Synapse, designed for batch ETL and orchestration, not for continuous, low-latency aggregation of streaming data. Option D (Azure Databricks) is wrong because while it can process streaming data via Structured Streaming, it requires a cluster to be running and is overkill for a simple per-minute aggregation; it is not the simplest or most cost-effective service for this specific real-time aggregation task.

Practice this question →

25

MCQmedium

A smart building monitoring company ingests real-time sensor data (temperature, humidity, occupancy) from thousands of IoT devices into Azure Event Hubs. The company also stores historical building blueprints and maintenance records as CSV files in Azure Data Lake Storage Gen2. The engineering team needs to build a dashboard that displays live sensor readings overlaid on building floor plans, and also allows facility managers to run ad-hoc T-SQL queries that combine live sensor data with historical maintenance records. Which Azure service should they use as the primary analytics platform to meet both requirements?

A.Azure Stream Analytics

B.Azure Synapse Analytics

C.Azure Databricks

D.Azure Analysis Services

AnswerB

Azure Synapse Analytics unifies big data and data warehousing. It can ingest streaming data via Event Hubs, query both streaming and batch data using T-SQL across the data lake, and support dashboarding, making it the best fit for both real-time and ad-hoc requirements.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified analytics platform that combines big data and data warehousing. It can ingest real-time streaming data from Azure Event Hubs via its built-in Spark pools or pipelines, and also run ad-hoc T-SQL queries against both the live sensor data (staged in tables) and historical CSV files stored in Azure Data Lake Storage Gen2 using serverless SQL pools. This meets both the real-time dashboard and ad-hoc T-SQL query requirements.

Exam trap

The trap here is that candidates often choose Azure Stream Analytics because they focus only on the real-time dashboard requirement, overlooking the need for ad-hoc T-SQL queries against historical data, which Stream Analytics cannot fulfill.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing engine that outputs to sinks like Power BI or storage, but it does not support ad-hoc T-SQL queries against historical data or provide a unified analytics platform for combining live and historical data. Option C is wrong because Azure Databricks is an Apache Spark-based analytics platform that excels at batch and stream processing with Python, Scala, or SQL, but it does not natively support ad-hoc T-SQL queries (it uses Spark SQL, not T-SQL) and requires more manual setup for a traditional data warehouse experience. Option D is wrong because Azure Analysis Services is an OLAP engine for semantic modeling and fast interactive queries on pre-aggregated data, but it cannot directly ingest real-time streaming data from Event Hubs or run ad-hoc T-SQL queries against raw CSV files in Data Lake Storage.

Practice this question →

26

MCQeasy

A company needs to analyze streaming data from IoT devices in real time. They want to identify anomalies and trigger alerts. Which Azure service should they use as the core processing engine?

A.Azure Stream Analytics

B.Azure Synapse Analytics

C.Azure Databricks

D.Azure Data Lake Storage

AnswerA

Stream Analytics is built for real-time data processing and alerting.

Why this answer

Azure Stream Analytics is purpose-built for real-time stream processing, allowing you to define SQL-like queries that run continuously against streaming data from sources like IoT Hub. It can detect anomalies and trigger alerts on the fly, making it the correct core processing engine for this IoT scenario.

Exam trap

The trap here is that candidates often confuse batch processing tools like Synapse Analytics or Databricks with real-time stream processing, overlooking that only Azure Stream Analytics is designed as a dedicated, low-latency stream processing engine for this exact pattern.

How to eliminate wrong answers

Option B is wrong because Azure Synapse Analytics is a unified analytics platform designed for batch and interactive analytics on large-scale data, not for real-time stream processing with sub-second latency. Option C is wrong because Azure Databricks is a big data and AI platform that can process streaming data via Structured Streaming, but it is not the simplest or most direct service for real-time anomaly detection and alerting; it requires more complex setup and is not the core processing engine for this specific use case. Option D is wrong because Azure Data Lake Storage is a scalable data lake for storing raw data, not a processing engine; it cannot analyze streaming data or trigger alerts in real time.

Practice this question →

27

Multi-Selecthard

Which THREE components are required to implement a real-time analytics solution using Azure Stream Analytics? (Choose three.)

Select 3 answers

A.Stream input (e.g., Azure Event Hubs)

B.Batch input (e.g., Azure Blob Storage)

C.Machine learning model

D.Stream Analytics query

E.Output sink (e.g., Azure Synapse Analytics)

AnswersA, D, E

A stream input is the source of real-time data for Stream Analytics.

Why this answer

A stream input like Azure Event Hubs is required because Azure Stream Analytics processes data in real time from a streaming source. Event Hubs ingests millions of events per second, providing the low-latency, ordered event stream that Stream Analytics consumes via its input binding. Without a streaming source, the service cannot perform continuous, real-time analytics.

Exam trap

The trap here is that candidates often confuse optional components (like batch input for reference data or machine learning for advanced analytics) with mandatory ones, leading them to select B or C instead of recognizing that only stream input, the query, and an output sink form the core required triad.

Practice this question →

28

MCQmedium

A data engineer is designing a data lake architecture in Azure. They plan to first ingest raw data from various sources into a landing zone in Azure Data Lake Storage Gen2. Then they will clean, validate, and deduplicate that data in a second zone. Finally, they will create aggregated, business-ready datasets in a third zone for analysts. This layered approach is known as which architecture?

A.Star schema

B.Snowflake schema

C.Medallion architecture

D.Lambda architecture

AnswerC

Correct. The Medallion architecture defines bronze, silver, and gold layers to progressively improve data quality and make data ready for analytics. It is a key concept in modern lakehouse architectures.

Why this answer

The medallion architecture (bronze, silver, gold) is a layered data lake design pattern where raw data lands in the bronze zone, is cleaned and deduplicated in the silver zone, and aggregated into business-ready datasets in the gold zone. This directly matches the described three-zone ingestion, transformation, and aggregation pipeline in Azure Data Lake Storage Gen2.

Exam trap

The trap here is that candidates confuse the medallion architecture's sequential data lake zones with Lambda architecture's parallel batch/stream processing layers, or incorrectly associate star/snowflake schemas with data lake layering instead of data warehouse modeling.

How to eliminate wrong answers

Option A is wrong because a star schema is a dimensional modeling technique for data warehouses (fact and dimension tables), not a data lake layering pattern. Option B is wrong because a snowflake schema is a normalized variant of star schema, also specific to data warehouse design, not a data lake architecture. Option D is wrong because Lambda architecture separates batch and streaming processing paths (speed layer, batch layer, serving layer), not a sequential three-zone data lake ingestion and transformation pipeline.

Practice this question →

29

Multi-Selecthard

Which TWO options are valid ways to ingest data into Azure Data Lake Storage Gen2?

Select 2 answers

A.Power BI

B.Azure SQL Database

C.AzCopy

D.Azure DevOps

E.Azure Data Factory

AnswersC, E

A utility to copy data to/from ADLS Gen2.

Why this answer

AzCopy is a command-line utility designed specifically for copying data to and from Azure Blob Storage and Azure Data Lake Storage Gen2 using REST APIs. It supports high-performance, parallel transfers and can handle large datasets efficiently, making it a valid ingestion tool.

Exam trap

The trap here is that candidates may confuse tools that can read from or query Azure Data Lake Storage Gen2 (like Power BI or Azure SQL Database) with tools that can directly write or ingest data into it, overlooking the specific ingestion mechanisms required.

Practice this question →

30

MCQeasy

A company wants to build a real-time dashboard that visualizes sales data as transactions occur. Which combination of Azure services should they use?

A.Azure Synapse Analytics and PolyBase

B.Azure Data Explorer and Azure Data Lake Storage

C.Azure Stream Analytics and Power BI

D.Azure Analysis Services and Excel

AnswerC

Stream Analytics processes streams; Power BI visualizes.

Why this answer

Azure Stream Analytics is a real-time event processing engine that can ingest streaming data (e.g., from Azure Event Hubs or IoT Hub) and output results directly to Power BI via the built-in Power BI output sink. This combination enables a live dashboard that updates automatically as sales transactions occur, without requiring batch processing or manual refresh.

Exam trap

The trap here is that candidates often confuse batch-oriented services (like Synapse or Analysis Services) with real-time streaming, or assume that any storage-plus-query combination (like Data Explorer + Data Lake) can achieve live dashboards, ignoring the need for a dedicated stream processing engine with a direct visualization output.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics is a distributed analytics service designed for large-scale data warehousing and batch/streaming integration, but PolyBase is used for querying external data sources (e.g., Azure Storage) via T-SQL, not for real-time dashboard visualization. Option B is wrong because Azure Data Explorer is optimized for interactive analytics on large volumes of time-series and log data, and Azure Data Lake Storage is a hierarchical file store; together they support ad-hoc queries but lack the native real-time streaming-to-visualization pipeline that Stream Analytics provides. Option D is wrong because Azure Analysis Services is an OLAP engine for semantic modeling and tabular data, and Excel is a client tool; this combination requires manual data refresh and cannot deliver real-time streaming updates to a live dashboard.

Practice this question →

31

MCQhard

A financial institution needs to run complex queries against petabytes of historical trading data stored in Azure Data Lake Storage. The queries must be efficient and use columnar storage format. Which technology should they use to process this data?

A.Azure SQL Database

B.Azure Synapse Analytics

C.Azure Cosmos DB

D.Azure Table Storage

AnswerB

Correct. Synapse Analytics combines SQL query capabilities with MPP architecture and supports columnar storage formats like Parquet, making it ideal for large-scale analytics.

Why this answer

Azure Synapse Analytics (formerly SQL Data Warehouse) is the correct choice because it is a cloud-based analytics service designed for petabyte-scale data warehousing. It supports PolyBase to query data directly from Azure Data Lake Storage and uses a columnar storage format (via clustered columnstore indexes) to enable efficient, high-performance analytical queries on massive historical datasets.

Exam trap

The trap here is that candidates often confuse Azure SQL Database (a transactional system) with Azure Synapse Analytics (an analytical system), assuming both can handle petabyte-scale analytics, but only Synapse provides the columnar storage and MPP engine required for efficient historical data queries.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a transactional OLTP database optimized for row-based storage and small, frequent read/write operations, not for petabyte-scale analytical queries requiring columnar storage. Option C is wrong because Azure Cosmos DB is a NoSQL database designed for globally distributed, low-latency transactional workloads with flexible schemas, not for running complex analytical queries on petabytes of historical data in columnar format. Option D is wrong because Azure Table Storage is a key-value store for semi-structured NoSQL data, lacking columnar storage and the distributed query engine needed for efficient petabyte-scale analytics.

Practice this question →

32

MCQmedium

A data analyst needs to run interactive SQL queries on a large dataset stored as CSV files in Azure Blob Storage. The analyst wants to explore the data using T-SQL without loading the data into a database. Which Azure service should they use?

A.Azure SQL Database

B.Azure Synapse Serverless SQL pool

C.Azure Data Factory

D.Azure Stream Analytics

AnswerB

This service can query CSV files directly in Azure Blob Storage using T-SQL with serverless, pay-per-query compute.

Why this answer

Azure Synapse Serverless SQL pool is correct because it allows you to run interactive T-SQL queries directly against CSV files in Azure Blob Storage without loading the data into a database. It uses a pay-per-query model and leverages the OPENROWSET function to query external data in place, making it ideal for ad-hoc exploration of large datasets.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL pool with Azure SQL Database, assuming both require data loading, but the serverless pool is specifically designed for external data querying without ingestion.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database requires data to be loaded into a relational database before querying, which contradicts the requirement to avoid loading data. Option C is wrong because Azure Data Factory is an ETL and orchestration service, not an interactive SQL query engine; it cannot run T-SQL queries directly on CSV files. Option D is wrong because Azure Stream Analytics is designed for real-time stream processing, not for interactive ad-hoc queries on static CSV files in Blob Storage.

Practice this question →

33

MCQhard

Your company operates a retail analytics platform. Data from point-of-sale systems is ingested in real time into Azure Event Hubs. The data is then consumed by an Azure Stream Analytics job that aggregates sales by store and product every minute, writing results to Azure SQL Database. The business now requires a historical trend analysis capability that can query the last three years of sales data with sub-second response times, but the SQL Database is already experiencing performance issues due to high write volume. You need to redesign the serving layer to support both real-time dashboards (seconds latency) and historical analytics (sub-second queries on years of data) without impacting write performance. What should you do?

A.Modify the Azure Stream Analytics job to write to Azure Data Lake Storage Gen2, then use Azure Databricks to run batch transformations and load into a separate Azure Synapse Analytics dedicated SQL pool for historical queries. Keep the real-time dashboard connected to Azure SQL Database.

B.Add Azure Cosmos DB as an additional output from Azure Stream Analytics for real-time dashboards. Enable Azure Synapse Link on the Cosmos DB container to continuously replicate data to Azure Synapse Analytics for historical queries. Remove the real-time dashboard load from Azure SQL Database.

C.Add Azure Analysis Services on top of Azure SQL Database and create a cube for historical queries. Use Azure Stream Analytics to write to Analysis Services directly.

D.Partition the existing Azure SQL Database and add columnstore indexes to improve query performance. Keep the current architecture and increase the DTU/vCore limit to handle both workloads.

AnswerB

Cosmos DB provides low latency for real-time dashboards, and Synapse Link enables near real-time analytics on historical data without impacting writes.

Why this answer

Option C is correct. Azure Stream Analytics can output to multiple sinks simultaneously. By adding Azure Cosmos DB as an additional output for real-time dashboards (low latency, high throughput), and using Azure Synapse Link to continuously replicate Cosmos DB data to Azure Synapse Analytics for historical queries, you separate workloads.

Option A uses Azure Analysis Services but it is not designed for sub-second queries on raw historical data. Option B uses Azure Data Lake Storage with Databricks, but batch processing would not meet the real-time requirement and adds latency. Option D uses Azure SQL Database with indexing improvements, but the write volume already causes issues, and SQL DB is not ideal for sub-second queries on years of data without partitioning and scaling.

Practice this question →

34

Multi-Selecthard

Which THREE components are essential for building a real-time analytics solution on Azure?

Select 3 answers

A.Azure Data Lake Storage

B.Azure Analysis Services

C.Power BI

D.Azure Event Hubs

E.Azure Stream Analytics

AnswersC, D, E

Power BI visualizes real-time dashboards.

Why this answer

Power BI is correct because it serves as the visualization and reporting layer in a real-time analytics solution on Azure. It connects directly to Azure Stream Analytics or Event Hubs to render live dashboards and alerts, enabling users to monitor streaming data with sub-second latency.

Exam trap

The trap here is that candidates often confuse batch storage (Data Lake Storage) or offline analytical tools (Analysis Services) with real-time streaming components, overlooking that real-time analytics requires ingestion (Event Hubs), processing (Stream Analytics), and visualization (Power BI) working in concert.

Practice this question →

35

Multi-Selectmedium

Which TWO Azure services can be used to build a data pipeline that moves data from on-premises SQL Server to Azure Synapse Analytics?

Select 2 answers

A.Azure Data Factory

B.Azure Databricks

C.Azure Machine Learning

D.Azure Stream Analytics

E.Azure Analysis Services

AnswersA, B

Data Factory can copy data from on-premises SQL Server to Synapse via self-hosted integration runtime.

Why this answer

Azure Data Factory (A) is correct because it is a cloud-based ETL and data integration service that provides built-in connectors for both on-premises SQL Server (via self-hosted integration runtime) and Azure Synapse Analytics, enabling you to create, schedule, and orchestrate data pipelines that move and transform data between these sources.

Exam trap

The trap here is that candidates often confuse Azure Databricks (a data engineering and analytics platform) with a pure pipeline orchestration service, but it is correct in this context because it can read from on-prem SQL Server via JDBC and write to Synapse using the Spark Synapse connector, making it a valid alternative for building the data pipeline.

Practice this question →

36

MCQhard

Refer to the exhibit. A data engineer wants to ensure that all Azure Storage accounts used for analytics use customer-managed keys. They apply this Azure Policy. What is the outcome?

A.The policy audits existing storage accounts for compliance

B.The policy allows all storage accounts whether they use customer-managed keys or not

C.The policy only applies to storage accounts with hierarchical namespace enabled

D.The policy denies creation of StorageV2 accounts that use Microsoft-managed keys

AnswerD

It denies when keySource is not Microsoft.Keyvault, meaning only accounts with customer-managed keys are allowed.

Why this answer

The Azure Policy in the exhibit uses the 'Deny' effect on storage accounts that do not use customer-managed keys (i.e., that use Microsoft-managed keys). This means any attempt to create or update a storage account without customer-managed encryption will be blocked. Option D correctly identifies that the policy denies creation of StorageV2 accounts (or any storage account type covered by the policy scope) that use Microsoft-managed keys.

Exam trap

The trap here is that candidates confuse the 'Deny' effect with 'Audit' or 'Append', and mistakenly think the policy only monitors or tags accounts instead of actively blocking non-compliant ones.

How to eliminate wrong answers

Option A is wrong because the policy uses the 'Deny' effect, not 'Audit'; an audit effect would only log compliance without blocking creation. Option B is wrong because the policy explicitly denies storage accounts that do not use customer-managed keys, so it does not allow all accounts regardless of key type. Option C is wrong because the policy does not filter by hierarchical namespace; it applies to all storage accounts in the scope, regardless of whether hierarchical namespace is enabled.

Practice this question →

37

MCQmedium

A retail chain collects daily sales data from hundreds of stores. The data is stored as CSV files in Azure Data Lake Storage Gen2. The analytics team needs to run complex SQL queries that join sales data with product dimensions and aggregate results across petabytes of data. Queries must return results within seconds. Which Azure service is best suited for this analytical workload?

A.Azure Synapse Analytics

B.Azure SQL Database

C.Azure Analysis Services

D.Azure HDInsight

AnswerA

Correct. Synapse Analytics provides a SQL-based engine optimized for large-scale analytical queries and can directly query data in Data Lake Storage with PolyBase or CETAS.

Why this answer

Azure Synapse Analytics (formerly SQL Data Warehouse) is the correct choice because it is a distributed query engine designed for petabyte-scale data warehousing. It uses Massively Parallel Processing (MPP) to distribute data across compute nodes, enabling complex SQL joins and aggregations on data stored in Azure Data Lake Storage Gen2 to return results in seconds via its SQL pool or serverless SQL endpoint.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics with Azure SQL Database, assuming both can handle large analytical queries, but Azure SQL Database lacks the MPP architecture and external table support needed for petabyte-scale data lake queries.

How to eliminate wrong answers

Option B is wrong because Azure SQL Database is a relational OLTP database optimized for transactional workloads with limited scale-out capabilities; it cannot efficiently handle petabyte-scale analytical queries across CSV files in Data Lake Storage. Option C is wrong because Azure Analysis Services is a semantic modeling and in-memory OLAP engine that requires data to be pre-loaded into memory and is not designed for direct SQL queries on raw CSV files at petabyte scale. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark cluster that can process big data but requires custom coding (e.g., HiveQL, Spark SQL) and does not provide the instant, serverless SQL query experience with sub-second response times for complex joins across petabytes that Synapse offers.

Practice this question →

38

MCQeasy

A company wants to build a real-time analytics dashboard for IoT sensor data. Which combination of Azure services should they use?

A.Azure HDInsight and Azure Databricks

B.Azure Logic Apps and Azure SQL Database

C.Azure Data Factory and Azure Analysis Services

D.Azure Stream Analytics and Power BI

AnswerD

Stream Analytics provides real-time processing, Power BI displays live dashboards.

Why this answer

Azure Stream Analytics is a real-time event processing engine that can ingest IoT sensor data from sources like Azure Event Hubs, apply SQL-based queries to detect patterns or anomalies, and output results directly to Power BI for live dashboard visualization. This combination provides end-to-end streaming analytics with sub-second latency, which is essential for real-time dashboards.

Exam trap

The trap here is that candidates often confuse batch processing services (like Azure Data Factory or HDInsight) with real-time streaming services, or assume that any database (like Azure SQL) can handle high-velocity streaming data, but only Stream Analytics provides the necessary event-time processing and low-latency output for live dashboards.

How to eliminate wrong answers

Option A is wrong because Azure HDInsight and Azure Databricks are designed for batch and big data processing (e.g., Hadoop/Spark jobs), not for low-latency real-time streaming; they introduce significant overhead for continuous sensor data. Option B is wrong because Azure Logic Apps is an orchestration service for workflows and integrations, not a stream processing engine, and Azure SQL Database is optimized for transactional workloads, not real-time analytics on high-velocity streaming data. Option C is wrong because Azure Data Factory is a cloud ETL service for batch data movement and transformation, and Azure Analysis Services is an OLAP engine for historical analytics; neither supports real-time ingestion or live dashboard updates.

Practice this question →

39

MCQmedium

Refer to the exhibit. You deploy this Azure Stream Analytics job. The job runs but no data is written to the Azure SQL Database table. What is the most likely cause?

A.The query syntax is invalid

B.The TumblingWindow has not yet elapsed

C.The Event Hub name is incorrect

D.The SKU does not support SQL output

AnswerB

Aggregate results are emitted only after each window ends; if runtime < 5 minutes, no output.

Why this answer

The TumblingWindow function in Azure Stream Analytics processes data in fixed, non-overlapping time intervals. Since the job runs but no data is written, the most likely cause is that the first window has not yet elapsed, meaning no aggregation has been triggered to output results. Stream Analytics only emits results when a tumbling window completes, so data will not appear in the SQL Database until the window duration passes.

Exam trap

The trap here is that candidates assume a running job with no errors means data should flow immediately, but they overlook that tumbling windows require the full window duration to elapse before any output is produced.

How to eliminate wrong answers

Option A is wrong because if the query syntax were invalid, the job would fail to start or would generate a compilation error, not run without writing data. Option C is wrong because an incorrect Event Hub name would cause the job to fail at input connection time, not allow it to run silently with no output. Option D is wrong because all Stream Analytics SKUs (including Standard and higher) support Azure SQL Database as an output sink; there is no SKU restriction that would prevent writing to SQL.

Practice this question →

40

MCQhard

A multinational corporation uses Azure Data Factory to orchestrate data pipelines across multiple regions. The company notices that pipeline runs in the West Europe region consistently fail due to throttling errors from the source database. The source database is an Azure SQL Database in the same region. The company needs to reduce throttling while maintaining pipeline throughput. What should the company do?

A.Increase the maximum number of Data Factory activity retries.

B.Stage extracted data in Azure Blob Storage before loading into the destination.

C.Use a self-hosted integration runtime in the same region to execute pipelines.

D.Configure auto-scale on the Azure SQL Database to handle higher load.

AnswerD

Auto-scale adjusts database resources to meet demand, reducing throttling errors.

Why this answer

The throttling errors originate from the Azure SQL Database source, which has resource limits (DTU or vCore-based). Configuring auto-scale on the Azure SQL Database dynamically adjusts its performance tier to handle higher concurrency and throughput, directly addressing the root cause of throttling while maintaining pipeline throughput. This aligns with the requirement to reduce throttling without reducing workload.

Exam trap

The trap here is that candidates confuse throttling errors with network or connectivity issues, leading them to choose integration runtime or staging solutions, when the root cause is the source database's resource limits.

How to eliminate wrong answers

Option A is wrong because increasing activity retries only re-executes failed operations, which does not prevent throttling and may exacerbate load on the source database. Option B is wrong because staging data in Blob Storage addresses destination load or transformation patterns, not source-side throttling from Azure SQL Database. Option C is wrong because using a self-hosted integration runtime in the same region improves network latency and connectivity but does not reduce the source database's resource contention or throttling limits.

Practice this question →

41

MCQmedium

A data analyst uses Power BI to create a report that combines data from Azure Synapse Analytics and an on-premises SQL Server database. The on-premises data must be refreshed every hour. Which component is required to connect to the on-premises data source?

A.Azure VPN Gateway

B.On-premises data gateway

C.Azure ExpressRoute

D.Azure Data Factory

AnswerB

The on-premises data gateway acts as a bridge between Power BI and on-premises data sources, allowing scheduled refreshes.

Why this answer

The on-premises data gateway is required to securely connect Power BI to on-premises SQL Server databases for scheduled refreshes. It acts as a bridge, transmitting data from the on-premises source to the Power BI service without opening inbound firewall ports. This component is specifically designed for self-service analytics scenarios where cloud services need to access on-premises data sources.

Exam trap

The trap here is that candidates often confuse Azure Data Factory's self-hosted integration runtime with the Power BI on-premises data gateway, but Power BI requires its own dedicated gateway component for scheduled refreshes, not Data Factory.

How to eliminate wrong answers

Option A is wrong because Azure VPN Gateway establishes site-to-site or point-to-site encrypted tunnels between Azure and on-premises networks, but it is not the component used by Power BI for scheduled data refresh from on-premises SQL Server. Option C is wrong because Azure ExpressRoute provides a dedicated private network connection to Azure, but it is not required for Power BI's on-premises data gateway functionality and is typically used for high-bandwidth, low-latency enterprise scenarios. Option D is wrong because Azure Data Factory is a cloud-based ETL and data integration service that can copy data from on-premises sources using self-hosted integration runtimes, but it is not the component that Power BI directly uses for its scheduled refresh of on-premises data.

Practice this question →

42

MCQhard

A manufacturing company ingests a continuous stream of sensor data from factory equipment into Azure Event Hubs. Additionally, historical maintenance data in CSV format is stored in Azure Data Lake Storage Gen2. The analytics team needs to join the streaming sensor data with the historical data in near real-time and enable analysts to query the combined dataset using standard T-SQL without moving the data. Which Azure service should they use as the primary analytics platform?

A.A) Azure Stream Analytics

B.B) Azure Synapse Analytics with Synapse Pipelines and serverless SQL pool

C.C) Azure SQL Database

D.D) Azure Databricks

AnswerB

Correct. Synapse Pipelines can orchestrate the ingestion of both streaming (Event Hubs) and batch (Data Lake) data. The data can be stored in the lake, and the serverless SQL pool can query it using T-SQL, providing a unified analytics surface without moving data. This meets all requirements.

Why this answer

Azure Synapse Analytics with Synapse Pipelines and serverless SQL pool is the correct choice because it can ingest streaming data from Event Hubs via pipelines, query historical CSV data in Data Lake Storage Gen2 directly using T-SQL without moving it, and join both datasets in near real-time using the serverless SQL pool's ability to reference external data sources. This meets the requirement for standard T-SQL queries on combined streaming and historical data without data movement.

Exam trap

The trap here is that candidates often choose Azure Stream Analytics because it is the most obvious service for streaming data, but they overlook the requirement for standard T-SQL queries on combined datasets without data movement, which Stream Analytics cannot fulfill as it uses its own query language and cannot directly join with static data in Data Lake Storage Gen2 using T-SQL.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is designed for real-time stream processing but cannot directly query historical data in Data Lake Storage Gen2 using standard T-SQL without moving it; it requires defining input/output mappings and does not support ad-hoc T-SQL joins across streaming and static data. Option C is wrong because Azure SQL Database requires data to be loaded and stored within the database, which would involve moving the historical CSV data from Data Lake Storage Gen2, violating the 'without moving the data' requirement. Option D is wrong because Azure Databricks uses Spark SQL or Python, not standard T-SQL, and typically requires data to be loaded into DataFrames for processing, which involves data movement and does not natively support serverless T-SQL queries on external data.

Practice this question →

43

MCQhard

A financial analytics company has petabytes of transaction data stored as Parquet files in Azure Data Lake Storage Gen2. Data analysts need to run complex SQL queries that join multiple tables and return results within seconds. The company wants to query the data directly without moving it to another store. Which Azure service should they use?

A.Azure SQL Database

B.Azure Synapse Serverless SQL pool

C.Azure HDInsight

D.Azure Databricks

AnswerB

Serverless SQL pool can directly query Parquet files in the data lake using standard T-SQL and scales automatically for large datasets.

Why this answer

Azure Synapse Serverless SQL pool is correct because it allows querying data directly from Azure Data Lake Storage Gen2 using T-SQL without moving or loading the data. It uses a distributed query engine that can process petabytes of Parquet files and return results in seconds by leveraging pushdown computation and columnar storage formats.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL pool with Azure SQL Database, assuming both require data movement, or they overcomplicate the solution by choosing a cluster-based service like HDInsight or Databricks when a serverless query engine is sufficient.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a relational database that requires data to be imported and stored in its own managed storage, not querying data in place from Data Lake Storage Gen2. Option C is wrong because Azure HDInsight is a managed Hadoop/Spark cluster that requires provisioning and managing compute resources, and it is not optimized for instant serverless SQL queries on Parquet files. Option D is wrong because Azure Databricks is an Apache Spark-based analytics platform that requires a running cluster and is more suited for complex data engineering and machine learning workloads, not for simple serverless SQL queries on data at rest.

Practice this question →

44

MCQhard

A manufacturer collects sensor data from thousands of IoT devices every second. The data is ingested into Azure Event Hubs and then needs to be stored for historical analysis. The analytics team will run complex aggregations and time-series queries over petabytes of data, expecting fast results even with large scans. Which Azure service should be used as the analytical data store?

A.Azure Data Lake Storage Gen2

B.Azure SQL Database

C.Azure Synapse Analytics dedicated SQL pool

D.Azure Cosmos DB

AnswerC

Azure Synapse Analytics dedicated SQL pool uses MPP and columnar storage to execute complex queries over huge datasets efficiently. It is purpose-built for large-scale data warehousing and analytical workloads.

Why this answer

Azure Synapse Analytics dedicated SQL pool is the correct choice because it is a massively parallel processing (MPP) engine designed for petabyte-scale data warehousing. It can run complex aggregations and time-series queries with fast results by distributing data across 60 distributions and using columnstore indexes for high compression and scan efficiency.

Exam trap

The trap here is that candidates confuse Azure Data Lake Storage Gen2 (a storage layer) with a query engine, assuming it can directly perform fast analytical queries, when in fact it requires a compute service like Synapse or Spark on top.

How to eliminate wrong answers

Option A is wrong because Azure Data Lake Storage Gen2 is a hierarchical file system for storing raw data, not a query engine; it lacks the MPP architecture and indexing needed for fast complex aggregations over petabytes. Option B is wrong because Azure SQL Database is a single-node OLTP database that cannot scale to petabytes or handle the massive parallel scans required for time-series analytics. Option D is wrong because Azure Cosmos DB is a NoSQL database optimized for low-latency point reads and writes, not for large-scale analytical queries or complex aggregations over petabytes of data.

Practice this question →

45

MCQhard

A company runs a critical workload in Azure Synapse Analytics. They need to ensure that if a single node fails, the data in the control node and compute nodes is not lost. Which configuration should they use?

A.Deploy a SQL Server Always On availability group.

B.No additional configuration is needed; Synapse provides built-in fault tolerance.

C.Configure active geo-replication.

D.Use a Windows Server Failover Cluster.

AnswerB

Synapse automatically replicates data and control node.

Why this answer

Azure Synapse Analytics (formerly SQL DW) is a distributed MPP (Massively Parallel Processing) system that automatically replicates data across multiple internal copies within the control node and compute nodes. This built-in fault tolerance ensures that if a single node fails, data is not lost because Synapse maintains at least three synchronous replicas of all data and metadata. No additional configuration is required for node-level failure protection.

Exam trap

The trap here is that candidates confuse the need for high availability with disaster recovery, assuming that because Synapse is a distributed system, they must manually configure clustering or replication, when in fact Synapse provides built-in fault tolerance at the node level as a core feature of the service.

How to eliminate wrong answers

Option A is wrong because SQL Server Always On availability groups are designed for traditional SQL Server instances, not for Azure Synapse Analytics, which is a PaaS service with its own internal high-availability mechanisms. Option C is wrong because active geo-replication is a disaster recovery feature for Azure SQL Database that replicates data to a different Azure region, not a solution for single-node failure within the same Synapse workspace. Option D is wrong because Windows Server Failover Cluster is an on-premises clustering technology for SQL Server and other applications, and it cannot be applied to Azure Synapse Analytics, which is a fully managed cloud service.

Practice this question →

46

MCQmedium

A data engineering team needs to transform raw clickstream data stored as Parquet files in Azure Data Lake Storage Gen2. They want to use standard T-SQL queries to perform transformations and aggregations. The team prefers a serverless option to avoid provisioning and managing dedicated compute resources. Which Azure service should they use?

A.Azure Synapse Serverless SQL pool

B.Azure Synapse Dedicated SQL pool

C.Azure Databricks

D.Azure HDInsight

AnswerA

This serverless option enables querying data lake files with T-SQL on-demand, without provisioning compute resources, aligning with the team's requirements.

Why this answer

Azure Synapse Serverless SQL pool is correct because it allows querying Parquet files in Azure Data Lake Storage Gen2 using standard T-SQL without provisioning any dedicated compute resources. It automatically scales compute based on query demand, making it ideal for ad-hoc transformations and aggregations on raw data with a serverless, pay-per-query model.

Exam trap

The trap here is that candidates may confuse 'serverless' with any cloud service that can run SQL, but only Azure Synapse Serverless SQL pool provides T-SQL support without provisioning compute, while Databricks and HDInsight require cluster management and use non-T-SQL query languages.

How to eliminate wrong answers

Option B is wrong because Azure Synapse Dedicated SQL pool requires provisioning and managing dedicated compute resources, which contradicts the team's preference for a serverless option. Option C is wrong because Azure Databricks uses Apache Spark (not T-SQL) and requires a cluster to be provisioned, even with auto-scaling, and does not natively support T-SQL queries. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark service that requires provisioning clusters and does not support T-SQL; it uses Hive, Pig, or Spark SQL instead.

Practice this question →

47

MCQmedium

Your company has a data pipeline in Azure Data Factory that runs daily. Recently, the pipeline started failing with timeouts. You suspect a downstream database is slow. What should you do to monitor and alert on pipeline run duration?

A.Configure alerts in Azure Monitor based on pipeline run duration

B.Query the pipeline runs in Log Analytics

C.Check Azure Service Health for issues

D.Use Azure Advisor to check performance recommendations

AnswerA

Azure Monitor collects metrics and can trigger alerts when thresholds are exceeded.

Why this answer

Option A is correct because Azure Monitor can be configured to create alerts based on metrics such as pipeline run duration. When the duration exceeds a threshold, an alert triggers, allowing proactive notification of slow downstream databases. This directly addresses the need to monitor and alert on pipeline performance issues.

Exam trap

The trap here is that candidates may confuse reactive troubleshooting tools (like Log Analytics queries) with proactive monitoring and alerting capabilities, or mistakenly think Azure Service Health or Advisor are designed for pipeline-specific performance alerts.

How to eliminate wrong answers

Option B is wrong because querying pipeline runs in Log Analytics provides historical data for analysis but does not set up proactive alerts; it is a reactive troubleshooting step, not a monitoring and alerting solution. Option C is wrong because Azure Service Health reports on Azure platform outages and service issues, not on the performance or duration of specific data factory pipelines. Option D is wrong because Azure Advisor provides recommendations for optimizing performance and costs, but it does not monitor real-time pipeline run duration or trigger alerts based on thresholds.

Practice this question →

48

Drag & Dropmedium

Drag and drop the steps to create an Azure Stream Analytics job in the correct order.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

Why this order

Creating a Stream Analytics job involves setting up the job parameters, then configuring inputs, outputs, and the transformation query.

Practice this question →

49

MCQeasy

A company stores weather sensor data in Azure Data Lake Storage Gen2. Data scientists need to run large-scale transformations and machine learning experiments on this data using Python and Apache Spark. They want to collaborate using shared Jupyter notebooks. Which Azure service should they use for this analytical workload?

A.Azure Stream Analytics

B.Azure Synapse Analytics (with Apache Spark pools)

C.Azure Analysis Services

D.Azure SQL Database

AnswerB

Correct. Azure Synapse Analytics provides Apache Spark pools integrated with Jupyter notebooks, enabling data scientists to run Python/Spark jobs on data stored in ADLS Gen2 for transformations and ML.

Why this answer

Azure Synapse Analytics with Apache Spark pools provides a fully managed Spark environment that supports Python and allows data scientists to run large-scale transformations and machine learning experiments. It integrates directly with Azure Data Lake Storage Gen2 for reading and writing data, and supports collaborative Jupyter notebooks for shared development. This makes it the correct choice for the described analytical workload.

Exam trap

The trap here is that candidates may confuse Azure Stream Analytics (a real-time processing service) with batch processing, or think Azure SQL Database can handle large-scale Spark workloads, when in fact only Synapse Analytics with Spark pools provides the required Python, Spark, and collaborative notebook capabilities.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing service for analyzing data in motion, not designed for large-scale batch transformations or collaborative Jupyter notebook-based machine learning experiments. Option C is wrong because Azure Analysis Services is an enterprise-grade semantic model engine for creating tabular models and performing business analytics, not a platform for running Python or Apache Spark code. Option D is wrong because Azure SQL Database is a relational database service for transactional workloads and does not natively support Apache Spark or Jupyter notebooks for large-scale data transformations and ML experiments.

Practice this question →

50

MCQhard

Refer to the exhibit. An administrator is configuring aggregations in Power BI Premium to improve performance on a large dataset. The aggregation is defined on the Sales table with SUM(Amount) grouped by ProductCategory, Region, and Date at the monthly level. However, some reports that query daily data are still slow. What is the most likely reason?

A.The dataset is too large for aggregations

B.The aggregation is not in DirectQuery mode

C.The aggregation level is monthly, but queries need daily granularity

D.The aggregation has too many dimensions

AnswerC

Daily queries cannot use the monthly aggregation, so they hit the full dataset.

Why this answer

Option C is correct because the aggregation is defined at the monthly level (grouping by month), but the slow reports are querying daily data. Power BI aggregations work by pre-aggregating data at a specified granularity; when a query requests a lower level of detail (daily), the aggregation cannot satisfy the query, so Power BI falls back to scanning the full detailed dataset, causing slow performance. To improve daily queries, an additional aggregation at the daily level would be needed.

Exam trap

The trap here is that candidates may think aggregations automatically speed up all queries on a table, but they must match the exact granularity of the query; otherwise, the aggregation is ignored and the full dataset is scanned.

How to eliminate wrong answers

Option A is wrong because the dataset being large is the reason aggregations are used, and the issue is not the size itself but the mismatch between the aggregation granularity and the query granularity. Option B is wrong because the mode (DirectQuery vs. Import) is not relevant to the granularity mismatch; aggregations can be defined in either mode, and the problem is the monthly grouping, not the storage mode.

Option D is wrong because having multiple dimensions (ProductCategory, Region, Date) is typical and not a cause of slowness; the issue is the date granularity, not the number of dimensions.

Practice this question →

51

Matchingmedium

Match each data type to its category in Azure.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Relational tables with fixed schema

JSON, XML, or key-value pairs

Blobs, files, and media

Data in tables with relationships

NoSQL data like documents or graphs

Why these pairings

Understanding data types helps choose the right Azure service.

Practice this question →

52

MCQmedium

A data analyst needs to run complex SQL queries against petabytes of historical sales data stored in Azure Data Lake Storage Gen2. The solution must be serverless with pay-per-query pricing. Which Azure service should they use?

A.Azure Synapse Analytics serverless SQL pool

B.Azure SQL Database

C.Azure HDInsight with Spark

D.Azure Analysis Services

AnswerA

It provides serverless SQL queries over data in Data Lake Storage with pay-per-query pricing.

Why this answer

Azure Synapse Analytics serverless SQL pool is the correct choice because it provides a serverless, pay-per-query engine that can directly query petabytes of data stored in Azure Data Lake Storage Gen2 using standard T-SQL. It eliminates infrastructure management and charges only for the data processed by each query, making it ideal for ad-hoc, complex SQL workloads on massive historical datasets.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics serverless SQL pool with Azure SQL Database or HDInsight, mistakenly thinking that any SQL-capable service can handle petabyte-scale serverless queries, while the key differentiator is the direct, pay-per-query access to Data Lake Storage without provisioning compute.

How to eliminate wrong answers

Option B is wrong because Azure SQL Database is a fully managed, provisioned relational database service that requires pre-allocated resources and does not support serverless pay-per-query pricing for petabyte-scale data in Data Lake Storage; it is designed for transactional workloads, not analytical queries on external data. Option C is wrong because Azure HDInsight with Spark is a cluster-based service that requires provisioning and managing compute nodes, incurring costs even when idle, and does not offer true serverless pay-per-query pricing. Option D is wrong because Azure Analysis Services is a fully managed PaaS service for semantic models and in-memory analytics, requiring provisioned resources and not designed for direct serverless SQL queries against Data Lake Storage; it also lacks pay-per-query billing.

Practice this question →

53

MCQhard

Your company uses Azure Databricks to process streaming data from Event Hubs. The data is transformed and written to Azure Data Lake Storage Gen2 as Delta tables. You notice that some records are duplicated in the Delta tables. Which configuration change should you make to prevent duplicates?

A.Add a separate job to deduplicate the Delta table.

B.Enable checkpointing in the streaming query to store progress.

C.Use Delta Lake's idempotent write support.

D.Increase the batch interval in the streaming query.

AnswerB

Checkpointing ensures exactly-once processing.

Why this answer

Checkpointing in Spark Structured Streaming stores the offset of the last processed event from Event Hubs. When the query restarts, it reads from the checkpointed offset, ensuring each event is processed exactly once and preventing duplicates in the Delta table.

Exam trap

The trap here is that candidates confuse idempotent writes (which prevent duplicate writes within a single transaction) with checkpointing (which prevents duplicate reads across query restarts), leading them to choose Option C instead of B.

How to eliminate wrong answers

Option A is wrong because adding a separate deduplication job is an extra step that does not address the root cause of duplicate ingestion; it only cleans up after the fact, increasing complexity and cost. Option C is wrong because Delta Lake's idempotent write support prevents duplicate writes within the same transaction, but it does not handle duplicate reads from the streaming source; the duplication occurs because the streaming query reprocesses events from the beginning without checkpointing. Option D is wrong because increasing the batch interval only changes how often micro-batches are triggered; it does not track which events have already been processed, so duplicates can still occur on restarts.

Practice this question →

54

MCQmedium

A healthcare organization must build an analytics solution that processes streaming patient vitals data and provides real-time dashboards. The solution must also store historical data for compliance audits. Which combination of Azure services should the organization use?

A.Azure Stream Analytics for real-time processing and Azure SQL Database for historical storage and dashboards.

B.Azure Synapse Analytics for real-time processing and Azure Blob Storage for archival.

C.Azure Event Hubs for ingestion and Azure Data Lake Storage for storage, with Power BI for dashboards.

D.Azure HDInsight with Apache Spark for streaming and Azure Cosmos DB for storage.

AnswerA

Stream Analytics processes streaming data; SQL Database stores structured historical data for compliance and queries.

Why this answer

Azure Stream Analytics is purpose-built for real-time processing of streaming data, such as patient vitals, and can output directly to Power BI for live dashboards. Azure SQL Database provides a relational store for historical data, supporting compliance audits with point-in-time restore and long-term retention. This combination meets both real-time and historical requirements without unnecessary complexity.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics as a streaming service due to its 'analytics' name, but it is primarily a data warehouse for batch and interactive queries, not for real-time stream processing.

How to eliminate wrong answers

Option B is wrong because Azure Synapse Analytics is not designed for real-time stream processing; it is a data warehouse and analytics service for batch and interactive queries, not for low-latency streaming. Option C is wrong because while Azure Event Hubs and Azure Data Lake Storage are suitable for ingestion and storage, they lack built-in real-time processing; Power BI alone cannot process streaming data without a compute layer like Stream Analytics. Option D is wrong because Azure HDInsight with Apache Spark is a big data platform that can handle streaming, but it adds operational overhead and is not as straightforward for real-time dashboards as Stream Analytics; Azure Cosmos DB is a NoSQL database, not optimized for relational compliance audits.

Practice this question →

55

MCQhard

Refer to the exhibit. You have an Azure Data Factory pipeline definition as shown. The pipeline fails with a 'Source not found' error. The BlobInputDataset points to a container that exists. What is the most likely cause?

A.The Azure Blob Storage container is empty.

B.The Azure Data Factory managed identity does not have access to the storage account.

C.The SQL sink database does not exist.

D.The dataset's file path is incorrect or no files match the pattern.

AnswerD

Source not found typically means the file path is invalid or no files exist at that location.

Why this answer

The 'Source not found' error in Azure Data Factory indicates that the source dataset cannot locate the specified file or blob. Since the container exists, the most likely cause is that the file path defined in the dataset is incorrect or that no files match the specified pattern (e.g., wildcard or prefix). This is a common configuration issue when the dataset's folder path or file name does not correspond to the actual blob location.

Exam trap

The trap here is that candidates often confuse a missing file or incorrect path with an empty container or permission issues, but the specific 'Source not found' error points directly to the dataset's file path or pattern mismatch.

How to eliminate wrong answers

Option A is wrong because an empty container would not cause a 'Source not found' error; instead, a copy activity would succeed with zero rows copied, or a lookup activity would return an empty result. Option B is wrong because a managed identity access issue would result in an 'Authentication failed' or 'Authorization failed' error, not 'Source not found'. Option C is wrong because the SQL sink database not existing would cause a 'Sink not found' or connection error, not a source-related error.

Practice this question →

56

MCQeasy

A data engineer needs to transform and clean data from multiple sources before loading it into Azure Synapse Analytics. Which Azure service should they use for this ETL process?

A.Azure Analysis Services

B.Azure Databricks

C.Azure Data Factory

D.Azure Stream Analytics

AnswerC

Azure Data Factory provides visual data flows and orchestration for ETL workloads.

Why this answer

Azure Data Factory is the correct service because it is a cloud-based ETL (Extract, Transform, Load) service designed specifically to orchestrate and automate data movement and transformation from multiple sources. It provides built-in connectors for various data stores and supports data flows for cleaning and transforming data before loading it into Azure Synapse Analytics.

Exam trap

The trap here is that candidates often confuse Azure Databricks (a Spark-based analytics platform) with Azure Data Factory, but Data Factory is the dedicated ETL orchestration service, while Databricks is more for data engineering and machine learning workloads.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is an analytical engine used for creating semantic models and performing business intelligence (BI) queries, not for ETL processes. Option B is wrong because Azure Databricks is a big data analytics platform based on Apache Spark, which can perform transformations but is not primarily an ETL orchestration service; it is more suited for advanced analytics and machine learning workloads. Option D is wrong because Azure Stream Analytics is a real-time event processing engine for streaming data, not designed for batch ETL from multiple static sources.

Practice this question →

57

MCQhard

A logistics company uses Azure Synapse Analytics dedicated SQL pool to analyze billions of shipment records. The table 'Shipments' is 10 TB and hash-distributed on 'ShipmentID'. Analysts frequently run queries that filter on 'WarehouseID' and aggregate by 'Region'. These queries are slow because they cause data movement (shuffle) across distributions. Which table design change will most improve query performance for these analytical workloads?

A.Change distribution to replicated table

B.Change distribution to round-robin

C.Create a columnstore index

D.Change distribution to hash on 'WarehouseID'

AnswerD

Hash-distributing on WarehouseID places all rows for a specific warehouse on the same compute node. Queries that filter by WarehouseID become single-distribution queries, eliminating expensive data shuffles. This directly addresses the observed performance bottleneck.

Why this answer

D is correct because hash-distributing the 'Shipments' table on 'WarehouseID' ensures that all rows for a given warehouse are co-located on the same distribution node. This eliminates the need for data movement (shuffle) when queries filter on 'WarehouseID' and aggregate by 'Region', as the aggregation can be performed locally on each distribution without redistributing data across nodes.

Exam trap

The trap here is that candidates often confuse indexing (columnstore) with distribution design, assuming that a better index alone can fix shuffle-related performance issues, when in fact the distribution key is the primary factor determining data movement in a massively parallel processing (MPP) architecture.

How to eliminate wrong answers

Option A is wrong because replicated tables are suitable for small dimension tables (typically < 2 GB) and not for a 10 TB fact table like 'Shipments'; replicating such a large table would cause excessive storage overhead and degrade performance. Option B is wrong because round-robin distribution distributes data evenly without any logical grouping, so queries filtering on 'WarehouseID' would still require a full data shuffle to bring related rows together for aggregation. Option C is wrong because columnstore indexes are already the default for dedicated SQL pool tables and are designed for compression and scan performance, but they do not address the root cause of data movement across distributions caused by an inappropriate distribution key.

Practice this question →

58

MCQmedium

A manufacturing company needs to build an analytics solution for IoT sensor data. Thousands of devices send real-time temperature and vibration readings. The solution must: (1) ingest the streaming data reliably, (2) perform real-time aggregations (e.g., average temperature per device every minute), and (3) store the aggregated results in Azure Synapse Analytics for historical reporting and dashboards. Which combination of Azure services should be used?

A.Azure Event Hubs -> Azure Stream Analytics -> Azure Synapse Analytics

B.Azure IoT Hub -> Azure Data Factory -> Azure Cosmos DB

C.Azure Blob Storage -> Azure Databricks -> Azure SQL Database

D.Azure Service Bus -> Azure Functions -> Azure Table Storage

AnswerA

Correct. Event Hubs ingests the streaming data, Stream Analytics performs real-time transformations and aggregations, and Synapse Analytics stores the results for historical analysis and BI reporting.

Why this answer

Azure Event Hubs is designed for high-throughput, reliable ingestion of streaming data from millions of IoT devices. Azure Stream Analytics can then perform real-time aggregations (like average temperature per device per minute) using a SQL-like query language. Finally, Azure Synapse Analytics provides a dedicated SQL pool or serverless SQL endpoint for storing and querying the aggregated results, enabling historical reporting and dashboards.

Exam trap

The trap here is that candidates often confuse Azure IoT Hub with Azure Event Hubs, thinking IoT Hub is required for all IoT scenarios, but Event Hubs is the correct choice for pure telemetry ingestion without device management needs.

How to eliminate wrong answers

Option B is wrong because Azure IoT Hub is primarily for device management and bi-directional communication, not optimized for high-scale streaming ingestion, and Azure Data Factory is a batch ETL tool, not a real-time stream processor; Azure Cosmos DB is a NoSQL database, not a data warehouse for historical reporting. Option C is wrong because Azure Blob Storage is for static file storage, not real-time streaming ingestion, and Azure Databricks is a big data analytics platform that can process streams but is not the simplest or most cost-effective choice for simple real-time aggregations; Azure SQL Database is a transactional database, not a large-scale analytics warehouse. Option D is wrong because Azure Service Bus is a message broker for enterprise messaging, not designed for high-throughput IoT telemetry, and Azure Functions is a serverless compute service that would require custom code for stream processing, lacking the built-in windowing and aggregation capabilities of Stream Analytics; Azure Table Storage is a NoSQL key-value store, not suitable for complex analytical queries.

Practice this question →

59

MCQmedium

A data engineer needs to build an analytics solution to transform large volumes of streaming data from IoT devices. The transformations involve complex Python and Spark code, and the results will be stored in Azure Data Lake Storage Gen2 for further analysis. Which Azure service is best suited for executing these transformations?

A.Azure Data Factory

B.Azure Synapse Pipelines

C.Azure Databricks

D.Azure Analysis Services

AnswerC

Correct. Azure Databricks is a Spark-based analytics platform that allows complex transformations using Python, Scala, or SQL, and integrates well with Data Lake Storage.

Why this answer

Azure Databricks is best suited because it provides an Apache Spark-based analytics platform that can execute complex Python and Spark code on large-scale streaming data. It integrates natively with Azure Data Lake Storage Gen2 for reading streaming IoT data and writing transformed results, offering optimized performance for big data transformations.

Exam trap

The trap here is that candidates confuse Azure Data Factory or Synapse Pipelines with compute engines for code-based transformations, when those services are primarily for orchestration and integration, not for executing complex Python/Spark code on streaming data.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is primarily an orchestration and ETL/ELT service that uses code-free pipelines or SQL-based transformations, not designed for executing complex Python and Spark code on streaming data. Option B is wrong because Azure Synapse Pipelines (now part of Synapse Analytics) focuses on data integration and orchestration with T-SQL or Spark notebooks, but it lacks the dedicated streaming and collaborative notebook environment that Azure Databricks provides for complex Spark transformations. Option D is wrong because Azure Analysis Services is a semantic modeling and business intelligence service for creating tabular models, not a compute engine for running Python or Spark code on streaming data.

Practice this question →

60

MCQmedium

Your organization uses Azure Purview to scan data sources. You need to set up a scan rule set that automatically classifies credit card numbers in Azure SQL Database. Which built-in classification rule should you enable?

A.Use a regular expression pattern matching.

B.Create a custom classification rule.

C.Enable the 'Personally Identifiable Information (PII)' classification.

D.Enable the 'Credit Card Number' classification.

AnswerD

Built-in rule detects credit card patterns.

Why this answer

Option D is correct because Azure Purview includes a built-in 'Credit Card Number' classification rule that uses a predefined regular expression pattern to detect credit card numbers in data sources like Azure SQL Database. Enabling this rule automatically classifies the data without requiring custom development, aligning with the requirement to use a built-in classification.

Exam trap

The trap here is that candidates may confuse the method (regular expression pattern matching) with the specific built-in rule, or incorrectly assume that enabling a broader PII classification is sufficient when the question requires a targeted credit card number classification.

How to eliminate wrong answers

Option A is wrong because 'Use a regular expression pattern matching' is not a built-in classification rule in Azure Purview; it describes a method for creating custom rules, not a specific rule to enable. Option B is wrong because 'Create a custom classification rule' is unnecessary when a built-in rule for credit card numbers exists, and the question explicitly asks for a built-in rule to enable. Option C is wrong because 'Enable the 'Personally Identifiable Information (PII)' classification' is a broader category that may include credit card numbers but does not specifically target them; enabling it would classify all PII types, not just credit card numbers, which is not the precise requirement.

Practice this question →

61

MCQhard

You are the data engineer for a large retail company. The company has an existing on-premises SQL Server database with 10 years of transactional data. They want to move this data to Azure to enable advanced analytics using Azure Synapse Analytics. The data includes customer orders, product details, and inventory. The solution must minimize data movement and support both batch and real-time analytics. The company also wants to use Power BI for reporting. They have a limited budget and prefer a serverless option for compute. You are evaluating the following approaches: A) Use Azure Data Factory to copy all data to Azure Data Lake Storage Gen2, then use Azure Synapse Serverless SQL pool to query the data, and finally connect Power BI to the serverless SQL endpoint. B) Use Azure Database Migration Service to migrate the SQL Server database to Azure SQL Database, then use Azure Synapse Analytics with a dedicated SQL pool to perform analytics, and connect Power BI to the dedicated pool. C) Use Azure Data Factory to copy all data to Azure Blob Storage, then use Azure Stream Analytics to perform real-time analytics, and connect Power BI directly to Stream Analytics output. D) Use Azure Data Factory to copy historical data to Azure Data Lake Storage Gen2, use Azure Synapse Serverless SQL pool for batch analytics, and use Azure Event Hubs and Stream Analytics for real-time data, with Power BI connecting to both serverless SQL and Stream Analytics. Which approach best meets the requirements?

A.Option A

B.Option C

C.Option B

D.Option D

AnswerD

Combines serverless batch and real-time, minimizes data movement, and uses Power BI.

Why this answer

Option D best meets the requirements because it uses Azure Data Factory to copy historical data to Azure Data Lake Storage Gen2, enabling cost-effective storage and batch analytics via Azure Synapse Serverless SQL pool (serverless compute). It also incorporates Azure Event Hubs and Stream Analytics for real-time data ingestion and analytics, with Power BI connecting to both the serverless SQL endpoint and Stream Analytics output. This minimizes data movement by keeping data in the lake, supports both batch and real-time analytics, and uses a serverless option to stay within a limited budget.

Exam trap

The trap here is that candidates often choose Option A because it uses serverless SQL and Power BI, but they overlook the explicit requirement for real-time analytics, which Option A does not address.

How to eliminate wrong answers

Option A is wrong because it only supports batch analytics via the serverless SQL pool and lacks a real-time analytics component, failing the requirement for real-time analytics. Option B is wrong because it uses Azure SQL Database and a dedicated SQL pool, which are provisioned (not serverless) compute options, increasing costs and violating the preference for a serverless option; it also moves data to a separate database, increasing data movement. Option C is wrong because it copies data to Azure Blob Storage (which lacks the hierarchical namespace and optimized analytics features of Data Lake Storage Gen2) and uses only Stream Analytics for real-time analytics, missing the batch analytics requirement and the serverless SQL pool for ad-hoc querying.

Practice this question →

62

MCQmedium

A company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. Every day, they need to incrementally load 100 GB of new sales data from CSV files stored in Azure Data Lake Storage Gen2 (ADLS Gen2). The load should use PolyBase for efficient parallel data transfer and must be orchestrated on a recurring schedule. Which Azure service should they use to create and manage this pipeline?

A.A: Azure Data Factory

B.B: Azure Stream Analytics

C.C: Azure Databricks

D.D: Azure Logic Apps

AnswerA

Azure Data Factory supports scheduled pipelines, PolyBase integration for high-speed loading into Synapse, and incremental copy patterns, making it the ideal orchestration tool.

Why this answer

Azure Data Factory (ADF) is the correct choice because it provides native orchestration and scheduling capabilities for data pipelines. It supports PolyBase as a sink to load data into Azure Synapse dedicated SQL pool in parallel, and it can directly read CSV files from ADLS Gen2. ADF's built-in triggers allow you to schedule the daily incremental load without additional coding.

Exam trap

The trap here is that candidates may confuse Azure Databricks as a pipeline orchestrator, but it lacks native scheduling and PolyBase integration, whereas Azure Data Factory is the dedicated service for building and managing data pipelines with PolyBase support.

How to eliminate wrong answers

Option B (Azure Stream Analytics) is wrong because it is designed for real-time stream processing (e.g., from Event Hubs or IoT Hub), not for scheduled batch loading of CSV files from ADLS Gen2. Option C (Azure Databricks) is wrong because while it can process data and load into Synapse, it is a Spark-based analytics platform that requires manual pipeline orchestration or integration with ADF; it does not natively provide the simple scheduling and PolyBase integration that ADF offers out of the box. Option D (Azure Logic Apps) is wrong because it is a low-code workflow service for integrating SaaS applications and APIs, not designed for high-throughput data movement or PolyBase-based parallel loading into a dedicated SQL pool.

Practice this question →

63

MCQmedium

Your company has a Power BI dashboard that uses a data model with a single large fact table and several dimension tables. The dashboard loads slowly when users filter by multiple dimensions. Which design change would MOST improve performance?

A.Use page-level filters instead of report-level filters.

B.Create a calculated table that aggregates the fact table at a higher granularity.

C.Ensure the fact table and dimension tables follow a star schema design with proper relationships.

D.Convert the data model to a composite model using DirectQuery for some tables.

AnswerC

Star schema is optimized for analytical queries and filtering.

Why this answer

Option C is correct because a star schema design with proper relationships between the fact table and dimension tables is the foundational best practice for optimizing Power BI data models. This design minimizes the cardinality of relationships, reduces the size of the data model, and enables efficient query folding and storage engine compression, which directly improves filter performance across multiple dimensions.

Exam trap

The trap here is that candidates often confuse UI-level filter scoping (page-level vs. report-level) with actual query performance optimization, or they mistakenly believe that aggregating data or switching to DirectQuery will always improve speed, ignoring the fundamental importance of star schema design for in-memory analytics.

How to eliminate wrong answers

Option A is wrong because page-level filters do not improve query performance; they only change the scope of filter application in the UI, and the underlying query still hits the same large fact table. Option B is wrong because creating a calculated table that aggregates the fact table at a higher granularity would lose detail data and prevent users from drilling down, which is not a performance optimization but a data reduction that changes the analytical capability. Option D is wrong because converting to a composite model with DirectQuery for some tables often introduces latency from the source system and can degrade performance due to the lack of in-memory compression and the need for cross-engine joins, making it a poor choice for improving dashboard responsiveness.

Practice this question →

64

MCQhard

A manufacturing company ingests a continuous stream of sensor data from thousands of IoT devices into Azure Event Hubs. The company also stores historical equipment maintenance records in Azure SQL Database. The operations team needs to join the streaming sensor data with the historical maintenance records in near real-time to detect anomalies, and data scientists need to run ad-hoc T-SQL queries on the combined dataset for analysis. Which Azure service should they use as the primary analytics platform to meet both requirements?

A.Azure Stream Analytics

B.Azure Databricks

C.Azure Synapse Analytics

D.Azure Analysis Services

AnswerC

Azure Synapse Analytics provides both real-time ingestion and a T-SQL-based query engine (SQL pool or serverless) for ad-hoc analysis, meeting both requirements.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified analytics platform that can ingest streaming data from Azure Event Hubs via its built-in Spark pools or pipelines, and simultaneously query historical data in Azure SQL Database using T-SQL. This enables near real-time anomaly detection through streaming joins and ad-hoc T-SQL queries for data scientists, all within a single service without needing separate tools.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics as sufficient for both requirements, overlooking its lack of ad-hoc T-SQL query support, and mistakenly think Azure Databricks supports T-SQL natively when it actually uses Spark SQL or Python.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing engine that can join streaming data with reference data (e.g., from SQL Database), but it does not support ad-hoc T-SQL queries for data scientists; it uses a SQL-like query language for continuous queries, not interactive T-SQL. Option B is wrong because Azure Databricks is an Apache Spark-based analytics platform that can handle streaming and batch workloads, but it does not natively support T-SQL queries; it uses Spark SQL or Python, not the T-SQL dialect required for ad-hoc queries by data scientists. Option D is wrong because Azure Analysis Services is an OLAP engine for semantic modeling and business intelligence, not designed for real-time stream processing or direct T-SQL queries on raw combined datasets; it focuses on pre-aggregated data and MDX/DAX queries.

Practice this question →

65

MCQmedium

A data engineer needs to load data from an on-premises SQL Server database to Azure Synapse Analytics. The data volume is approximately 2 TB and the network bandwidth is limited. Which approach minimizes data transfer time?

A.Use SQL Server Integration Services (SSIS) to transfer data over the internet.

B.Use Azure Data Box to physically ship the data.

C.Establish a site-to-site VPN and use Azure Data Factory.

D.Use Azure Data Factory with a self-hosted integration runtime over the internet.

AnswerB

Data Box bypasses network constraints for large datasets.

Why this answer

Azure Data Box is the correct approach because it physically ships the 2 TB of data on a secure storage device, bypassing the limited network bandwidth entirely. For large data volumes (multiple TB) with constrained connectivity, offline data transfer is significantly faster than any online method, as it avoids network latency and bandwidth bottlenecks.

Exam trap

The trap here is that candidates often assume online transfer tools like Azure Data Factory or SSIS are always optimal, but for large data volumes with limited bandwidth, offline shipping via Azure Data Box is the only practical solution to minimize transfer time.

How to eliminate wrong answers

Option A is wrong because SSIS over the internet would be severely throttled by the limited network bandwidth, making the transfer of 2 TB extremely slow and impractical. Option C is wrong because a site-to-site VPN still relies on the same limited internet bandwidth, so using Azure Data Factory over it would not reduce transfer time. Option D is wrong because Azure Data Factory with a self-hosted integration runtime over the internet still depends on the available network bandwidth, which is insufficient for a 2 TB transfer in a timely manner.

Practice this question →

66

MCQhard

A company uses Azure Synapse Analytics dedicated SQL pool to store a large fact table containing 5 TB of sales transactions. New data arrives continuously and is loaded daily. The company needs to load 500 GB of new data each day while allowing concurrent read queries on the most recent data without performance degradation. Which loading strategy optimizes both load speed and query performance?

A.Use INSERT statements to add rows incrementally

B.Use CREATE TABLE AS SELECT (CTAS) to build a new table and rename it

C.Load data into a staging table, then use partition switching to swap the latest partition

D.Use PolyBase to load data directly into the fact table

AnswerC

Partition switching is a fast, metadata-only operation that adds new data without blocking concurrent reads, ideal for large daily loads.

Why this answer

Option C is correct because partition switching allows you to load new data into a staging table, then instantly swap the staging partition with the target table's latest partition using ALTER TABLE SWITCH. This minimizes metadata changes and avoids data movement, enabling fast loads while keeping the fact table online for concurrent read queries without blocking or performance degradation.

Exam trap

The trap here is that candidates often choose PolyBase (Option D) because it is associated with fast data loading, but they overlook that direct loading into a large fact table causes fragmentation and locking, whereas partition switching provides both speed and query isolation.

How to eliminate wrong answers

Option A is wrong because INSERT statements for 500 GB of data would generate excessive transaction log overhead, cause locking and blocking, and degrade concurrent read query performance on the dedicated SQL pool. Option B is wrong because CTAS creates a full copy of the entire 5 TB table plus the new data, which is resource-intensive, slow, and unnecessary for daily incremental loads; it also requires renaming and dropping the old table, causing downtime. Option D is wrong because PolyBase loads data directly into the fact table, which can cause fragmentation, locking, and poor query performance during the load, and it does not isolate the new data for efficient partition management.

Practice this question →

67

MCQhard

A manufacturing company collects sensor data from thousands of IoT devices. The data arrives as a stream of time-stamped readings with a fixed schema (DeviceID, Timestamp, Temperature, Pressure, Vibration). They need to store this data and support both real-time dashboards showing the last hour of data and complex analytical queries over years of historical data. The solution must minimize storage costs and provide sub-second response for real-time queries. Which Azure service is best suited for this workload?

A.Azure Cosmos DB with SQL API

B.Azure SQL Database

C.Azure Data Explorer

D.Azure Table Storage

AnswerC

Azure Data Explorer is specifically built for time-series and log analytics. It supports high-throughput ingestion, automatic indexing, caching for hot data (sub-second queries), and retention-based tiering to cold storage for historical analysis, minimizing costs.

Why this answer

Azure Data Explorer (ADX) is purpose-built for high-performance analysis of large volumes of streaming telemetry data. It supports ingestion from IoT hubs, automatic indexing for sub-second queries on recent data (e.g., last hour), and cost-effective long-term storage via hot/cold tiering for years of historical analytics. Its columnar storage and Kusto Query Language (KQL) are optimized for time-series and aggregation queries, making it ideal for this mixed real-time and historical workload.

Exam trap

Microsoft often tests the misconception that any database with low-latency reads (like Cosmos DB) can handle both real-time and historical analytics, but the trap is that Cosmos DB lacks the columnar storage and query engine optimized for time-series aggregations, making it cost-prohibitive and slow for complex analytical queries over years of data.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB with SQL API is a NoSQL document database optimized for transactional workloads with low-latency reads/writes, but it is not designed for complex analytical queries over years of historical data and its storage costs are significantly higher than ADX for large telemetry volumes. Option B is wrong because Azure SQL Database is a relational OLTP engine that provides strong consistency and indexing, but it struggles with sub-second response on streaming time-series data at scale and its storage costs are higher for high-ingestion-rate telemetry. Option D is wrong because Azure Table Storage is a simple key-value store with no native support for time-series analytics, complex aggregations, or sub-second query performance on streaming data, and it lacks indexing for efficient range queries over timestamps.

Practice this question →

68

MCQmedium

Refer to the exhibit. You are reviewing an ARM template that deploys a SQL database in Azure Synapse. The template sets the storageAccountType to GRS. What is a valid concern regarding cost and performance?

A.GRS will increase storage costs and may cause higher latency

B.The collation setting is not compatible with Azure Synapse

C.The database cannot be part of a failover group

D.The database will not support Transparent Data Encryption

AnswerA

GRS replicates data to a secondary region, increasing cost and potentially write latency.

Why this answer

Geo-redundant storage (GRS) replicates your data to a secondary region, which increases storage costs because you are paying for both the primary and secondary copies. Additionally, when using GRS with Azure Synapse SQL, read requests may experience higher latency if they are directed to the secondary region, especially during a failover scenario or when using read-access geo-redundant storage (RA-GRS). This makes cost and performance valid concerns when choosing GRS over locally redundant storage (LRS).

Exam trap

The trap here is that candidates often assume GRS only affects disaster recovery and ignore its impact on ongoing storage costs and read latency, leading them to dismiss cost and performance as valid concerns.

How to eliminate wrong answers

Option B is wrong because the collation setting is not inherently incompatible with Azure Synapse; Synapse SQL pools support a variety of collations, and the default SQL_Latin1_General_CP1_CI_AS is commonly used. Option C is wrong because Azure Synapse SQL databases can be part of a failover group when configured appropriately, though the failover group feature is more commonly associated with Azure SQL Database; the GRS setting does not prevent failover group membership. Option D is wrong because Transparent Data Encryption (TDE) is supported in Azure Synapse SQL pools regardless of the storage replication type (GRS, LRS, etc.), as TDE operates at the database level and is independent of storage redundancy.

Practice this question →

69

MCQhard

A financial services company has raw transaction data stored in Azure Data Lake Storage Gen2 (ADLS Gen2) as Parquet files, partitioned by date. The analytics team needs to run complex SQL queries that join multiple datasets, including reference data from an Azure SQL Database, to generate risk reports. They require enterprise-grade security features such as row-level security (RLS) and column-level security. They also want to use the same service for data transformation and loading (ETL) into a curated layer. Which Azure service should they choose?

A.Azure Synapse Analytics

B.Azure Data Factory

C.Azure Databricks

D.Azure Analysis Services

AnswerA

Correct. Azure Synapse Analytics offers a unified experience for data integration, enterprise data warehousing, and big data analytics, with built-in security features like RLS and column-level security. It can query ADLS Gen2 using serverless SQL pool and orchestrate ETL with pipelines.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified analytics platform that combines enterprise data warehousing with big data analytics. It directly supports complex SQL queries across multiple datasets (including Parquet files in ADLS Gen2 and Azure SQL Database), offers built-in row-level security (RLS) and column-level security for enterprise-grade access control, and includes a built-in pipeline orchestration engine (via Synapse Pipelines) for ETL/ELT transformations into a curated layer. This single service eliminates the need to stitch together separate tools for querying, security, and data transformation.

Exam trap

The trap here is that candidates often confuse Azure Data Factory as a complete analytics solution because of its ETL capabilities, overlooking that it lacks a native SQL query engine and built-in row/column-level security for direct data access.

How to eliminate wrong answers

Option B (Azure Data Factory) is wrong because it is primarily a cloud-based ETL and data integration service that orchestrates data movement and transformation, but it does not provide a native SQL query engine for complex analytical queries or built-in row-level/column-level security on the data itself. Option C (Azure Databricks) is wrong because while it excels at big data processing and machine learning using Apache Spark, it does not natively support enterprise-grade row-level security (RLS) and column-level security at the storage or query layer without additional configuration, and its primary interface is not SQL-first for complex joins across relational and file-based sources. Option D (Azure Analysis Services) is wrong because it is a semantic modeling and BI engine that provides tabular models with RLS, but it is not designed for direct ETL/ELT data transformation or loading into a curated layer, nor does it directly query raw Parquet files in ADLS Gen2 without additional data ingestion steps.

Practice this question →

70

MCQmedium

A retail company uses Power BI to create sales reports. The data source is an Azure SQL Database that updates every 15 minutes. The reports must reflect near real-time data without manual refresh. Which Power BI feature should the company use?

A.Use the on-premises data gateway to connect to Azure SQL Database.

B.Import data with scheduled refresh every 15 minutes.

C.Use DirectQuery mode to connect to the Azure SQL Database.

D.Create a Power BI dataflow to transform the data.

AnswerC

DirectQuery sends queries to the source database, providing near real-time data.

Why this answer

DirectQuery mode allows Power BI to query the Azure SQL Database directly without importing data, ensuring that reports reflect the current state of the database each time a report is viewed. Since the database updates every 15 minutes, DirectQuery provides near real-time data without requiring manual or scheduled refresh operations.

Exam trap

The trap here is that candidates often confuse DirectQuery with scheduled refresh, assuming that a 15-minute refresh schedule is sufficient for near real-time needs, but DirectQuery eliminates the refresh interval entirely by querying the source live.

How to eliminate wrong answers

Option A is wrong because the on-premises data gateway is used to connect on-premises data sources to Power BI, but Azure SQL Database is a cloud service that can be accessed directly without a gateway. Option B is wrong because scheduled refresh imports data into the Power BI dataset, which introduces latency and requires manual configuration; even with a 15-minute schedule, the data is only as current as the last import, not near real-time. Option D is wrong because a Power BI dataflow is used for data preparation and transformation in the cloud, not for live querying; it still requires a separate import or DirectQuery connection to serve reports.

Practice this question →

71

MCQhard

A data engineering team is building a batch analytics pipeline. Raw clickstream data is stored as Parquet files in Azure Data Lake Storage Gen2. The team needs to transform the data using Apache Spark (Python code) and then load the results into Azure Synapse Analytics for high-performance reporting. They want to use a serverless compute option for Spark to avoid managing clusters. Which combination of Azure services should they use for the transformation and loading?

A.Use Azure Databricks with a serverless cluster for transformations and load into Azure SQL Database.

B.Use Azure Synapse Analytics serverless Spark pools for transformations and load into the Synapse dedicated SQL pool.

C.Use Azure Data Factory with a Spark activity to run transformations and load into Azure Synapse Analytics.

D.Use Azure HDInsight with Apache Spark for transformations and load into Azure Blob Storage.

AnswerB

Synapse Analytics provides serverless Spark pools that automatically scale and can read from ADLS Gen2. The transformed data can be loaded into the dedicated SQL pool for high-performance queries, all within a single integrated service.

Why this answer

Option B is correct because Azure Synapse Analytics serverless Spark pools provide a serverless compute option for running Apache Spark transformations without managing clusters, and the transformed data can be directly loaded into the Synapse dedicated SQL pool for high-performance reporting. This combination meets all requirements: serverless Spark for transformations, and Synapse dedicated SQL pool for optimized analytics workloads.

Exam trap

The trap here is that candidates may confuse Azure Synapse Analytics serverless Spark pools (which are serverless) with Azure Data Factory's Spark activity (which requires a managed cluster), or assume that any Spark service (like HDInsight) can be serverless, when only Synapse serverless Spark pools and Databricks serverless clusters offer true serverless compute.

How to eliminate wrong answers

Option A is wrong because Azure Databricks with a serverless cluster is a valid serverless Spark option, but it loads into Azure SQL Database, not Azure Synapse Analytics, which does not provide the high-performance reporting capabilities of a dedicated SQL pool. Option C is wrong because Azure Data Factory with a Spark activity still requires a managed Spark cluster (e.g., HDInsight or Databricks) and does not offer a serverless Spark compute option; Data Factory orchestrates but does not run Spark natively in a serverless manner. Option D is wrong because Azure HDInsight requires explicit cluster management (not serverless) and loads into Azure Blob Storage, which is not a high-performance reporting target like Synapse dedicated SQL pool.

Practice this question →

72

MCQhard

A data analyst needs to create an interactive report that combines sales data from Azure SQL Database and Azure Cosmos DB. The report must refresh daily. Which tool should they use?

A.Azure Data Factory

B.Azure Synapse Studio

C.Azure Analysis Services

D.Power BI

AnswerD

Power BI can connect to multiple sources and create interactive dashboards with scheduled refresh.

Why this answer

Power BI is the correct tool because it is designed for creating interactive reports and dashboards, and it can directly connect to both Azure SQL Database and Azure Cosmos DB as data sources. Its scheduled refresh capability allows the report to refresh daily without manual intervention, meeting the requirement for an interactive, combined report.

Exam trap

The trap here is that candidates may confuse data integration tools (like Azure Data Factory) or data modeling services (like Azure Analysis Services) with the actual reporting and visualization tool, which is Power BI, the only option that directly creates interactive reports with scheduled refresh.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is an ETL and data integration service, not a reporting or visualization tool; it would be used to move or transform data before reporting, but not to create the interactive report itself. Option B is wrong because Azure Synapse Studio is an analytics workspace for big data and data warehousing, not a dedicated interactive reporting tool; while it can query data, it lacks the rich visualization and dashboard features of Power BI. Option C is wrong because Azure Analysis Services is a semantic modeling engine that provides analytical data models, but it does not create interactive reports; it would typically serve as a data source for Power BI, not replace it.

Practice this question →

73

MCQmedium

A marketing company collects real-time clickstream data from their website using Azure Event Hubs. They need to perform two tasks: (1) aggregate the number of clicks per advertising campaign every 5 minutes and display the results in a live dashboard, and (2) run complex historical queries on months of aggregated click data to identify trends. They want to minimize data movement and use serverless compute where possible. Which combination of Azure services should they use?

A.Azure Stream Analytics for live aggregation and Power BI for the dashboard; Azure Synapse Analytics (serverless SQL pool) for historical queries

B.Azure Data Factory for live aggregation; Azure Analysis Services for historical queries

C.Azure HDInsight (Spark) for both live and historical processing

D.Azure Functions for real-time aggregation; Azure SQL Database for historical queries

AnswerA

Correct. Stream Analytics processes streaming data in real-time with 5-minute tumbling windows and outputs to Power BI. Synapse serverless SQL pool can query Parquet files stored in the data lake for historical analysis.

Why this answer

Azure Stream Analytics is ideal for real-time aggregation of clickstream data from Event Hubs, outputting to Power BI for a live dashboard. Azure Synapse Analytics serverless SQL pool allows querying months of aggregated data stored in Azure Data Lake Storage without provisioning compute, minimizing data movement and using serverless compute.

Exam trap

The trap here is confusing batch processing tools like Azure Data Factory or HDInsight with real-time stream processing, and overlooking that Azure Synapse serverless SQL pool is the serverless option for historical queries, not Azure SQL Database.

How to eliminate wrong answers

Option B is wrong because Azure Data Factory is an orchestration and ETL tool, not a real-time stream processing engine; it cannot perform live aggregation. Option C is wrong because HDInsight (Spark) requires provisioning and managing clusters, which does not minimize data movement or use serverless compute as efficiently as the chosen combination. Option D is wrong because Azure Functions are stateless and not designed for continuous real-time stream aggregation; Azure SQL Database is not serverless for historical queries and requires provisioning.

Practice this question →

74

MCQhard

A data analyst is using Azure Databricks to transform streaming data from Event Hubs. They need to ensure that if a failure occurs, the streaming job can resume processing from the last committed offset. Which checkpointing mechanism should they configure?

A.Use Structured Streaming with checkpointing to Azure Data Lake Storage Gen2.

B.Enable write-ahead logs on the Event Hubs namespace.

C.Use checkpointing to Hive metastore.

D.Use checkpointing to DBFS (Databricks File System).

AnswerA

Checkpointing to ADLS Gen2 provides fault tolerance.

Why this answer

Structured Streaming in Azure Databricks uses checkpointing to store the current state and offsets of the streaming query in a reliable, external storage system. By configuring checkpointing to Azure Data Lake Storage Gen2, the job can recover from failures and resume processing exactly from the last committed offset, ensuring exactly-once semantics. This is the recommended approach for production streaming workloads on Azure.

Exam trap

The trap here is that candidates often confuse DBFS with persistent storage, but DBFS is cluster-scoped and ephemeral, so checkpointing to DBFS will lose state when the cluster stops, whereas ADLS Gen2 provides durable, external checkpoint storage.

How to eliminate wrong answers

Option B is wrong because write-ahead logs on the Event Hubs namespace are used for Event Hubs internal durability and replication, not for checkpointing the consumer's offset state in Databricks. Option C is wrong because the Hive metastore is designed for storing table metadata and schema information, not for storing streaming checkpoint data or offsets. Option D is wrong because DBFS is a local ephemeral storage tied to the cluster; if the cluster is terminated or fails, checkpoint data stored in DBFS is lost, making it unsuitable for reliable failure recovery.

Practice this question →

75

MCQhard

A retail company has an Azure SQL Database that handles OLTP transactions for its e-commerce platform. The analytics team needs to run complex reporting queries that join multiple tables (e.g., orders, products, customers) and aggregate millions of rows. These queries are long-running and would negatively impact the performance of the OLTP database if run directly. The company wants to use a separate analytics service that supports T-SQL queries, can scale compute independently, and provides a serverless option to avoid provisioning fixed resources. Which Azure service should they choose?

A.Azure Synapse Analytics (dedicated SQL pool)

B.Azure Analysis Services

C.Azure Databricks

D.Azure SQL Database (creating a secondary replica)

AnswerA

Synapse Analytics provides a dedicated SQL pool with MPP architecture for complex queries on large datasets, supports T-SQL, and can be scaled independently. A serverless option is also available, but the dedicated pool is suited for consistent heavy workloads.

Why this answer

Azure Synapse Analytics (dedicated SQL pool) is the correct choice because it is a cloud-based analytics service that supports T-SQL queries, can scale compute independently from storage, and offers a serverless option (Synapse Serverless SQL pool) that eliminates the need to provision fixed resources. This allows the analytics team to run complex, long-running reporting queries against large datasets without impacting the performance of the OLTP Azure SQL Database.

Exam trap

The trap here is that candidates may confuse Azure Analysis Services (a semantic layer) with a T-SQL query engine, or assume that a secondary replica of Azure SQL Database can independently scale compute and handle heavy analytics workloads without performance impact.

How to eliminate wrong answers

Option B (Azure Analysis Services) is wrong because it is a semantic modeling service that uses tabular models and DAX/MDX queries, not T-SQL, and it does not support serverless compute or direct execution of complex T-SQL joins against raw data. Option C (Azure Databricks) is wrong because it is an Apache Spark-based analytics platform that primarily uses Python, Scala, or SQL (Spark SQL), not native T-SQL, and it requires provisioning clusters even with auto-scaling, not a true serverless option for T-SQL workloads. Option D (Azure SQL Database creating a secondary replica) is wrong because while a readable secondary replica can offload read-only queries, it does not support independent compute scaling (it mirrors the primary's compute) and lacks a serverless option for the secondary; it also still uses the same underlying database engine, which may not handle massive aggregation workloads efficiently.

Practice this question →

Page 1 of 4 · 262 questions totalNext →

Ready to test yourself?

Try a timed practice session using only Analytics Workload Azure questions.

Start 20-question session