Practice DP-900 Describe an analytics workload on Azure questions with full explanations on every answer.
1. A manufacturer collects sensor data from thousands of IoT devices every second. The data is ingested into Azure Event Hubs and then needs to be stored for historical analysis. The analytics team will run complex aggregations and time-series queries over petabytes of data, expecting fast results even with large scans. Which Azure service should be used as the analytical data store?
2. A manufacturing company has a streaming data pipeline that ingests sensor data from factory equipment into Azure Event Hubs. The data must be prepared for reporting by cleaning invalid records, removing duplicates, and aggregating readings into 5-minute windows. The transformed data needs to be stored in a columnar format in a data lake to support efficient querying by data analysts using SQL. Which Azure service should perform the data transformation and loading?
3. A data analytics team stores sales transaction data in Parquet files in Azure Data Lake Storage Gen2. They want to run complex analytical queries that join this data with dimension tables stored in Azure Synapse Analytics dedicated SQL pool. The team prefers not to move or copy the data from the data lake. Which feature should they use to query the data lake data directly?
4. A healthcare analytics company receives continuous streams of patient monitoring data from IoT devices. The data must be processed in near real-time to detect critical events (e.g., abnormal heart rate). Processed data is then stored in a columnar format for historical analysis and reporting by data analysts using SQL. Which combination of Azure services should they use for ingestion, processing, and storage?
5. A retail chain collects daily sales data from hundreds of stores. The data is stored as CSV files in Azure Data Lake Storage Gen2. The analytics team needs to run complex SQL queries that join sales data with product dimensions and aggregate results across petabytes of data. Queries must return results within seconds. Which Azure service is best suited for this analytical workload?
6. A financial analytics company has petabytes of transaction data stored as Parquet files in Azure Data Lake Storage Gen2. Data analysts need to run complex SQL queries that join multiple tables and return results within seconds. The company wants to query the data directly without moving it to another store. Which Azure service should they use?
7. A retail company analyzes customer purchase patterns. Every night, they run a batch job that aggregates millions of transactions from the past day into summary tables for reporting. Which type of data processing workload best describes this nightly job?
8. A retail chain captures real-time sales data from point-of-sale (POS) systems as a stream of events. The data is ingested into Azure Event Hubs. Additionally, the company receives daily inventory files in CSV format uploaded to Azure Data Lake Storage Gen2. The analytics team needs to combine the streaming sales data with the batch inventory data to generate near real-time dashboards and run historical reports. They want a single analytics platform that can handle both streaming and batch workloads, and allow querying data directly in the data lake using SQL. Which Azure service should they choose?
9. A marketing team wants to analyze social media sentiment in near real-time. They will use Azure Event Hubs to capture tweets and need to aggregate sentiment scores over 5-minute windows. The aggregated results must be stored in Azure Blob Storage for later analysis. Which Azure service should they use to perform the stream processing?
10. A company is migrating their on-premises data warehouse, which is built on a Netezza appliance, to Azure. The data warehouse contains over 10 terabytes of data and supports complex BI queries with multiple joins and aggregations. The company requires a cloud-based solution that provides massively parallel processing (MPP) to handle large-scale queries efficiently. They also need to integrate with existing ETL tools like Azure Data Factory and provide native connectivity to Power BI. Which Azure service should they choose?
11. A manufacturing company collects sensor data from factory equipment as a continuous stream of events ingested into Azure Event Hubs. Additionally, the company receives daily inventory CSV files uploaded to Azure Data Lake Storage Gen2. The analytics team needs to build near real-time dashboards that combine streaming sensor data with batch inventory data, and also support historical reporting by querying data directly in the data lake using SQL without moving it. Which Azure service should they choose as the primary analytics platform?
12. A retail company runs a nightly process that reads all sales transactions from the previous day, aggregates them by product category and store location, and writes the summary results into a data warehouse for reporting. Which type of data processing workload best describes this nightly process?
13. A financial analytics company stores petabytes of transaction data in Parquet files in Azure Data Lake Storage Gen2. Data analysts need to run complex SQL queries that join multiple large tables and return results within seconds. The company also wants to integrate with Power BI for visualization and Azure Data Factory for ETL orchestration. They require a massively parallel processing (MPP) engine to handle the scale. Which Azure service should they choose?
14. A retail company runs a nightly job that reads all sales transactions from the previous day from an operational database, aggregates them by product category and store location, and writes the summary results into a data warehouse for reporting. Which type of data processing workload does this nightly job represent?
15. A financial services company stores years of market trade data as Parquet files in Azure Data Lake Storage Gen2. The data volume is terabytes and growing rapidly. Data analysts need to run complex SQL queries that join multiple tables (e.g., trades, instruments, counterparties) and return results within seconds. The company also wants to integrate with Power BI for visualization and Azure Data Factory for orchestration of ETL pipelines. Which Azure service should they choose as the primary analytics platform?
16. A logistics company receives real-time GPS tracking data from its delivery fleet via Azure Event Hubs. The data is a continuous stream of location updates (vehicle ID, latitude, longitude, timestamp). Additionally, the company has daily static route plan files in CSV format stored in Azure Data Lake Storage Gen2. The operations team needs to combine the live GPS stream with the route plans to create a near real-time dashboard showing if delivery vehicles are on schedule. They also want to run historical queries on both the stream data and route plans using T-SQL, without moving the data to another store. Which Azure service should they use as the primary analytics platform?
17. A manufacturing company ingests a continuous stream of sensor data from factory equipment into Azure Event Hubs. Additionally, historical maintenance data in CSV format is stored in Azure Data Lake Storage Gen2. The analytics team needs to join the streaming sensor data with the historical data in near real-time and enable analysts to query the combined dataset using standard T-SQL without moving the data. Which Azure service should they use as the primary analytics platform?
18. A retail company has an Azure SQL Database that handles OLTP transactions for its e-commerce platform. The analytics team needs to run complex reporting queries that join multiple tables (e.g., orders, products, customers) and aggregate millions of rows. These queries are long-running and would negatively impact the performance of the OLTP database if run directly. The company wants to use a separate analytics service that supports T-SQL queries, can scale compute independently, and provides a serverless option to avoid provisioning fixed resources. Which Azure service should they choose?
19. A manufacturing company connects thousands of IoT sensors on an assembly line, each sending telemetry data every second. The data volume is terabyte-scale per day. The company needs to analyze the sensor data in near real-time to detect anomalies (e.g., temperature spikes) and also allow data scientists to run interactive ad-hoc queries on the historical data to find patterns. They prefer using a query language similar to SQL. Which Azure service should they choose?
20. A financial services company stores petabytes of transaction data in Parquet format in Azure Data Lake Storage Gen2. Data analysts need to run complex SQL queries that join multiple large tables and aggregate billions of rows, with results expected within seconds. The company wants to use a massively parallel processing (MPP) engine that supports T-SQL and can be paused to reduce costs during off-hours. They also need native integration with Azure Data Factory and Power BI. Which Azure service should they use?
21. A company stores terabytes of web server log data in CSV files in Azure Data Lake Storage Gen2. Data analysts need to run ad-hoc SQL queries on this data to analyze user behavior patterns. The queries are complex, involve joins across multiple files, and the analysts prefer not to move the data into a separate store. Which Azure service should they use?
22. A company ingests streaming data from IoT devices into Azure Event Hubs. They need to perform real-time analytics on the data, such as aggregating temperature readings over 5-minute windows and triggering alerts when thresholds are exceeded. They also want to store the processed data in a data warehouse for historical analysis. Which Azure service should they use for the real-time processing?
23. A manufacturing company ingests real-time sensor data from factory equipment via Azure Event Hubs. The data is a continuous stream of measurements (sensorId, timestamp, value). Additionally, historical maintenance records are stored as CSV files in Azure Data Lake Storage Gen2. The operations team needs to join the streaming data with the historical records in near real-time to detect anomalies. They also need to run complex T-SQL queries on the combined dataset for ad-hoc analysis. Which Azure service should they use as the primary analytics platform?
24. A logistics company ingests real-time GPS data from delivery vehicles via Azure Event Hubs. The data includes vehicle ID, latitude, longitude, and timestamp. The company also has historical route plan data stored as CSV files in Azure Data Lake Storage Gen2. Data analysts need to combine the live stream with the historical data in near real-time to create a dashboard showing if vehicles are on schedule. They also need to run complex T-SQL queries on the combined dataset for ad-hoc reporting. Which Azure service should they use as the primary analytics platform?
25. A company uses Azure Data Factory to run a pipeline that copies new orders from an on-premises SQL Server database to Azure Data Lake Storage every hour. After the data is in the data lake, an Azure Databricks notebook transforms it and loads it into Azure Synapse Analytics for reporting. Which type of data processing does the hourly copy operation represent?
26. A manufacturing company ingests a continuous stream of sensor data from thousands of IoT devices into Azure Event Hubs. The company also stores historical equipment maintenance records in Azure SQL Database. The operations team needs to join the streaming sensor data with the historical maintenance records in near real-time to detect anomalies, and data scientists need to run ad-hoc T-SQL queries on the combined dataset for analysis. Which Azure service should they use as the primary analytics platform to meet both requirements?
27. A company ingests raw clickstream data as JSON files into Azure Data Lake Storage Gen2. Data scientists need to explore the data interactively using Python notebooks, and the BI team needs to create reports from aggregated datasets derived from this data. The solution must be serverless, scale automatically, and minimize administration. Which Azure service should they choose?
28. A retail company wants to analyze customer clickstream data in real-time to detect patterns and trigger personalized offers. They also store the raw clickstream data in Azure Data Lake Storage for later batch analysis. Which Azure service should they use for the real-time processing component?
29. A smart building monitoring company ingests real-time sensor data (temperature, humidity, occupancy) from thousands of IoT devices into Azure Event Hubs. The company also stores historical building blueprints and maintenance records as CSV files in Azure Data Lake Storage Gen2. The engineering team needs to build a dashboard that displays live sensor readings overlaid on building floor plans, and also allows facility managers to run ad-hoc T-SQL queries that combine live sensor data with historical maintenance records. Which Azure service should they use as the primary analytics platform to meet both requirements?
30. A financial services company uses Azure Synapse Analytics to process large volumes of transaction data. They have a dedicated SQL pool (formerly SQL DW) that ingests curated, aggregated data nightly from a data lake. Data analysts need to run ad-hoc, exploratory T-SQL queries on raw transaction data stored as Parquet files in Azure Data Lake Storage Gen2. These queries vary widely in complexity and frequency. The company wants to minimize costs for these ad-hoc queries while still using full T-SQL capabilities. Which approach should they use?
31. A company receives daily sales data from multiple retail stores as CSV files that are uploaded to Azure Blob Storage. The data must be cleansed, validated, and aggregated before being loaded into Azure Synapse Analytics for reporting. The transformations involve complex business logic and must run reliably every night. The company wants a service that can orchestrate and execute the entire pipeline with minimal development effort. Which Azure service should they use?
32. A marketing company ingests streaming data from social media feeds into Azure Event Hubs. They want to perform real-time sentiment analysis on the data and store the results in Azure SQL Database for immediate dashboarding. They also need to aggregate the raw data over longer time windows and store it in Azure Data Lake Storage for historical trend analysis. Which combination of Azure services should they use for the two processing paths?
33. A company receives real-time clickstream data from its website via Azure Event Hubs. They need to detect fraudulent clicks within seconds and also produce daily aggregate reports of visitor statistics for historical analysis. Which combination of Azure services should they use for the real-time detection and the daily aggregation, respectively?
34. A retail company receives a continuous stream of customer orders from their website via Azure Event Hubs. They also receive daily inventory updates from suppliers as CSV files uploaded to Azure Blob Storage. The company needs to calculate real-time order fulfillment availability by joining the streaming orders with the latest inventory snapshot. Additionally, they generate nightly sales reports from historical order data. Which Azure service should they use for the real-time processing component?
35. A company is designing an enterprise analytics solution. They store raw data in its original format in a scalable repository, apply schema and transformations at read time, and also maintain a curated layer that enforces ACID transactions for data reliability. This architecture combines the flexibility of a data lake with the reliability of a data warehouse. Which term best describes this modern data architecture?
36. A retail company stores years of historical sales data in Azure Data Lake Storage Gen2 as Parquet files. Business analysts need to run complex SQL queries over this data to identify sales trends, and they want to visualize the results in Power BI dashboards. They prefer to avoid moving data into a separate database to minimize storage costs and latency. Which Azure service should they use to query the data directly in the lake?
37. A financial services company needs to build a data pipeline that ingests daily transaction files from multiple sources. The pipeline must perform data quality checks, transform data using complex business logic, and load it into Azure Synapse Analytics. The transformations involve conditional branching (e.g., if a transaction amount exceeds a threshold, apply additional validation). The company wants to minimize coding effort and prefers a visual, configuration-based approach. Which Azure service should they use as the primary orchestration and transformation engine?
38. A retail company ingests clickstream data from its e-commerce website into Azure Event Hubs. They need to detect customer journey patterns in real time within seconds and also prepare aggregated data for daily trend reports stored in Azure Data Lake Storage Gen2. The real-time processing must handle high throughput and support complex temporal queries like sessionization. The daily aggregation should be cost-effective and use serverless compute. Which combination of Azure services should they use?
39. A marketing company stores years of historical campaign data in Azure Data Lake Storage Gen2 as Parquet files. Data analysts need to run complex SQL queries over this data to identify trends, and they want to visualize results in Power BI dashboards. The company wants to avoid moving data into a separate database to minimize duplication and latency. Which Azure service should they use to query the data directly in the data lake?
40. A marketing company collects real-time clickstream data from their website using Azure Event Hubs. They need to perform two tasks: (1) aggregate the number of clicks per advertising campaign every 5 minutes and display the results in a live dashboard, and (2) run complex historical queries on months of aggregated click data to identify trends. They want to minimize data movement and use serverless compute where possible. Which combination of Azure services should they use?
41. A manufacturing company deploys IoT sensors on equipment in a factory. They need to monitor sensor data in real time to detect anomalies and trigger immediate alerts. They also need to store years of historical sensor data for monthly capacity planning reports that involve complex aggregations. The company wants a cost-effective solution that minimizes data movement between storage and compute. Which combination of Azure services should they use for real-time processing and historical batch analytics?
42. A retail company needs to build an analytics pipeline on Azure. They ingest sales data from multiple store systems and an online e-commerce platform. The data must be cleaned, transformed, and loaded into a data warehouse for reporting. The company wants to use a modern ELT (Extract, Load, Transform) approach where raw data is stored first and then transformed. Order the following steps in the correct sequence for this pipeline. (Drag the steps into the correct order.)
43. A company stores terabytes of historical sales data as Parquet files in Azure Data Lake Storage Gen2. Business analysts need to run ad-hoc SQL queries that involve complex joins and aggregations over this data. They want to avoid provisioning a dedicated cluster or moving data into a separate database. The queries must be executed using standard T-SQL syntax. Which Azure service should they use?
44. A telecommunications company needs to analyze call detail records (CDRs) to detect fraud patterns and minimize revenue leakage. The data arrives as a continuous stream from network switches and must be queried within seconds of ingestion to flag suspicious activity. The analysts also need to run interactive ad-hoc queries over the last 90 days of CDR data using Kusto Query Language (KQL). Which Azure service should they use as the primary data store and analytics engine?
45. A company uses Azure Synapse Analytics dedicated SQL pool for large-scale data warehousing. They have a fact table with billions of rows and frequently run queries that filter by a date range and join with a product dimension table. Which table distribution and partitioning strategy will minimize data movement and improve query performance?
46. A retail company ingests daily sales data from multiple stores as CSV files stored in Azure Blob Storage. The data must be cleaned and transformed using Spark, then loaded into Azure Synapse Analytics for large-scale reporting. The pipeline must run on a schedule, handle failures with retries, and minimize manual intervention. Which combination of Azure services should they use to orchestrate and execute this pipeline?
47. A retail company needs to analyze clickstream data from their website in real time to detect fraudulent activity and also run complex historical queries on months of data to identify shopping trends. They want a single service that can handle both streaming and batch analytics using a unified query language, minimizing data movement. Which Azure service should they use?
48. A data engineering team needs to transform raw clickstream data stored as Parquet files in Azure Data Lake Storage Gen2. They want to use standard T-SQL queries to perform transformations and aggregations. The team prefers a serverless option to avoid provisioning and managing dedicated compute resources. Which Azure service should they use?
49. A manufacturing company uses IoT sensors to collect temperature and vibration data from machinery. They need to analyze the streaming data in real time to detect anomalies and trigger alerts. Additionally, they need to run complex historical queries on months of sensor data to identify equipment failure patterns. They want a single Azure service that can handle both real-time stream processing and large-scale batch analytics using a unified query language, minimizing the need for separate technologies. Which Azure service should they use?
50. A company wants to build a modern data warehouse using a lakehouse architecture. They need to store raw data in its native format (e.g., CSV, JSON, Parquet) and also support BI reporting on curated, transformed data. They want to use a single storage layer for both raw and curated data. Which Azure service should they use as the core storage layer?
51. A retail company processes petabytes of sales transaction data stored in Azure Data Lake Storage Gen2. They need to run recurring complex queries that involve large joins and aggregations. The queries must consistently complete within a fixed time window overnight. The company wants predictable performance and costs. Which Azure service should they use?
52. A data engineering team is designing a modern data warehouse using Azure Synapse Analytics. They want to follow a lakehouse architecture where raw data is stored in its native format and then processed and curated for reporting. Which component in Azure Synapse Analytics is primarily used to store raw data in its original format without requiring a schema?
53. A financial services company runs large-scale analytical queries on a dedicated SQL pool in Azure Synapse Analytics. They notice that during peak hours, complex aggregations consume excessive resources, slowing queries from other users. They need to ensure that critical management reports always get enough resources and complete within a guaranteed time, without being starved by less important queries. Which feature should they implement?
54. A financial services company uses a dedicated SQL pool in Azure Synapse Analytics to run large-scale analytical queries. During peak hours, complex aggregations consume excessive resources, causing slower performance for other users. The company needs to ensure that critical scheduled management reports always receive guaranteed resources and complete within a predictable timeframe, while less important ad-hoc queries do not interfere. Which feature should they implement to manage query resource allocation?
55. A financial services company runs critical end-of-day reports in an Azure Synapse Analytics dedicated SQL pool. These reports require guaranteed resource allocation and must complete within a fixed time window. However, ad-hoc analytical queries from data scientists often consume resources, causing contention and delaying the critical reports. Which feature should the company implement to ensure the critical reports always receive sufficient resources?
56. A large e-commerce company needs to build an analytics solution. They have streaming clickstream data from their website (JSON) and daily sales data from their transactional database (CSV). They need to power real-time dashboards on clickstream data for the current hour, and also run complex historical queries that join sales data with aggregated clickstream data over the past year. They want a single Azure service that can handle both stream processing and batch processing using a unified experience, without moving data between separate systems. Which Azure service should they use?
57. A company stores weather sensor data in Azure Data Lake Storage Gen2. Data scientists need to run large-scale transformations and machine learning experiments on this data using Python and Apache Spark. They want to collaborate using shared Jupyter notebooks. Which Azure service should they use for this analytical workload?
58. A retail company wants to analyze years of historical sales data stored as CSV files in Azure Blob Storage. The analytics solution must be serverless, allow T-SQL queries without managing infrastructure, and integrate directly with Power BI. Which Azure service should the company use?
59. A business analyst needs to explore and create interactive visualizations of sales data stored in Azure Data Lake Storage Gen2 without writing SQL code. Which Azure service is best suited for this drag-and-drop data exploration?
60. A financial institution needs to run complex queries against petabytes of historical trading data stored in Azure Data Lake Storage. The queries must be efficient and use columnar storage format. Which technology should they use to process this data?
61. A data engineer needs to query data stored in CSV files in Azure Data Lake Storage Gen2 using T-SQL in Azure Synapse Analytics, without loading the data into the database. Which feature should they use?
62. A data analyst needs to run interactive SQL queries against petabytes of sales data stored in Parquet format in Azure Data Lake Storage Gen2. The analyst wants the fastest query performance for ad-hoc exploration without provisioning or managing any infrastructure. Which Azure service should they use?
63. A data engineer needs to process raw clickstream data from multiple websites that is stored in Azure Blob Storage as JSON files. The processing must run automatically every hour, transform the data into a structured format for reporting, and handle schema changes in the source data without manual intervention. Which Azure service should be used?
64. A data engineer needs to process streaming data from IoT devices and store the results in Azure Data Lake Storage for long-term analytics. The data must be processed in near real-time to detect anomalies and trigger alerts. Which Azure service should the engineer use for stream processing?
65. A company uses Azure Synapse Analytics to run complex queries against large datasets stored in Parquet files in Azure Data Lake Storage Gen2. They notice that queries scanning entire partitions are slow due to high I/O overhead on the compute nodes. Investigation shows each daily partition contains thousands of small files (under 1 MB each). Which optimization should be implemented first to improve query performance?
66. A data engineer needs to build a pipeline that runs every hour, copies new sales data from an on-premises SQL Server to Azure Data Lake Storage Gen2, transforms the data using PySpark, and then loads it into Azure Synapse Analytics dedicated SQL pool. Which Azure service should be used to orchestrate the entire pipeline?
67. A data analyst needs to run complex SQL queries against petabytes of historical sales data stored in Azure Data Lake Storage Gen2. The solution must be serverless with pay-per-query pricing. Which Azure service should they use?
68. A retail chain needs to blend two data sources for a near real-time dashboard: daily batch files from store systems (CSV files on Azure Blob Storage updated once per day) and live web clickstream data from Azure Event Hubs. The dashboard must refresh every 5 minutes with combined data. Which combination of Azure services should be used to ingest and process both data types most efficiently?
69. A data engineering team needs to analyze petabytes of historical sales data stored in Azure Data Lake Storage Gen2. They require the ability to run complex SQL queries that join multiple tables and need high performance. The solution must separate compute from storage to allow independent scaling of resources. Which Azure service should they use?
70. A financial institution runs complex analytical queries on trading data stored in Parquet files in Azure Data Lake Storage Gen2. The data is partitioned by date and contains billions of rows. Analysts frequently query within a specific date range, and the queries must return results in under 5 seconds. The current solution uses an Azure Synapse serverless SQL pool, but queries are slow because the serverless pool scans all partitions even when the WHERE clause filters on the date column. Which optimization should be implemented to improve query performance?
71. A data analyst needs to run interactive SQL queries on a large dataset stored as CSV files in Azure Blob Storage. The analyst wants to explore the data using T-SQL without loading the data into a database. Which Azure service should they use?
72. A retail company needs to run complex SQL queries on petabytes of historical sales data stored in Parquet files in Azure Data Lake Storage Gen2. They want a solution that provides fast query performance without managing infrastructure, and they prefer a pay-per-query pricing model. Which Azure service should they use?
73. A logistics company uses Azure Synapse Analytics dedicated SQL pool to analyze billions of shipment records. The table 'Shipments' is 10 TB and hash-distributed on 'ShipmentID'. Analysts frequently run queries that filter on 'WarehouseID' and aggregate by 'Region'. These queries are slow because they cause data movement (shuffle) across distributions. Which table design change will most improve query performance for these analytical workloads?
74. A data engineer needs to build an analytics solution to transform large volumes of streaming data from IoT devices. The transformations involve complex Python and Spark code, and the results will be stored in Azure Data Lake Storage Gen2 for further analysis. Which Azure service is best suited for executing these transformations?
75. A retail company collects streaming clickstream data from its website into Azure Event Hubs. They need to aggregate the data in real-time to count page views per product every minute and store the results in Azure SQL Database for a live dashboard. Which Azure service should they use to perform this real-time aggregation?
76. A company uses Azure Synapse Analytics dedicated SQL pool to store a large fact table containing 5 TB of sales transactions. New data arrives continuously and is loaded daily. The company needs to load 500 GB of new data each day while allowing concurrent read queries on the most recent data without performance degradation. Which loading strategy optimizes both load speed and query performance?
77A business analyst needs to create interactive visualizations and share dashboards with colleagues using data stored in an Azure Synapse Analytics dedicated SQL pool. Which tool should the analyst use?
78A data analyst needs to run ad-hoc SQL queries on petabytes of log data stored as Parquet files in Azure Data Lake Storage Gen2. The queries join multiple tables and require high concurrency from multiple analysts. The solution should minimize cost by only paying for queries executed. Which Azure service should they use?
79A company has a data warehouse in Azure Synapse Analytics dedicated SQL pool. They need to load new sales data every night from a CSV file stored in Azure Data Lake Storage Gen2. The load process must be automated, scheduled, and have error handling for failed loads. Which Azure service should they use to orchestrate this process?
80A manufacturing company collects sensor data from thousands of IoT devices. The data arrives as a stream of time-stamped readings with a fixed schema (DeviceID, Timestamp, Temperature, Pressure, Vibration). They need to store this data and support both real-time dashboards showing the last hour of data and complex analytical queries over years of historical data. The solution must minimize storage costs and provide sub-second response for real-time queries. Which Azure service is best suited for this workload?
81A retail company needs to analyze streaming clickstream data from their website to detect shopping cart abandonment in real-time. They want to use Azure Stream Analytics to output results that can be visualized on a live dashboard. Which output sink allows the fastest data visualization for a real-time dashboard in Power BI?
82A data engineer is designing a data lake architecture in Azure. They plan to first ingest raw data from various sources into a landing zone in Azure Data Lake Storage Gen2. Then they will clean, validate, and deduplicate that data in a second zone. Finally, they will create aggregated, business-ready datasets in a third zone for analysts. This layered approach is known as which architecture?
83A company is building a modern data warehouse on Azure using a lakehouse approach. Arrange the following steps in the correct order to implement a typical pipeline that starts with raw data ingestion and ends with business reporting.
84A company needs to run complex SQL queries on petabytes of data stored in Azure Data Lake Storage Gen2. They want to pay only for the queries they run and do not want to manage any infrastructure. Which Azure service should they use?
85A retail company receives daily sales data as CSV files in Azure Data Lake Storage Gen2. They need to load this data into an Azure Synapse Analytics dedicated SQL pool every night. The process must be automated, scheduled, and include error handling for failed loads. Which Azure service should they use to orchestrate this pipeline?
86A company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. Every night, they need to load 500 GB of new sales data from CSV files stored in Azure Data Lake Storage Gen2. The loading process must be automated, scheduled, and include error handling (e.g., skip corrupt rows and log them). Which Azure service should be used to orchestrate this load pipeline?
87A data warehouse team uses Azure Synapse Analytics dedicated SQL pool to serve both business executives running weekly reports and data scientists running complex ad-hoc queries on large fact tables. The ad-hoc queries often consume excessive resources and degrade performance for the weekly reports. The team needs to ensure that the weekly reports always get guaranteed resources regardless of other concurrent queries. Which Synapse feature should they use?
88A manufacturing company needs to build an analytics solution for IoT sensor data. Thousands of devices send real-time temperature and vibration readings. The solution must: (1) ingest the streaming data reliably, (2) perform real-time aggregations (e.g., average temperature per device every minute), and (3) store the aggregated results in Azure Synapse Analytics for historical reporting and dashboards. Which combination of Azure services should be used?
89A manufacturing company collects real-time temperature data from thousands of IoT sensors. They need to build an analytics solution that processes the streaming data, computes the average temperature per device every minute, and outputs the results to a Power BI dashboard for near real-time visualization. Which Azure service should they use for the real-time stream processing?
90A company needs to build a centralized analytics platform that can query both structured data in a relational data warehouse and unstructured data in a data lake using a single SQL-based interface. They want to minimize data movement and use a serverless, on-demand compute model for ad-hoc queries. Which Azure service should they use?
91A transportation company collects real-time GPS data from thousands of delivery vehicles. They need to process this streaming data to detect delays and generate alerts when a vehicle is behind schedule. Which Azure service should they use for the stream processing?
92A data analyst needs to run ad-hoc SQL queries on large volumes of data stored as Parquet files in Azure Data Lake Storage Gen2. The queries are unpredictable, and the analyst wants to pay only for the compute resources consumed by each query. Which Azure Synapse Analytics compute model should be used?
93A manufacturing company installs temperature sensors in a factory. Sensor data is streamed to Azure Event Hubs. The company needs to detect when the average temperature of any sensor exceeds 100°F over a 5-minute sliding window and then send an alert. Which Azure service should be used for this real-time stream processing?
94A data engineering team needs to build a pipeline that ingests streaming data from IoT devices into Azure Data Lake Storage Gen2. The data arrives as JSON messages. They want to use a service that can capture the streaming data in near real-time and store it as files in the data lake without writing custom code for the ingestion. Which Azure service should they use?
95A data engineering team needs to build a batch processing pipeline that transforms large volumes of sales data stored in Azure Data Lake Storage Gen2. The transformations include aggregations and joins, and the output should be stored back in the data lake as Parquet files. The team wants a serverless compute option that automatically scales and charges per second. Which Azure service should they use?
96A company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. Every day, they need to incrementally load 100 GB of new sales data from CSV files stored in Azure Data Lake Storage Gen2 (ADLS Gen2). The load should use PolyBase for efficient parallel data transfer and must be orchestrated on a recurring schedule. Which Azure service should they use to create and manage this pipeline?
97A data engineering team wants to build a batch analytics pipeline. The raw data is stored in Azure Data Lake Storage Gen2 (ADLS Gen2). The final output will be a set of tables in Azure Synapse Analytics (dedicated SQL pool) that will be used to create reports in Power BI. Arrange the following steps in the correct order for a typical ETL process.
98A data engineering team needs to build a real-time dashboard showing sales totals by region. Sales transactions are streamed from point-of-sale systems into Azure Event Hubs. The team wants to aggregate the data in near real-time (e.g., every minute) and store the results in Azure SQL Database for visualization in Power BI. Which Azure service should they use for the aggregation step?
99A manufacturing company collects temperature and vibration data from thousands of sensors. The data is streamed to Azure Event Hubs. The company wants to store all this raw data in Azure Data Lake Storage Gen2 for future batch analytics. They need a solution that automatically writes the streaming data to the data lake in near real-time, without requiring any custom code for the write operation. Which Azure feature should they use?
100A data engineering team needs to build a batch ETL pipeline that transforms large volumes of clickstream data stored as CSV files in Azure Data Lake Storage Gen2. The transformations require running distributed Python and Scala code using Apache Spark. The transformed data will be loaded into a data warehouse for reporting. The team wants a serverless compute environment that automatically scales and charges per second. Which Azure service should they use to run the Spark transformations?
101A company uses Azure Synapse Analytics dedicated SQL pool as its data warehouse. New data is loaded into the warehouse every few minutes. The company wants to visualize the data with near real-time updates in a dashboard that can be refreshed automatically. Which tool and connection mode should they use?
102A retail company stores historical sales data from multiple stores in Azure Data Lake Storage Gen2 as CSV files. They need to run complex SQL queries that join and aggregate data across multiple files to generate weekly sales reports. They want a serverless query service that can directly query the data in the lake without loading it into a separate database. Which Azure service should they use?
103A data analyst needs to run ad-hoc SQL queries on large datasets stored as Parquet files in Azure Data Lake Storage Gen2. The queries are infrequent and the data volume varies. The analyst wants to pay only for the amount of data processed per query and does not want to manage any infrastructure. They also need to create views in T-SQL to simplify queries for Power BI reports. Which Azure service should they use?
104A financial services company processes real-time stock trade data from multiple exchanges. Trades are ingested into Azure Event Hubs. The company needs to compute a 5-minute sliding window average of trade prices per stock symbol and ensure that each trade is processed exactly once within the window. The aggregated results must be stored in Azure SQL Database for historical reporting and also sent to a Power BI dashboard for near real-time visualization. Which Azure service should be used for the real-time processing?
105A company needs to ingest data from an on-premises SQL Server database into Azure SQL Database every hour. During the ingestion, they need to filter out rows where Status = 'Inactive' and convert a date column to a different format. They want a cloud-based, code-free solution that can schedule and orchestrate this task. Which Azure service should they use?
106A retail company needs to analyze sales transactions as they occur to detect fraud patterns and immediately block suspicious orders. They also need to run daily batch reports on historical sales data. Which combination of Azure services should they use to meet both real-time and batch processing requirements?
107A data engineering team is designing a modern data warehouse on Azure. They have raw data landing in Azure Data Lake Storage Gen2 (ADLS Gen2) as Parquet files. They need to perform transformations using Apache Spark, and then load the transformed data into Azure Synapse Analytics for high-performance analytical queries. The team wants to use a single orchestration service to schedule, monitor, and manage the entire pipeline. Which Azure service should they choose for orchestration?
108A company uses Azure Synapse Analytics dedicated SQL pool to store sales data. They frequently run queries that aggregate sales by product and region over the past month. The queries are slow because they scan the entire table. Which index type should they implement on the fact table to improve query performance for these aggregations?
109A company uses Azure Synapse Analytics dedicated SQL pool to store sales data. The fact table contains billions of rows and is hash-distributed on ProductID. Queries aggregate sales by store and product for the current month and join with a small Store dimension table (10,000 rows) and a medium-sized Product dimension table (500,000 rows). The queries are slow due to data movement during joins. Which design change will most reduce data movement and improve query performance?
110A data engineering team is building a batch analytics pipeline. Raw clickstream data is stored as Parquet files in Azure Data Lake Storage Gen2. The team needs to transform the data using Apache Spark (Python code) and then load the results into Azure Synapse Analytics for high-performance reporting. They want to use a serverless compute option for Spark to avoid managing clusters. Which combination of Azure services should they use for the transformation and loading?
111A data analyst needs to query large datasets stored as Parquet files in Azure Data Lake Storage Gen2. The queries are ad-hoc and infrequent. The analyst wants to run SQL queries directly on the data without creating any storage or compute infrastructure, and only pay for the amount of data processed. They also need to create T-SQL views to simplify queries for Power BI reports. Which Azure service should they use?
112A logistics company needs to analyze GPS data from delivery trucks in real time to detect delays and reroute deliveries. The GPS data is streamed into Azure Event Hubs. They also need to combine this live data with static route information stored in Azure SQL Database. Which Azure service should they use for the real-time processing?
113A data engineer needs to transform large datasets stored in Azure Data Lake Storage Gen2 using Python and Apache Spark. They want a serverless compute option that automatically scales and requires no cluster management. Which Azure service should they use?
114A manufacturing company ingests real-time sensor data from assembly line machines into Azure Event Hubs. The company needs to calculate a 5-minute rolling average of temperature readings for each machine and compare it against a static threshold value stored in a CSV file in Azure Blob Storage. If the average exceeds the threshold, an alert must be triggered. Which Azure service should be used for this real-time data processing?
115A data analyst needs to run ad-hoc SQL queries on terabytes of CSV files stored in Azure Data Lake Storage Gen2. The queries are infrequent and unpredictable. The analyst wants to pay only for the amount of data processed by each query, and does not want to manage any compute or storage infrastructure. Which Azure service should they use?
116A retail company collects sales data from multiple stores. Data is ingested into Azure Data Lake Storage Gen2 as CSV files. The data team needs to run ad-hoc SQL queries on this data without moving it, and they want to pay only for the amount of data processed. They also need to integrate with Power BI for visualization. Which Azure service should they use?
117A company uses Azure Synapse Analytics dedicated SQL pool for a large data warehouse. The fact table contains billions of rows and is hash-distributed on ProductID. Frequent queries join this fact table with a small Store dimension table (10,000 rows) and a medium-sized Product dimension table (500,000 rows). The queries aggregate sales by store and product for recent months, but run slowly due to data movement during joins. Which design change will most reduce data movement and improve query performance?
118A data engineering team needs to transform large datasets stored in Azure Data Lake Storage Gen2 using Apache Spark with Python code. They want a fully managed service that provides serverless Spark pools, meaning no clusters to manage and automatic scaling. Which Azure service should they use?
119A data analyst needs to run ad-hoc SQL queries on petabytes of Parquet files stored in Azure Data Lake Storage Gen2. The queries are infrequent and highly selective. The analyst wants to pay only for the data scanned by each query and does not want to provision any compute resources. They also need to create views to simplify future queries for other analysts. Which Azure service should they use?
120A financial services company stores transaction data in Azure Data Lake Storage Gen2 as Parquet files, partitioned by date. The data volume is 5 TB per day. The analytics team runs ad-hoc SQL queries to detect fraudulent patterns. Queries are highly selective (filtering on AccountID and date range). The team also needs to create external tables and views for use in Power BI. They want to pay only for the data processed by each query and avoid provisioning any compute resources. Which Azure service should they use?
121A logistics company uses IoT sensors on delivery trucks to transmit GPS location, speed, and engine diagnostics every 10 seconds. The data is ingested into Azure Event Hubs. The company needs to analyze the data in real time to identify speeding trucks and send alerts. The analysis requires joining the live sensor data with a reference table of truck details (e.g., driver name, route number) stored in Azure SQL Database. Which Azure service should they use for the real-time processing?
122A data analyst needs to run ad-hoc SQL queries on petabytes of data stored as Parquet files in Azure Data Lake Storage Gen2. The queries are infrequent but must return results within seconds. The analyst wants to pay only for the amount of data processed and does not want to manage any compute infrastructure. Additionally, they need to create views to simplify future reporting in Power BI. Which Azure service should they use?
123A company needs to run complex analytical queries that aggregate terabytes of sales data across multiple years. The queries are used for monthly business reports and are not latency-sensitive. The data is stored in Azure Data Lake Storage Gen2. The company wants a fully managed, petabyte-scale data warehouse solution that supports SQL queries and integrates with Power BI for reporting. Which Azure service should they use?
124A financial services company has raw transaction data stored in Azure Data Lake Storage Gen2 (ADLS Gen2) as Parquet files, partitioned by date. The analytics team needs to run complex SQL queries that join multiple datasets, including reference data from an Azure SQL Database, to generate risk reports. They require enterprise-grade security features such as row-level security (RLS) and column-level security. They also want to use the same service for data transformation and loading (ETL) into a curated layer. Which Azure service should they choose?
125Drag and drop the steps into the correct order to perform a point-in-time restore of an Azure SQL Database.
126Drag and drop the steps into the correct order to create an Azure Stream Analytics job.
127Match each Azure data tool to its purpose.
128Match each data type to its category in Azure.
129A company runs a real-time dashboard in Power BI that displays sales data from Azure Synapse Analytics. The dashboard must show data with less than 5 seconds of latency. Which Azure service should be used to ingest streaming sales events into Azure Synapse Analytics?
130A data engineer needs to design a solution for a healthcare organization that must store patient records for 7 years to comply with regulatory requirements. The data will be accessed infrequently after the first year. Which Azure storage tier should be used for data older than one year?
131A company uses Azure Synapse Analytics dedicated SQL pool to run large-scale analytics. The data engineering team notices that queries are slow due to excessive data movement between distributions. Which index type should be recommended to minimize data movement for fact tables that are frequently joined on a specific column?
132A data analyst needs to create a report in Power BI that combines sales data from Azure SQL Database and inventory data from Azure Cosmos DB. The report should refresh daily. Which Power BI feature should be used to combine these data sources?
133A company uses Azure Data Lake Storage Gen2 to store IoT sensor data. The data is partitioned by date and sensor ID. A data scientist needs to efficiently query only the last 7 days of data for a specific sensor. Which strategy minimizes the amount of data scanned?
134A multinational corporation uses Azure Synapse Analytics serverless SQL pool to query data in Azure Data Lake Storage. The security team requires that access to specific columns containing personally identifiable information (PII) be restricted based on the user's role. Which feature should be implemented?
135A company wants to build a near-real-time analytics solution on Azure. IoT devices send telemetry data to Azure Event Hubs. The data must be processed and stored in Azure Cosmos DB for low-latency queries. Which Azure service should be used to process the streaming data?
136A data analyst uses Power BI to create a report that combines data from Azure Synapse Analytics and an on-premises SQL Server database. The on-premises data must be refreshed every hour. Which component is required to connect to the on-premises data source?
137A company uses Azure Databricks for data engineering. The team wants to implement a medallion architecture (bronze, silver, gold) to organize data quality layers. In which layer should data be stored in a format optimized for analytics and reporting?
138Which TWO Azure services can be used to orchestrate and automate data pipelines? (Choose two.)
139Which THREE components are required to implement a real-time analytics solution using Azure Stream Analytics? (Choose three.)
140Which TWO Azure services can be used to store semi-structured data? (Choose two.)
141A company uses Azure Synapse Analytics for its data warehouse. They notice that queries against a large fact table are slow. The table is partitioned by month and uses a clustered columnstore index. Which action would most likely improve query performance?
142Your company uses Azure Data Lake Storage Gen2 and wants to grant a data scientist read-only access to a specific container. Which built-in RBAC role should you assign?
143An organization uses Azure Stream Analytics to process real-time IoT data from millions of devices. They need to ensure that output is delivered to a Power BI dataset with exactly-once semantics. Which output configuration should they use?
144A data engineer needs to load data from an on-premises SQL Server database to Azure Synapse Analytics. The data volume is approximately 2 TB and the network bandwidth is limited. Which approach minimizes data transfer time?
145Your company wants to use Microsoft Fabric to create a unified analytics platform. Which component in Microsoft Fabric provides a lake-centric, collaborative, and governed data foundation?
146A data analyst is using Azure Databricks to transform streaming data from Event Hubs. They need to ensure that if a failure occurs, the streaming job can resume processing from the last committed offset. Which checkpointing mechanism should they configure?
147Your organization uses Azure Purview to scan data sources. You need to set up a scan rule set that automatically classifies credit card numbers in Azure SQL Database. Which built-in classification rule should you enable?
148A company runs a critical workload in Azure Synapse Analytics. They need to ensure that if a single node fails, the data in the control node and compute nodes is not lost. Which configuration should they use?
149A data analyst needs to create a real-time dashboard in Power BI that refreshes every second from an Azure Stream Analytics job. Which Power BI feature should they use?
150Which TWO Azure services can be used to perform interactive data analytics on large datasets without managing infrastructure? (Choose two.)
151Which THREE components are part of Microsoft Fabric's end-to-end analytics platform? (Choose three.)
152Which TWO of the following are benefits of using a data lake architecture? (Choose two.)
153You are deploying the above ARM template snippet for a storage account. What is the effect of setting 'isHnsEnabled' to true?
154You run the above Kusto query in Azure Data Explorer. What does the query return?
155You are reviewing the Azure Data Factory mapping data flow configuration above. Which transformation is missing to ensure that only sales from the current year are loaded?
156A company uses Azure Synapse Analytics to run large-scale batch processing jobs every night. The jobs currently take 6 hours to complete, but the business requires completion within 4 hours. Which action should the company take to improve job performance?
157A healthcare organization must build an analytics solution that processes streaming patient vitals data and provides real-time dashboards. The solution must also store historical data for compliance audits. Which combination of Azure services should the organization use?
158A multinational corporation uses Azure Data Factory to orchestrate data pipelines across multiple regions. The company notices that pipeline runs in the West Europe region consistently fail due to throttling errors from the source database. The source database is an Azure SQL Database in the same region. The company needs to reduce throttling while maintaining pipeline throughput. What should the company do?
159A company uses Azure Data Lake Storage Gen2 to store raw data files. Data engineers need to transform this data using a serverless approach without managing infrastructure. Which Azure service should they use?
160A retail company uses Power BI to create sales reports. The data source is an Azure SQL Database that updates every 15 minutes. The reports must reflect near real-time data without manual refresh. Which Power BI feature should the company use?
161A company uses Azure Stream Analytics to process IoT data from thousands of devices. The output is written to Azure SQL Database for reporting. Recently, the job latency increased significantly. The company suspects that the SQL Database is throttling writes. Which action should the company take to reduce latency?
162A data scientist needs to perform exploratory data analysis on a large dataset stored in Azure Data Lake Storage Gen2 using Python notebooks. The solution must minimize infrastructure management. Which Azure service should the data scientist use?
163A company uses Azure Data Factory to copy data from an on-premises SQL Server to Azure Data Lake Storage Gen2. The transfer must be accelerated using WAN optimization. Which Data Factory feature should the company enable?
164A company uses Azure Synapse Analytics to run both interactive queries and large batch loads. The interactive queries must have consistent performance regardless of batch load activity. Which Synapse feature should the company use?
165Which TWO Azure services can be used to build a real-time analytics solution that ingests streaming data and provides dashboards with low latency? (Choose two.)
166Which THREE components are typically part of a modern data warehouse architecture on Azure? (Choose three.)
167Which TWO Azure services can be used to perform data transformation in a serverless manner? (Choose two.)
168A company uses Azure Synapse Analytics to run large-scale analytics on sales data. They need to ensure that the workload can automatically scale based on demand without manual intervention. What feature should they configure?
169A data analyst needs to create a real-time dashboard in Power BI that displays streaming data from Azure Event Hubs. The data must be refreshed every second. Which Power BI feature should they use?
170Which TWO options are valid ways to ingest data into Azure Data Lake Storage Gen2?
171Your company has a data pipeline in Azure Data Factory that runs daily. Recently, the pipeline started failing with timeouts. You suspect a downstream database is slow. What should you do to monitor and alert on pipeline run duration?
172Refer to the exhibit. An administrator runs an Azure CLI command to show the status of a Synapse SQL pool. The output shown is returned. What does this output indicate about the SQL pool?
173A company wants to analyze customer feedback from surveys and social media. The data includes both structured ratings and unstructured text comments. They plan to use Azure Cognitive Services for sentiment analysis. Which service should they use for text analytics?
174Which THREE components are part of an end-to-end analytics solution on Azure?
175An organization needs to run complex queries on petabytes of data stored in Azure Data Lake Storage. They want to use serverless compute to avoid managing infrastructure. Which Azure service should they use?
176Refer to the exhibit. A data engineer runs the PowerShell script shown. What is the purpose of this script?
177A company uses Azure Stream Analytics to process IoT data from thousands of devices. They need to store the results in a way that supports fast querying for historical analysis. Which output sink should they use?
178Which TWO tools can be used to transform data in an Azure data pipeline?
179A data analyst needs to create interactive reports from data stored in an Azure SQL Database. They want to use a self-service tool that requires minimal IT support. Which tool should they use?
180Refer to the exhibit. An Azure Data Factory pipeline JSON is shown. What does this pipeline do?
181Which THREE are benefits of using a data warehouse in Azure?
182A company uses Azure Databricks for data engineering. They need to ensure that only authorized users can access the workspace, and they want to use single sign-on (SSO) with their existing identity provider. Which integration should they configure?
183A company plans to implement a near-real-time analytics solution for streaming IoT sensor data. Which Azure service should they use to ingest and process the data streams?
184A data engineer needs to build a data pipeline that runs daily to copy sales data from an on-premises SQL Server to Azure Synapse Analytics. Which Azure service should they use to orchestrate the pipeline?
185A company uses Azure Synapse Analytics for its data warehouse. They notice that query performance is degrading over time as data grows. Which action would most likely improve performance without requiring additional compute resources?
186Which TWO Azure services can be used to perform interactive ad-hoc analytics on large datasets using Apache Spark?
187Which THREE components are essential for building a modern data warehouse architecture on Azure?
188Which TWO Azure services are designed for big data batch processing?
189Refer to the exhibit. A data engineer is reviewing an ARM template for a storage account. What does the property 'isHnsEnabled' set to true indicate?
190Refer to the exhibit. An analyst runs this Kusto Query Language (KQL) query in Azure Data Explorer. What is the primary purpose of this query?
191Refer to the exhibit. A developer is creating an ARM template for an Azure Synapse workspace. What is the purpose of the 'defaultDataLakeStorage' property?
192A company wants to build a real-time dashboard that visualizes sales data as transactions occur. Which combination of Azure services should they use?
193A data scientist needs to train a machine learning model using data stored in Azure Data Lake Storage. They want to use a collaborative notebook environment with built-in experiment tracking. Which Azure service should they use?
194An organization has a large dataset stored in Azure Blob Storage. They need to run complex analytics using SQL queries and also want to use the same data for machine learning models. Which Azure service provides both SQL-based analytics and native integration with ML frameworks?
195A company wants to provide self-service analytics to business users, allowing them to create reports and dashboards from data stored in Azure Data Lake Storage. Which tool should they use?
196An organization has a data lake that contains both structured and unstructured data. They need to catalog the data assets and enable data discovery for users. Which Azure service should they use?
197A financial services company needs to run ad-hoc SQL queries on petabytes of data stored in Azure Data Lake Storage without provisioning a dedicated data warehouse. Which Azure service should they use?
198A company runs a SQL Server database on an Azure virtual machine. They need to offload reporting queries to a read-only copy without modifying the application. Which Azure service should they use?
199A data engineer needs to transform and clean data from multiple sources before loading it into Azure Synapse Analytics. Which Azure service should they use for this ETL process?
200A company uses Azure Data Lake Storage Gen2 as a data lake. They need to enforce row-level security for sensitive data so that sales representatives can only see rows for their assigned region. Which approach should they use?
201An organization wants to build a real-time dashboard that visualizes IoT sensor data as it arrives. Which Azure service should they use for processing the streaming data?
202A company is designing a data analytics solution. They need to store large volumes of raw data in its native format and support schema-on-read for data science exploration. Which storage technology should they use?
203A data analyst needs to create an interactive report that combines sales data from Azure SQL Database and Azure Cosmos DB. The report must refresh daily. Which tool should they use?
204A company uses Azure Synapse Analytics for their data warehouse. They notice that queries against the fact table are slow. The fact table is hash-distributed on OrderID. Most queries filter by CustomerID. What should they do to improve performance?
205A business user wants to ask natural language questions about their data in Power BI and get answers without writing DAX. Which Power BI feature should they use?
206A company ingests streaming data from thousands of devices into Azure Event Hubs. They need to transform and aggregate the data in real time before storing it in Azure Data Lake Storage Gen2. Which Azure service should they use between Event Hubs and ADLS Gen2?
207Which TWO Azure services can be used to build a data pipeline that moves data from on-premises SQL Server to Azure Synapse Analytics?
208Which THREE components are part of a typical modern data warehouse architecture on Azure?
209Which TWO are valid use cases for Azure Stream Analytics?
210Refer to the exhibit. You deploy this Azure Stream Analytics job. The job runs but no data is written to the Azure SQL Database table. What is the most likely cause?
211Refer to the exhibit. You have created this Azure Data Factory pipeline. When you run it, the copy activity fails with a connectivity error. What is the most likely missing component?
212Refer to the exhibit. A database administrator runs this KQL query in Azure Monitor Log Analytics. The query returns no results. What is the most likely reason?
213Your company runs a sales analytics dashboard on Power BI that refreshes every hour from Azure Synapse Analytics. During peak hours, the dashboard refresh fails with a 'timeout' error. Which action should you take FIRST to resolve the issue?
214A retail company uses Azure Data Lake Storage Gen2 as a data lake and Azure Databricks for ETL. They notice that a Spark job reading Parquet files from the data lake fails with an 'Access Denied' error when the job runs as a service principal. The service principal has Storage Blob Data Contributor role on the storage account. What is the most likely cause?
215Your company needs to build an analytics solution that can handle both batch and streaming data from IoT devices. The solution must allow complex event processing and real-time dashboards. Which Azure service should you use as the primary data ingestion and processing layer?
216Your company has a Power BI dashboard that uses a data model with a single large fact table and several dimension tables. The dashboard loads slowly when users filter by multiple dimensions. Which design change would MOST improve performance?
217Your data engineering team is designing a data pipeline that ingests data from multiple sources into Azure Data Lake Storage Gen2. The data must be cataloged in Microsoft Purview (formerly Azure Purview) for discoverability. Which approach ensures that the data lineage is automatically captured?
218Your company uses Azure Synapse Analytics to run analytical queries on large datasets. You need to ensure that queries against a frequently accessed fact table perform well without impacting other workloads. Which feature should you use?
219Your company has a Power BI report that uses DirectQuery to Azure SQL Database. Users report that the report is slow when multiple users access it simultaneously. The database is underprovisioned. Which action should you take to improve performance without changing the report design?
220Your company is migrating an on-premises SQL Server data warehouse to Azure. The solution must support both historical analytics and real-time reporting. Which Azure service should you recommend as the primary data store?
221Your company uses Azure Databricks to process streaming data from Event Hubs. The data is transformed and written to Azure Data Lake Storage Gen2 as Delta tables. You notice that some records are duplicated in the Delta tables. Which configuration change should you make to prevent duplicates?
222Which TWO Azure services can be used to perform data transformation in an analytics pipeline? (Choose two.)
223Which THREE components are part of a typical modern data warehouse architecture on Azure? (Choose three.)
224Which TWO Azure services can be used to store semi-structured data like JSON or Parquet files for analytics? (Choose two.)
225Refer to the exhibit. You execute the above T-SQL statements in Azure Synapse Analytics. What is the purpose of this code?
226Refer to the exhibit. You have an Azure Data Factory pipeline definition as shown. The pipeline fails with a 'Source not found' error. The BlobInputDataset points to a container that exists. What is the most likely cause?
227Refer to the exhibit. You have a Power BI measure defined as shown. What does this measure return?
228A company uses Azure Synapse Analytics to run large-scale data transformations. They need to optimize costs for predictable workloads that run every night. Which Azure feature should they configure?
229A data analyst needs to create a real-time dashboard in Power BI that displays sales data from an Azure SQL Database. The dashboard must update every 10 minutes without manual refresh. Which Power BI feature should they use?
230Refer to the exhibit. A team is deploying an Azure Storage container using an ARM template. The template sets publicAccess to 'None'. However, after deployment, users report they cannot access data even with a valid SAS token. What is the most likely cause?
231A company wants to build a data lake on Azure for storing structured, semi-structured, and unstructured data. The solution must support fast queries on structured data without moving data to a separate store. Which Azure service should they use?
232An organization runs a mission-critical analytics workload on Azure Synapse Analytics. They need to ensure high availability and automatic failover in case of a regional outage. Which configuration should they implement?
233A data engineering team uses Azure Data Factory to orchestrate an ETL pipeline that loads data from an on-premises SQL Server to Azure Synapse Analytics. The pipeline fails intermittently with timeout errors during the copy activity. The network is stable. What should they do first to resolve the issue?
234A company needs to analyze streaming data from IoT devices in real time. They want to identify anomalies and trigger alerts. Which Azure service should they use as the core processing engine?
235A data analyst needs to create a Power BI report that combines sales data from Azure SQL Database and marketing data from a CSV file stored in Azure Blob Storage. The report should refresh automatically. What is the recommended approach?
236Refer to the exhibit. An administrator is configuring aggregations in Power BI Premium to improve performance on a large dataset. The aggregation is defined on the Sales table with SUM(Amount) grouped by ProductCategory, Region, and Date at the monthly level. However, some reports that query daily data are still slow. What is the most likely reason?
237Which TWO Azure services can be used to perform large-scale data transformation and processing in a serverless manner?
238Which THREE components are essential for building a real-time analytics solution on Azure?
239Which TWO are benefits of using Azure Synapse Analytics for a data warehouse workload?
240You are the data engineer for a large retail company. The company has an existing on-premises SQL Server database with 10 years of transactional data. They want to move this data to Azure to enable advanced analytics using Azure Synapse Analytics. The data includes customer orders, product details, and inventory. The solution must minimize data movement and support both batch and real-time analytics. The company also wants to use Power BI for reporting. They have a limited budget and prefer a serverless option for compute. You are evaluating the following approaches:
A) Use Azure Data Factory to copy all data to Azure Data Lake Storage Gen2, then use Azure Synapse Serverless SQL pool to query the data, and finally connect Power BI to the serverless SQL endpoint.
B) Use Azure Database Migration Service to migrate the SQL Server database to Azure SQL Database, then use Azure Synapse Analytics with a dedicated SQL pool to perform analytics, and connect Power BI to the dedicated pool.
C) Use Azure Data Factory to copy all data to Azure Blob Storage, then use Azure Stream Analytics to perform real-time analytics, and connect Power BI directly to Stream Analytics output.
D) Use Azure Data Factory to copy historical data to Azure Data Lake Storage Gen2, use Azure Synapse Serverless SQL pool for batch analytics, and use Azure Event Hubs and Stream Analytics for real-time data, with Power BI connecting to both serverless SQL and Stream Analytics.
Which approach best meets the requirements?
241Your company is developing a new analytics solution to track customer sentiment from social media feeds. The data arrives as a continuous stream of JSON messages. The solution must process the data in near real-time, enrich it with customer profile data stored in Azure Cosmos DB, and then store the results in a data lake for historical analysis. The team wants to use a low-code approach for the data processing logic. You are considering the following architectures:
A) Use Azure Event Hubs to ingest the stream, Azure Stream Analytics to process and enrich the data using Cosmos DB as a reference data source, and output to Azure Data Lake Storage Gen2.
B) Use Azure IoT Hub to ingest the stream, Azure Databricks to process the data, and write to Azure Blob Storage.
C) Use Azure Event Hubs to ingest the stream, Azure Functions to process each message, query Cosmos DB for enrichment, and write to Azure Data Lake Storage Gen2.
D) Use Azure Event Hubs to ingest the stream, Azure Data Factory to execute a mapping data flow for enrichment, and write to Azure Data Lake Storage Gen2.
Which architecture best meets the requirements of near real-time processing, enrichment, and low-code?
242A small business wants to start using Azure for analytics. They have a few CSV files stored on-premises that they want to analyze. They have no budget for complex infrastructure and prefer a fully managed, serverless solution. They need to create interactive visualizations and share them with their team. The data does not change frequently, so they are okay with daily refreshes. Which of the following options should they choose?
A) Upload the CSV files to Azure Data Lake Storage Gen2, use Azure Databricks to create a data processing pipeline, and then use Power BI to visualize the results.
B) Upload the CSV files to Azure Blob Storage, use Azure Data Factory to load the data into Azure SQL Database, and then use Power BI to connect and visualize.
C) Upload the CSV files to OneDrive for Business, use Power BI Desktop to import the data, and publish to Power BI Service with scheduled refresh.
D) Upload the CSV files to Azure Data Lake Storage Gen2, use Azure Synapse Serverless SQL pool to query the data, and then use Power BI to connect.
Which option is the simplest and most cost-effective?
243A company runs a sales analytics workload on Azure Synapse Analytics. They notice that queries against the fact table are slow. The fact table is hash-distributed on the SalesDate column. Which design change would most likely improve query performance?
244A retail company uses Azure Data Lake Storage Gen2 to store raw clickstream data. They need to process this data using Azure Databricks to create hourly aggregated reports. The data pipeline must minimize costs while meeting a five-minute processing SLA. What is the most cost-effective compute option?
245A company wants to build a real-time analytics dashboard for IoT sensor data. Which combination of Azure services should they use?
246A company uses Azure Synapse Analytics to run a data warehouse. They need to load 500 GB of historical data from Azure Blob Storage into a staging table. They want the fastest load performance with minimal administrative overhead. Which method should they use?
247A company has a Power BI dashboard that refreshes daily from an Azure SQL Database. During refresh, the database experiences high CPU usage that impacts transactional applications. They need to minimize impact while keeping the dashboard up-to-date. What should they do?
248Which TWO Azure services are primarily used for batch processing of large data volumes?
249Which THREE components are required to implement a modern data warehouse architecture (medallion architecture) on Azure?
250Which TWO options are valid ways to secure data at rest in Azure Synapse Analytics?
251Refer to the exhibit. You are reviewing an ARM template that deploys a SQL database in Azure Synapse. The template sets the storageAccountType to GRS. What is a valid concern regarding cost and performance?
252Refer to the exhibit. A data engineer wants to ensure that all Azure Storage accounts used for analytics use customer-managed keys. They apply this Azure Policy. What is the outcome?
253You are a data engineer for a large e-commerce company. The company uses Azure Data Lake Storage Gen2 to store customer transaction data. They also use Azure Databricks for data transformation and Azure Synapse Serverless SQL pool for ad-hoc queries. Recently, the data lake has grown to 10 TB, and query performance in Synapse Serverless has degraded significantly. Users complain that queries that used to take seconds now take minutes. You need to improve query performance without moving data to a dedicated SQL pool. The data is stored in Parquet format, partitioned by date. You notice that the queries often filter on CustomerID and Date. Current queries scan all partitions even when only a few days are needed. What is the most effective solution to improve performance?
254You are a data architect for a healthcare organization. The organization needs to build a real-time analytics solution to monitor patient vital signs from IoT devices. The data arrives at a rate of 10,000 events per second. Each event contains patient ID, timestamp, heart rate, blood pressure, and oxygen saturation. The solution must alert clinicians within 10 seconds when a patient's vital signs exceed predefined thresholds. Additionally, the solution must store the raw data for historical analysis and compliance, and it must scale to handle future growth. You plan to use Azure Event Hubs for ingestion. Which combination of services should you use to meet the requirements? Consider low-latency alert processing, cost-effective storage of raw data, and support for historical analytics.
255You are a business analyst at a manufacturing company. The company uses Azure SQL Database to store production data. You need to create a Power BI report that shows real-time machine efficiency. The report must refresh every 5 minutes to show current metrics. You have been granted read-only access to the database. The database is under heavy load from transactional applications, and you want to minimize additional impact. Which approach should you take to create the report?
256You are a data engineer for a financial services company. The company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. They have a fact table named Transactions that contains 2 billion rows. The table is hash-distributed on the AccountID column. Users run reports that aggregate transaction amounts by date and account type. The reports are slow. Upon investigation, you find that the distribution is highly skewed because a few accounts have millions of transactions. You need to improve query performance without redesigning the entire schema. Which action should you take?
257You are a data architect for a logistics company. The company uses Azure Data Lake Storage Gen2 to store shipment tracking data. The data is ingested from IoT devices on trucks. Each record contains truck ID, timestamp, GPS coordinates, speed, and fuel level. The volume is 5 TB per day. The company wants to build a near-real-time dashboard to monitor truck locations and speeds. They also need to run daily batch analytics to compute fuel efficiency trends. You need to design a solution that minimizes latency for the dashboard and maximizes cost efficiency for batch processing. You plan to use Azure Event Hubs for ingestion. Which approach should you take?
258Your company is designing a big data analytics solution on Azure. The solution must ingest streaming data from IoT devices, store the data in its raw format, and then use a distributed processing engine to transform the data before loading it into a serving layer for reporting. Which TWO Azure services should you include in the design?
259A data engineering team is building a data pipeline to run daily batch loads from an on-premises SQL Server to Azure Synapse Analytics. The pipeline must include data transformation using a visual interface with no coding, and must support schema mapping and data validation. Which THREE Azure services should be used together?
260Your company operates a retail analytics platform. Data from point-of-sale systems is ingested in real time into Azure Event Hubs. The data is then consumed by an Azure Stream Analytics job that aggregates sales by store and product every minute, writing results to Azure SQL Database. The business now requires a historical trend analysis capability that can query the last three years of sales data with sub-second response times, but the SQL Database is already experiencing performance issues due to high write volume. You need to redesign the serving layer to support both real-time dashboards (seconds latency) and historical analytics (sub-second queries on years of data) without impacting write performance. What should you do?
261Your organization has a data lake on Azure Data Lake Storage Gen2 containing petabytes of raw clickstream data. Data scientists need to run exploratory analysis using Python and Spark, but they are not experienced with cluster management or infrastructure. The IT team wants to minimize administrative overhead while providing a collaborative notebook environment. Additionally, the solution must integrate with Microsoft Purview for data cataloging and lineage. Which Azure service should you recommend?
262A financial services company is building a real-time fraud detection system. Transactions are streamed from multiple sources into Azure Event Hubs. The system must run a trained machine learning model (scored in near real-time) to flag suspicious transactions. The model is a Python pickle file that needs to be deployed as a web service with low latency (under 100 ms per prediction). The data engineering team wants to use a serverless compute option to run the scoring logic, and the solution must integrate with Azure Stream Analytics for alerting. Which Azure service should you use to deploy the model?
The Describe an analytics workload on Azure domain covers the key concepts tested in this area of the DP-900 exam blueprint published by Microsoft. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all DP-900 domains — no account required.
The Courseiva DP-900 question bank contains 262 questions in the Describe an analytics workload on Azure domain. Click any question to see the full explanation and answer breakdown.
Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.
Yes — the session launcher on this page draws questions exclusively from the Describe an analytics workload on Azure domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.