DP-900 domain

Describe an analytics workload on Azure

Use this page to practise questions from the DP-900 domain Describe an analytics workload on Azure. The goal is not to memorise dumps, but to understand each concept, review the explanation and improve your exam readiness.

124 questions

Focused practice

Start a Describe an analytics workload on Azure session

All sessions draw only from this domain. Pick a length or try interactive practice with inline explanations.

Start 20-question practice session →

What the exam tests

What to know about Describe an analytics workload on Azure

Analytics workload questions usually test whether you can match a scenario to the right Azure analytics service and distinguish batch processing from streaming (real-time) processing.

Batch vs streaming processing, and hybrid scenarios that combine both.

Data warehousing concepts: ETL vs ELT, data lakes, and the lakehouse architecture.

Azure Synapse Analytics: dedicated vs serverless SQL pools, Spark pools, and querying data directly in the data lake.

Ingestion and orchestration with Azure Data Factory, and stream processing with Azure Stream Analytics.

Visualisation and reporting with Microsoft Power BI.
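The batch-vs-streaming distinction underpins many questions in this domain. A minimal plain-Python sketch of the same aggregation done both ways (illustrative only, not an Azure API; the store names and amounts are made up):

```python
# Batch vs streaming: the same per-store total, computed two ways.
from collections import defaultdict

events = [("store-1", 10.0), ("store-2", 5.0), ("store-1", 7.5)]

# Batch: process the complete dataset in one pass, e.g. a nightly job.
def batch_totals(events):
    totals = defaultdict(float)
    for store, amount in events:
        totals[store] += amount
    return dict(totals)

# Streaming: keep running state and update it as each event arrives,
# so results are available with low latency instead of once per day.
class StreamTotals:
    def __init__(self):
        self.totals = defaultdict(float)

    def on_event(self, store, amount):
        self.totals[store] += amount
        return self.totals[store]  # latest total, available immediately

# batch_totals(events) -> {'store-1': 17.5, 'store-2': 5.0}
```

Batch favours throughput over completed datasets; streaming favours latency over unbounded ones. Both produce the same totals here, which is why many architectures run the two side by side.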

Question index

All Describe an analytics workload on Azure questions (124)

Click any question to see the full explanation, or start a practice session above.

1

A manufacturer collects sensor data from thousands of IoT devices every second. The data is ingested into Azure Event Hubs and then needs to be stored for historical analysis. The analytics team will run complex aggregations and time-series queries over petabytes of data, expecting fast results even with large scans. Which Azure service should be used as the analytical data store?

2

A manufacturing company has a streaming data pipeline that ingests sensor data from factory equipment into Azure Event Hubs. The data must be prepared for reporting by cleaning invalid records, removing duplicates, and aggregating readings into 5-minute windows. The transformed data needs to be stored in a columnar format in a data lake to support efficient querying by data analysts using SQL. Which Azure service should perform the data transformation and loading?

3

A data analytics team stores sales transaction data in Parquet files in Azure Data Lake Storage Gen2. They want to run complex analytical queries that join this data with dimension tables stored in Azure Synapse Analytics dedicated SQL pool. The team prefers not to move or copy the data from the data lake. Which feature should they use to query the data lake data directly?

4

A healthcare analytics company receives continuous streams of patient monitoring data from IoT devices. The data must be processed in near real-time to detect critical events (e.g., abnormal heart rate). Processed data is then stored in a columnar format for historical analysis and reporting by data analysts using SQL. Which combination of Azure services should they use for ingestion, processing, and storage?

5

A retail chain collects daily sales data from hundreds of stores. The data is stored as CSV files in Azure Data Lake Storage Gen2. The analytics team needs to run complex SQL queries that join sales data with product dimensions and aggregate results across petabytes of data. Queries must return results within seconds. Which Azure service is best suited for this analytical workload?

6

A financial analytics company has petabytes of transaction data stored as Parquet files in Azure Data Lake Storage Gen2. Data analysts need to run complex SQL queries that join multiple tables and return results within seconds. The company wants to query the data directly without moving it to another store. Which Azure service should they use?

7

A retail company analyzes customer purchase patterns. Every night, they run a batch job that aggregates millions of transactions from the past day into summary tables for reporting. Which type of data processing workload best describes this nightly job?

8

A retail chain captures real-time sales data from point-of-sale (POS) systems as a stream of events. The data is ingested into Azure Event Hubs. Additionally, the company receives daily inventory files in CSV format uploaded to Azure Data Lake Storage Gen2. The analytics team needs to combine the streaming sales data with the batch inventory data to generate near real-time dashboards and run historical reports. They want a single analytics platform that can handle both streaming and batch workloads, and allow querying data directly in the data lake using SQL. Which Azure service should they choose?

9

A marketing team wants to analyze social media sentiment in near real-time. They will use Azure Event Hubs to capture tweets and need to aggregate sentiment scores over 5-minute windows. The aggregated results must be stored in Azure Blob Storage for later analysis. Which Azure service should they use to perform the stream processing?
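Several questions in this domain hinge on windowed aggregation over a stream. The sketch below shows what a 5-minute tumbling window computes, in plain Python rather than the Stream Analytics query language (timestamps and values are invented for illustration):

```python
# Tumbling-window average: each event belongs to exactly one fixed,
# non-overlapping 5-minute window, aligned to window boundaries.
from collections import defaultdict

WINDOW_SECONDS = 300  # 5-minute tumbling windows

def window_averages(readings):
    """readings: iterable of (epoch_seconds, value).
    Returns {window_start: average value} per 5-minute window."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in readings:
        window_start = ts - (ts % WINDOW_SECONDS)  # align to boundary
        sums[window_start] += value
        counts[window_start] += 1
    return {w: sums[w] / counts[w] for w in sums}

readings = [(0, 1.0), (120, 2.0), (310, 9.0)]
# window_averages(readings) -> {0: 1.5, 300: 9.0}
```

A stream processor emits one result per window as it closes, which is what feeds "every 5 minutes" dashboards in these scenarios.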

10

A company is migrating their on-premises data warehouse, which is built on a Netezza appliance, to Azure. The data warehouse contains over 10 terabytes of data and supports complex BI queries with multiple joins and aggregations. The company requires a cloud-based solution that provides massively parallel processing (MPP) to handle large-scale queries efficiently. They also need to integrate with existing ETL tools like Azure Data Factory and provide native connectivity to Power BI. Which Azure service should they choose?

11

A manufacturing company collects sensor data from factory equipment as a continuous stream of events ingested into Azure Event Hubs. Additionally, the company receives daily inventory CSV files uploaded to Azure Data Lake Storage Gen2. The analytics team needs to build near real-time dashboards that combine streaming sensor data with batch inventory data, and also support historical reporting by querying data directly in the data lake using SQL without moving it. Which Azure service should they choose as the primary analytics platform?

12

A retail company runs a nightly process that reads all sales transactions from the previous day, aggregates them by product category and store location, and writes the summary results into a data warehouse for reporting. Which type of data processing workload best describes this nightly process?

13

A financial analytics company stores petabytes of transaction data in Parquet files in Azure Data Lake Storage Gen2. Data analysts need to run complex SQL queries that join multiple large tables and return results within seconds. The company also wants to integrate with Power BI for visualization and Azure Data Factory for ETL orchestration. They require a massively parallel processing (MPP) engine to handle the scale. Which Azure service should they choose?

14

A retail company runs a nightly job that reads all sales transactions from the previous day from an operational database, aggregates them by product category and store location, and writes the summary results into a data warehouse for reporting. Which type of data processing workload does this nightly job represent?

15

A financial services company stores years of market trade data as Parquet files in Azure Data Lake Storage Gen2. The data volume is terabytes and growing rapidly. Data analysts need to run complex SQL queries that join multiple tables (e.g., trades, instruments, counterparties) and return results within seconds. The company also wants to integrate with Power BI for visualization and Azure Data Factory for orchestration of ETL pipelines. Which Azure service should they choose as the primary analytics platform?

16

A logistics company receives real-time GPS tracking data from its delivery fleet via Azure Event Hubs. The data is a continuous stream of location updates (vehicle ID, latitude, longitude, timestamp). Additionally, the company has daily static route plan files in CSV format stored in Azure Data Lake Storage Gen2. The operations team needs to combine the live GPS stream with the route plans to create a near real-time dashboard showing if delivery vehicles are on schedule. They also want to run historical queries on both the stream data and route plans using T-SQL, without moving the data to another store. Which Azure service should they use as the primary analytics platform?

17

A manufacturing company ingests a continuous stream of sensor data from factory equipment into Azure Event Hubs. Additionally, historical maintenance data in CSV format is stored in Azure Data Lake Storage Gen2. The analytics team needs to join the streaming sensor data with the historical data in near real-time and enable analysts to query the combined dataset using standard T-SQL without moving the data. Which Azure service should they use as the primary analytics platform?

18

A retail company has an Azure SQL Database that handles OLTP transactions for its e-commerce platform. The analytics team needs to run complex reporting queries that join multiple tables (e.g., orders, products, customers) and aggregate millions of rows. These queries are long-running and would negatively impact the performance of the OLTP database if run directly. The company wants to use a separate analytics service that supports T-SQL queries, can scale compute independently, and provides a serverless option to avoid provisioning fixed resources. Which Azure service should they choose?

19

A manufacturing company connects thousands of IoT sensors on an assembly line, each sending telemetry data every second. The data volume is terabyte-scale per day. The company needs to analyze the sensor data in near real-time to detect anomalies (e.g., temperature spikes) and also allow data scientists to run interactive ad-hoc queries on the historical data to find patterns. They prefer using a query language similar to SQL. Which Azure service should they choose?

20

A financial services company stores petabytes of transaction data in Parquet format in Azure Data Lake Storage Gen2. Data analysts need to run complex SQL queries that join multiple large tables and aggregate billions of rows, with results expected within seconds. The company wants to use a massively parallel processing (MPP) engine that supports T-SQL and can be paused to reduce costs during off-hours. They also need native integration with Azure Data Factory and Power BI. Which Azure service should they use?

21

A company stores terabytes of web server log data in CSV files in Azure Data Lake Storage Gen2. Data analysts need to run ad-hoc SQL queries on this data to analyze user behavior patterns. The queries are complex, involve joins across multiple files, and the analysts prefer not to move the data into a separate store. Which Azure service should they use?

22

A company ingests streaming data from IoT devices into Azure Event Hubs. They need to perform real-time analytics on the data, such as aggregating temperature readings over 5-minute windows and triggering alerts when thresholds are exceeded. They also want to store the processed data in a data warehouse for historical analysis. Which Azure service should they use for the real-time processing?

23

A manufacturing company ingests real-time sensor data from factory equipment via Azure Event Hubs. The data is a continuous stream of measurements (sensorId, timestamp, value). Additionally, historical maintenance records are stored as CSV files in Azure Data Lake Storage Gen2. The operations team needs to join the streaming data with the historical records in near real-time to detect anomalies. They also need to run complex T-SQL queries on the combined dataset for ad-hoc analysis. Which Azure service should they use as the primary analytics platform?

24

A logistics company ingests real-time GPS data from delivery vehicles via Azure Event Hubs. The data includes vehicle ID, latitude, longitude, and timestamp. The company also has historical route plan data stored as CSV files in Azure Data Lake Storage Gen2. Data analysts need to combine the live stream with the historical data in near real-time to create a dashboard showing if vehicles are on schedule. They also need to run complex T-SQL queries on the combined dataset for ad-hoc reporting. Which Azure service should they use as the primary analytics platform?

25

A company uses Azure Data Factory to run a pipeline that copies new orders from an on-premises SQL Server database to Azure Data Lake Storage every hour. After the data is in the data lake, an Azure Databricks notebook transforms it and loads it into Azure Synapse Analytics for reporting. Which type of data processing does the hourly copy operation represent?

26

A manufacturing company ingests a continuous stream of sensor data from thousands of IoT devices into Azure Event Hubs. The company also stores historical equipment maintenance records in Azure SQL Database. The operations team needs to join the streaming sensor data with the historical maintenance records in near real-time to detect anomalies, and data scientists need to run ad-hoc T-SQL queries on the combined dataset for analysis. Which Azure service should they use as the primary analytics platform to meet both requirements?

27

A company ingests raw clickstream data as JSON files into Azure Data Lake Storage Gen2. Data scientists need to explore the data interactively using Python notebooks, and the BI team needs to create reports from aggregated datasets derived from this data. The solution must be serverless, scale automatically, and minimize administration. Which Azure service should they choose?

28

A retail company wants to analyze customer clickstream data in real-time to detect patterns and trigger personalized offers. They also store the raw clickstream data in Azure Data Lake Storage for later batch analysis. Which Azure service should they use for the real-time processing component?

29

A smart building monitoring company ingests real-time sensor data (temperature, humidity, occupancy) from thousands of IoT devices into Azure Event Hubs. The company also stores historical building blueprints and maintenance records as CSV files in Azure Data Lake Storage Gen2. The engineering team needs to build a dashboard that displays live sensor readings overlaid on building floor plans, and also allows facility managers to run ad-hoc T-SQL queries that combine live sensor data with historical maintenance records. Which Azure service should they use as the primary analytics platform to meet both requirements?

30

A financial services company uses Azure Synapse Analytics to process large volumes of transaction data. They have a dedicated SQL pool (formerly SQL DW) that ingests curated, aggregated data nightly from a data lake. Data analysts need to run ad-hoc, exploratory T-SQL queries on raw transaction data stored as Parquet files in Azure Data Lake Storage Gen2. These queries vary widely in complexity and frequency. The company wants to minimize costs for these ad-hoc queries while still using full T-SQL capabilities. Which approach should they recommend?

31

A company receives daily sales data from multiple retail stores as CSV files that are uploaded to Azure Blob Storage. The data must be cleansed, validated, and aggregated before being loaded into Azure Synapse Analytics for reporting. The transformations involve complex business logic and must run reliably every night. The company wants a service that can orchestrate and execute the entire pipeline with minimal development effort. Which Azure service should they use?

32

A marketing company ingests streaming data from social media feeds into Azure Event Hubs. They want to perform real-time sentiment analysis on the data and store the results in Azure SQL Database for immediate dashboarding. They also need to aggregate the raw data over longer time windows and store it in Azure Data Lake Storage for historical trend analysis. Which combination of Azure services should they use for the two processing paths?

33

A company receives real-time clickstream data from its website via Azure Event Hubs. They need to detect fraudulent clicks within seconds and also produce daily aggregate reports of visitor statistics for historical analysis. Which combination of Azure services should they use for the real-time detection and the daily aggregation, respectively?

34

A retail company receives a continuous stream of customer orders from their website via Azure Event Hubs. They also receive daily inventory updates from suppliers as CSV files uploaded to Azure Blob Storage. The company needs to calculate real-time order fulfillment availability by joining the streaming orders with the latest inventory snapshot. Additionally, they generate nightly sales reports from historical order data. Which Azure service should they use for the real-time processing component?

35

A company is designing an enterprise analytics solution. They store raw data in its original format in a scalable repository, apply schema and transformations at read time, and also maintain a curated layer that enforces ACID transactions for data reliability. This architecture combines the flexibility of a data lake with the reliability of a data warehouse. Which term best describes this modern data architecture?

36

A retail company stores years of historical sales data in Azure Data Lake Storage Gen2 as Parquet files. Business analysts need to run complex SQL queries over this data to identify sales trends, and they want to visualize the results in Power BI dashboards. They prefer to avoid moving data into a separate database to minimize storage costs and latency. Which Azure service should they use to query the data directly in the lake?

37

A financial services company needs to build a data pipeline that ingests daily transaction files from multiple sources. The pipeline must perform data quality checks, transform data using complex business logic, and load it into Azure Synapse Analytics. The transformations involve conditional branching (e.g., if a transaction amount exceeds a threshold, apply additional validation). The company wants to minimize coding effort and prefers a visual, configuration-based approach. Which Azure service should they use as the primary orchestration and transformation engine?

38

A retail company ingests clickstream data from its e-commerce website into Azure Event Hubs. They need to detect customer journey patterns in real time within seconds and also prepare aggregated data for daily trend reports stored in Azure Data Lake Storage Gen2. The real-time processing must handle high throughput and support complex temporal queries like sessionization. The daily aggregation should be cost-effective and use serverless compute. Which combination of Azure services should they use?

39

A marketing company stores years of historical campaign data in Azure Data Lake Storage Gen2 as Parquet files. Data analysts need to run complex SQL queries over this data to identify trends, and they want to visualize results in Power BI dashboards. The company wants to avoid moving data into a separate database to minimize duplication and latency. Which Azure service should they use to query the data directly in the data lake?

40

A marketing company collects real-time clickstream data from their website using Azure Event Hubs. They need to perform two tasks: (1) aggregate the number of clicks per advertising campaign every 5 minutes and display the results in a live dashboard, and (2) run complex historical queries on months of aggregated click data to identify trends. They want to minimize data movement and use serverless compute where possible. Which combination of Azure services should they use?

41

A manufacturing company deploys IoT sensors on equipment in a factory. They need to monitor sensor data in real time to detect anomalies and trigger immediate alerts. They also need to store years of historical sensor data for monthly capacity planning reports that involve complex aggregations. The company wants a cost-effective solution that minimizes data movement between storage and compute. Which combination of Azure services should they use for real-time processing and historical batch analytics?

42

A retail company needs to build an analytics pipeline on Azure. They ingest sales data from multiple store systems and an online e-commerce platform. The data must be cleaned, transformed, and loaded into a data warehouse for reporting. The company wants to use a modern ELT (Extract, Load, Transform) approach where raw data is stored first and then transformed. Order the following steps in the correct sequence for this pipeline. (Drag the steps into the correct order.)

43

A company stores terabytes of historical sales data as Parquet files in Azure Data Lake Storage Gen2. Business analysts need to run ad-hoc SQL queries that involve complex joins and aggregations over this data. They want to avoid provisioning a dedicated cluster or moving data into a separate database. The queries must be executed using standard T-SQL syntax. Which Azure service should they use?

44

A telecommunications company needs to analyze call detail records (CDRs) to detect fraud patterns and minimize revenue leakage. The data arrives as a continuous stream from network switches and must be queried within seconds of ingestion to flag suspicious activity. The analysts also need to run interactive ad-hoc queries over the last 90 days of CDR data using a Kusto query language. Which Azure service should they use as the primary data store and analytics engine?

45

A company uses Azure Synapse Analytics dedicated SQL pool for large-scale data warehousing. They have a fact table with billions of rows and frequently run queries that filter by a date range and join with a product dimension table. Which table distribution and partitioning strategy will minimize data movement and improve query performance?

46

A retail company ingests daily sales data from multiple stores as CSV files stored in Azure Blob Storage. The data must be cleaned and transformed using Spark, then loaded into Azure Synapse Analytics for large-scale reporting. The pipeline must run on a schedule, handle failures with retries, and minimize manual intervention. Which combination of Azure services should they use to orchestrate and execute this pipeline?

47

A retail company needs to analyze clickstream data from their website in real time to detect fraudulent activity and also run complex historical queries on months of data to identify shopping trends. They want a single service that can handle both streaming and batch analytics using a unified query language, minimizing data movement. Which Azure service should they use?

48

A data engineering team needs to transform raw clickstream data stored as Parquet files in Azure Data Lake Storage Gen2. They want to use standard T-SQL queries to perform transformations and aggregations. The team prefers a serverless option to avoid provisioning and managing dedicated compute resources. Which Azure service should they use?

49

A manufacturing company uses IoT sensors to collect temperature and vibration data from machinery. They need to analyze the streaming data in real time to detect anomalies and trigger alerts. Additionally, they need to run complex historical queries on months of sensor data to identify equipment failure patterns. They want a single Azure service that can handle both real-time stream processing and large-scale batch analytics using a unified query language, minimizing the need for separate technologies. Which Azure service should they use?

50

A company wants to build a modern data warehouse using a lakehouse architecture. They need to store raw data in its native format (e.g., CSV, JSON, Parquet) and also support BI reporting on curated, transformed data. They want to use a single storage layer for both raw and curated data. Which Azure service should they use as the core storage layer?

51

A retail company processes petabytes of sales transaction data stored in Azure Data Lake Storage Gen2. They need to run recurring complex queries that involve large joins and aggregations. The queries must consistently complete within a fixed time window overnight. The company wants predictable performance and costs. Which Azure service should they use?

52

A data engineering team is designing a modern data warehouse using Azure Synapse Analytics. They want to follow a lakehouse architecture where raw data is stored in its native format and then processed and curated for reporting. Which component in Azure Synapse Analytics is primarily used to store raw data in its original format without requiring a schema?

53

A financial services company runs large-scale analytical queries on a dedicated SQL pool in Azure Synapse Analytics. They notice that during peak hours, complex aggregations consume excessive resources, causing slower queries from other users. They need to ensure that critical management reports always get enough resources and complete within a guaranteed time, while other less important queries do not starve them. Which feature should they implement?

54

A financial services company uses a dedicated SQL pool in Azure Synapse Analytics to run large-scale analytical queries. During peak hours, complex aggregations consume excessive resources, causing slower performance for other users. The company needs to ensure that critical scheduled management reports always receive guaranteed resources and complete within a predictable timeframe, while less important ad-hoc queries do not interfere. Which feature should they implement to manage query resource allocation?

55

A financial services company runs critical end-of-day reports in an Azure Synapse Analytics dedicated SQL pool. These reports require guaranteed resource allocation and must complete within a fixed time window. However, ad-hoc analytical queries from data scientists often consume resources, causing contention and delaying the critical reports. Which feature should the company implement to ensure the critical reports always receive sufficient resources?

56

A large e-commerce company needs to build an analytics solution. They have streaming clickstream data from their website (JSON) and daily sales data from their transactional database (CSV). They need to perform real-time dashboards on clickstream for the current hour, and also run complex historical queries that join sales data with aggregated clickstream data over the past year. They want a single Azure service that can handle both stream processing and batch processing using a unified experience, without moving data between separate systems. Which Azure service should they use?

57

A company stores weather sensor data in Azure Data Lake Storage Gen2. Data scientists need to run large-scale transformations and machine learning experiments on this data using Python and Apache Spark. They want to collaborate using shared Jupyter notebooks. Which Azure service should they use for this analytical workload?

58

A retail company wants to analyze years of historical sales data stored as CSV files in Azure Blob Storage. The analytics solution must be serverless, allow T-SQL queries without managing infrastructure, and integrate directly with Power BI. Which Azure service should the company use?

59

A business analyst needs to explore and create interactive visualizations of sales data stored in Azure Data Lake Storage Gen2 without writing SQL code. Which Azure service is best suited for this drag-and-drop data exploration?

60

A financial institution needs to run complex queries against petabytes of historical trading data stored in Azure Data Lake Storage. The queries must be efficient and use columnar storage format. Which technology should they use to process this data?
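Many of these questions mention columnar formats such as Parquet. The sketch below contrasts row and column layouts in plain Python (not Parquet itself) to show why an analytical aggregate over one column is cheaper in a column store; the trade records are invented:

```python
# Row vs columnar layout: averaging one column touches every field of
# every row in a row store, but only one list in a column store.
rows = [
    {"trade_id": 1, "symbol": "ABC", "price": 10.0},
    {"trade_id": 2, "symbol": "XYZ", "price": 20.0},
    {"trade_id": 3, "symbol": "ABC", "price": 30.0},
]

# Row store: visit every row (all fields are stored together).
row_avg = sum(r["price"] for r in rows) / len(rows)

# Column store: the same data pivoted into one list per column;
# the aggregate reads only the 'price' list and skips the rest.
columns = {k: [r[k] for r in rows] for k in rows[0]}
col_avg = sum(columns["price"]) / len(columns["price"])

# row_avg == col_avg == 20.0 — same answer, far less data scanned
# per column in the columnar layout at analytical scale.
```

Columnar formats also compress better, since each column holds values of one type, which further reduces I/O for large scans.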

61

A data engineer needs to query data stored in CSV files in Azure Data Lake Storage Gen2 using T-SQL in Azure Synapse Analytics, without loading the data into the database. Which feature should they use?

62

A data analyst needs to run interactive SQL queries against petabytes of sales data stored in Parquet format in Azure Data Lake Storage Gen2. The analyst wants the fastest query performance for ad-hoc exploration without provisioning or managing any infrastructure. Which Azure service should they use?

63

A data engineer needs to process raw clickstream data from multiple websites that is stored in Azure Blob Storage as JSON files. The processing must run automatically every hour, transform the data into a structured format for reporting, and handle schema changes in the source data without manual intervention. Which Azure service should be used?

64

A data engineer needs to process streaming data from IoT devices and store the results in Azure Data Lake Storage for long-term analytics. The data must be processed in near real-time to detect anomalies and trigger alerts. Which Azure service should the engineer use for stream processing?

65

A company uses Azure Synapse Analytics to run complex queries against large datasets stored in Parquet files in Azure Data Lake Storage Gen2. They notice that queries scanning entire partitions are slow due to high I/O overhead on the compute nodes. Investigation shows each daily partition contains thousands of small files (under 1 MB each). Which optimization should be implemented first to improve query performance?
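The small-files problem in this question comes down to fixed per-file overhead. A toy cost model (the numbers are assumptions for illustration, not Azure measurements) makes the effect concrete:

```python
# Each file read pays a fixed open/metadata overhead, so thousands of
# tiny files cost far more to scan than a few large files holding the
# same total data. Overhead and throughput figures are invented.
FIXED_OVERHEAD_MS = 5      # per-file open + metadata cost (assumed)
THROUGHPUT_MB_PER_MS = 1   # sequential read speed (assumed)

def scan_cost_ms(total_mb, file_count):
    per_file_mb = total_mb / file_count
    return file_count * (FIXED_OVERHEAD_MS + per_file_mb / THROUGHPUT_MB_PER_MS)

small = scan_cost_ms(total_mb=1024, file_count=4096)  # thousands of ~0.25 MB files
large = scan_cost_ms(total_mb=1024, file_count=8)     # a few 128 MB files
# small -> 21504.0 ms, large -> 1064.0 ms: same data, ~20x slower scan
```

This is why compacting small files into larger ones (commonly in the 100 MB–1 GB range for analytical formats) is the first optimisation to try.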

66

A data engineer needs to build a pipeline that runs every hour, copies new sales data from an on-premises SQL Server to Azure Data Lake Storage Gen2, transforms the data using PySpark, and then loads it into Azure Synapse Analytics dedicated SQL pool. Which Azure service should be used to orchestrate the entire pipeline?

67

A data analyst needs to run complex SQL queries against petabytes of historical sales data stored in Azure Data Lake Storage Gen2. The solution must be serverless with pay-per-query pricing. Which Azure service should they use?

68

A retail chain needs to blend two data sources for a near real-time dashboard: daily batch files from store systems (CSV files on Azure Blob Storage updated once per day) and live web clickstream data from Azure Event Hubs. The dashboard must refresh every 5 minutes with combined data. Which combination of Azure services should be used to ingest and process both data types most efficiently?

69

A data engineering team needs to analyze petabytes of historical sales data stored in Azure Data Lake Storage Gen2. They require the ability to run complex SQL queries that join multiple tables and need high performance. The solution must separate compute from storage to allow independent scaling of resources. Which Azure service should they use?

70

A financial institution runs complex analytical queries on trading data stored in Parquet files in Azure Data Lake Storage Gen2. The data is partitioned by date and contains billions of rows. Analysts frequently query within a specific date range, and the queries must return results in under 5 seconds. The current solution uses Azure Synapse Serverless SQL pool, but queries are slow because the serverless pool scans all partitions even when the WHERE clause filters on the date column. Which optimization should be implemented to improve query performance?
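Partition pruning, which this question turns on, is easiest to see with folder-per-date layouts. A plain-Python sketch (paths and values invented; real engines prune using the folder structure or file metadata):

```python
# Partition pruning: data laid out in date-named folders lets a date
# filter skip whole partitions before opening a single file.
partitions = {
    "year=2024/month=01": [("2024-01-05", 100.0)],
    "year=2024/month=02": [("2024-02-10", 250.0)],
    "year=2024/month=03": [("2024-03-15", 400.0)],
}

def query_with_pruning(partitions, month_suffix):
    """Sum values, opening only partitions whose path matches the filter."""
    scanned = []
    total = 0.0
    for path, rows in partitions.items():
        if not path.endswith(month_suffix):  # prune: skip this folder entirely
            continue
        scanned.append(path)
        total += sum(v for _, v in rows)
    return total, scanned

total, scanned = query_with_pruning(partitions, "month=02")
# total == 250.0 and only one of three partitions was scanned
```

If the query engine cannot map the WHERE clause onto the folder structure, it falls back to scanning everything, which is the slow behaviour the scenario describes.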

71

A data analyst needs to run interactive SQL queries on a large dataset stored as CSV files in Azure Blob Storage. The analyst wants to explore the data using T-SQL without loading the data into a database. Which Azure service should they use?

72

A retail company needs to run complex SQL queries on petabytes of historical sales data stored in Parquet files in Azure Data Lake Storage Gen2. They want a solution that provides fast query performance without managing infrastructure, and they prefer a pay-per-query pricing model. Which Azure service should they use?

73

A logistics company uses Azure Synapse Analytics dedicated SQL pool to analyze billions of shipment records. The table 'Shipments' is 10 TB and hash-distributed on 'ShipmentID'. Analysts frequently run queries that filter on 'WarehouseID' and aggregate by 'Region'. These queries are slow because they cause data movement (shuffle) across distributions. Which table design change will most improve query performance for these analytical workloads?
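Hash distribution, central to this question, can be sketched in plain Python (this is not the Synapse engine; distribution count and column names follow the scenario, the rows are invented):

```python
# Hash distribution: rows go to a distribution chosen by hashing the
# distribution column. Queries that filter on a DIFFERENT column find
# matching rows spread across distributions, so every distribution
# (and the data movement between them) gets involved.
N_DISTRIBUTIONS = 4

def distribute(rows, key):
    dists = [[] for _ in range(N_DISTRIBUTIONS)]
    for row in rows:
        dists[hash(row[key]) % N_DISTRIBUTIONS].append(row)
    return dists

rows = [{"ShipmentID": i, "WarehouseID": i % 3} for i in range(100)]
by_shipment = distribute(rows, "ShipmentID")

# Filtering on WarehouseID touches every distribution...
warehouse_hits = [i for i, d in enumerate(by_shipment)
                  if any(r["WarehouseID"] == 1 for r in d)]
# ...while a ShipmentID filter lands on exactly one.
shipment_hits = [i for i, d in enumerate(by_shipment)
                 if any(r["ShipmentID"] == 1 for r in d)]
```

Choosing the distribution (or adding replicated dimension tables) so that common filters and joins stay within a distribution is what eliminates the shuffle the scenario complains about.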

74

A data engineer needs to build an analytics solution to transform large volumes of streaming data from IoT devices. The transformations involve complex Python and Spark code, and the results will be stored in Azure Data Lake Storage Gen2 for further analysis. Which Azure service is best suited for executing these transformations?

75

A retail company collects streaming clickstream data from its website into Azure Event Hubs. They need to aggregate the data in real-time to count page views per product every minute and store the results in Azure SQL Database for a live dashboard. Which Azure service should they use to perform this real-time aggregation?
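
The aggregation described (counts per product per minute) is a tumbling window: fixed-size, non-overlapping buckets. A minimal pure-Python sketch of the semantics (event data is made up; in Stream Analytics this would be a windowed GROUP BY):

```python
# Tumbling-window aggregation: fixed, non-overlapping 60-second windows,
# counting events per product within each window.
from collections import Counter

events = [  # (epoch_seconds, product_id) -- hypothetical clickstream
    (0, "A"), (10, "A"), (59, "B"),
    (60, "A"), (65, "B"), (119, "B"),
]

def tumbling_counts(events, size=60):
    windows = Counter()
    for ts, product in events:
        window_start = (ts // size) * size  # bucket the event belongs to
        windows[(window_start, product)] += 1
    return dict(windows)

print(tumbling_counts(events))
# {(0, 'A'): 2, (0, 'B'): 1, (60, 'A'): 1, (60, 'B'): 2}
```

Because every event belongs to exactly one window, each window's result can be emitted once when the window closes, which is what makes the pattern cheap to push into a SQL database for a live dashboard.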

76

A company uses Azure Synapse Analytics dedicated SQL pool to store a large fact table containing 5 TB of sales transactions. New data arrives continuously and is loaded daily. The company needs to load 500 GB of new data each day while allowing concurrent read queries on the most recent data without performance degradation. Which loading strategy optimizes both load speed and query performance?

77

A business analyst needs to create interactive visualizations and share dashboards with colleagues using data stored in an Azure Synapse Analytics dedicated SQL pool. Which tool should the analyst use?

78

A data analyst needs to run ad-hoc SQL queries on petabytes of log data stored as Parquet files in Azure Data Lake Storage Gen2. The queries join multiple tables and require high concurrency from multiple analysts. The solution should minimize cost by only paying for queries executed. Which Azure service should they use?

79

A company has a data warehouse in Azure Synapse Analytics dedicated SQL pool. They need to load new sales data every night from a CSV file stored in Azure Data Lake Storage Gen2. The load process must be automated, scheduled, and have error handling for failed loads. Which Azure service should they use to orchestrate this process?

80

A manufacturing company collects sensor data from thousands of IoT devices. The data arrives as a stream of time-stamped readings with a fixed schema (DeviceID, Timestamp, Temperature, Pressure, Vibration). They need to store this data and support both real-time dashboards showing the last hour of data and complex analytical queries over years of historical data. The solution must minimize storage costs and provide sub-second response for real-time queries. Which Azure service is best suited for this workload?

81

A retail company needs to analyze streaming clickstream data from their website to detect shopping cart abandonment in real-time. They want to use Azure Stream Analytics to output results that can be visualized on a live dashboard. Which output sink allows the fastest data visualization for a real-time dashboard in Power BI?

82

A data engineer is designing a data lake architecture in Azure. They plan to first ingest raw data from various sources into a landing zone in Azure Data Lake Storage Gen2. Then they will clean, validate, and deduplicate that data in a second zone. Finally, they will create aggregated, business-ready datasets in a third zone for analysts. This layered approach is known as which architecture?
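
The three zones in this question are commonly called bronze (raw landing), silver (cleaned and deduplicated), and gold (business-ready aggregates). A minimal in-memory sketch of the flow, with made-up records:

```python
# Medallion-style pipeline sketch: bronze -> silver -> gold.
bronze = [  # raw landing zone: contains a duplicate and an invalid record
    {"store": "S1", "amount": 100},
    {"store": "S1", "amount": 100},   # duplicate
    {"store": "S2", "amount": -5},    # invalid: negative amount
    {"store": "S2", "amount": 40},
]

# Silver: validate and deduplicate the raw records.
seen, silver = set(), []
for rec in bronze:
    key = (rec["store"], rec["amount"])
    if rec["amount"] >= 0 and key not in seen:
        seen.add(key)
        silver.append(rec)

# Gold: aggregate into a business-ready dataset for analysts.
gold = {}
for rec in silver:
    gold[rec["store"]] = gold.get(rec["store"], 0) + rec["amount"]

print(gold)  # {'S1': 100, 'S2': 40}
```

Each zone only ever reads from the one before it, so a bug in the gold logic can be fixed and re-run without re-ingesting the raw data.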

83

A company is building a modern data warehouse on Azure using a lakehouse approach. Arrange the following steps in the correct order to implement a typical pipeline that starts with raw data ingestion and ends with business reporting.

84

A company needs to run complex SQL queries on petabytes of data stored in Azure Data Lake Storage Gen2. They want to pay only for the queries they run and do not want to manage any infrastructure. Which Azure service should they use?

85

A retail company receives daily sales data as CSV files in Azure Data Lake Storage Gen2. They need to load this data into an Azure Synapse Analytics dedicated SQL pool every night. The process must be automated, scheduled, and include error handling for failed loads. Which Azure service should they use to orchestrate this pipeline?

86

A company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. Every night, they need to load 500 GB of new sales data from CSV files stored in Azure Data Lake Storage Gen2. The loading process must be automated, scheduled, and include error handling (e.g., skip corrupt rows and log them). Which Azure service should be used to orchestrate this load pipeline?

87

A data warehouse team uses Azure Synapse Analytics dedicated SQL pool to serve both business executives running weekly reports and data scientists running complex ad-hoc queries on large fact tables. The ad-hoc queries often consume excessive resources and degrade performance for the weekly reports. The team needs to ensure that the weekly reports always get guaranteed resources regardless of other concurrent queries. Which Synapse feature should they use?

88

A manufacturing company needs to build an analytics solution for IoT sensor data. Thousands of devices send real-time temperature and vibration readings. The solution must: (1) ingest the streaming data reliably, (2) perform real-time aggregations (e.g., average temperature per device every minute), and (3) store the aggregated results in Azure Synapse Analytics for historical reporting and dashboards. Which combination of Azure services should be used?

89

A manufacturing company collects real-time temperature data from thousands of IoT sensors. They need to build an analytics solution that processes the streaming data, computes the average temperature per device every minute, and outputs the results to a Power BI dashboard for near real-time visualization. Which Azure service should they use for the real-time stream processing?

90

A company needs to build a centralized analytics platform that can query both structured data in a relational data warehouse and unstructured data in a data lake using a single SQL-based interface. They want to minimize data movement and use a serverless, on-demand compute model for ad-hoc queries. Which Azure service should they use?

91

A transportation company collects real-time GPS data from thousands of delivery vehicles. They need to process this streaming data to detect delays and generate alerts when a vehicle is behind schedule. Which Azure service should they use for the stream processing?

92

A data analyst needs to run ad-hoc SQL queries on large volumes of data stored as Parquet files in Azure Data Lake Storage Gen2. The queries are unpredictable, and the analyst wants to pay only for the compute resources consumed by each query. Which Azure Synapse Analytics compute model should be used?

93

A manufacturing company installs temperature sensors in a factory. Sensor data is streamed to Azure Event Hubs. The company needs to detect when the average temperature of any sensor exceeds 100°F over a 5-minute sliding window and then send an alert. Which Azure service should be used for this real-time stream processing?
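
The detection logic in this scenario is a sliding-window average compared against a threshold. A pure-Python sketch of the semantics (readings are made up; a real deployment would express this as a windowed streaming query):

```python
# Sliding-window average with an alert threshold.
WINDOW = 300       # seconds (5 minutes)
THRESHOLD = 100.0  # degrees F

def alerts(readings, window=WINDOW, threshold=THRESHOLD):
    """readings: list of (timestamp, sensor, temp) sorted by timestamp.
    Emit (timestamp, sensor) whenever a sensor's average over the
    trailing window exceeds the threshold."""
    out = []
    for ts, sensor, _ in readings:
        recent = [t for (ts2, s, t) in readings
                  if s == sensor and ts - window < ts2 <= ts]
        if sum(recent) / len(recent) > threshold:
            out.append((ts, sensor))
    return out

readings = [(0, "s1", 95.0), (120, "s1", 99.0), (240, "s1", 110.0)]
print(alerts(readings))  # [(240, 's1')] -- average climbs past 100 at t=240
```

Unlike a tumbling window, the sliding window is re-evaluated as events arrive, so the alert fires as soon as the trailing average crosses the threshold rather than at a fixed boundary.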

94

A data engineering team needs to build a pipeline that ingests streaming data from IoT devices into Azure Data Lake Storage Gen2. The data arrives as JSON messages. They want to use a service that can capture the streaming data in near real-time and store it as files in the data lake without writing custom code for the ingestion. Which Azure service should they use?

95

A data engineering team needs to build a batch processing pipeline that transforms large volumes of sales data stored in Azure Data Lake Storage Gen2. The transformations include aggregations and joins, and the output should be stored back in the data lake as Parquet files. The team wants a serverless compute option that automatically scales and charges per second. Which Azure service should they use?

96

A company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. Every day, they need to incrementally load 100 GB of new sales data from CSV files stored in Azure Data Lake Storage Gen2 (ADLS Gen2). The load should use PolyBase for efficient parallel data transfer and must be orchestrated on a recurring schedule. Which Azure service should they use to create and manage this pipeline?

97

A data engineering team wants to build a batch analytics pipeline. The raw data is stored in Azure Data Lake Storage Gen2 (ADLS Gen2). The final output will be a set of tables in Azure Synapse Analytics (dedicated SQL pool) that will be used to create reports in Power BI. Arrange the following steps in the correct order for a typical ETL process.

98

A data engineering team needs to build a real-time dashboard showing sales totals by region. Sales transactions are streamed from point-of-sale systems into Azure Event Hubs. The team wants to aggregate the data in near real-time (e.g., every minute) and store the results in Azure SQL Database for visualization in Power BI. Which Azure service should they use for the aggregation step?

99

A manufacturing company collects temperature and vibration data from thousands of sensors. The data is streamed to Azure Event Hubs. The company wants to store all this raw data in Azure Data Lake Storage Gen2 for future batch analytics. They need a solution that automatically writes the streaming data to the data lake in near real-time, without requiring any custom code for the write operation. Which Azure feature should they use?

100

A data engineering team needs to build a batch ETL pipeline that transforms large volumes of clickstream data stored as CSV files in Azure Data Lake Storage Gen2. The transformations require running distributed Python and Scala code using Apache Spark. The transformed data will be loaded into a data warehouse for reporting. The team wants a serverless compute environment that automatically scales and charges per second. Which Azure service should they use to run the Spark transformations?

101

A company uses Azure Synapse Analytics dedicated SQL pool as its data warehouse. New data is loaded into the warehouse every few minutes. The company wants to visualize the data with near real-time updates in a dashboard that can be refreshed automatically. Which tool and connection mode should they use?

102

A retail company stores historical sales data from multiple stores in Azure Data Lake Storage Gen2 as CSV files. They need to run complex SQL queries that join and aggregate data across multiple files to generate weekly sales reports. They want a serverless query service that can directly query the data in the lake without loading it into a separate database. Which Azure service should they use?

103

A data analyst needs to run ad-hoc SQL queries on large datasets stored as Parquet files in Azure Data Lake Storage Gen2. The queries are infrequent and the data volume varies. The analyst wants to pay only for the amount of data processed per query and does not want to manage any infrastructure. They also need to create views in T-SQL to simplify queries for Power BI reports. Which Azure service should they use?
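
Pay-per-query pricing bills on data processed, which is why a columnar format such as Parquet matters: a query that touches few columns scans only those columns. A back-of-the-envelope cost sketch (the per-TB price below is a placeholder for illustration, not an Azure quote):

```python
# Rough cost model for a pay-per-data-processed query service.
PRICE_PER_TB = 5.0  # hypothetical rate; check current Azure pricing

def query_cost(total_tb, columns_read, total_columns, price=PRICE_PER_TB):
    """Columnar storage scans only the columns a query touches."""
    scanned_tb = total_tb * columns_read / total_columns
    return scanned_tb * price

# 10 TB table with 50 columns: a 3-column query scans ~0.6 TB, not 10 TB.
print(query_cost(10, 3, 50))  # 0.6 TB scanned at $5/TB -> $3
```

The same query over row-oriented CSV would scan the full 10 TB, so format choice directly drives the per-query bill.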

104

A financial services company processes real-time stock trade data from multiple exchanges. Trades are ingested into Azure Event Hubs. The company needs to compute a 5-minute sliding window average of trade prices per stock symbol and ensure that each trade is processed exactly once within the window. The aggregated results must be stored in Azure SQL Database for historical reporting and also sent to a Power BI dashboard for near real-time visualization. Which Azure service should be used for the real-time processing?

105

A company needs to ingest data from an on-premises SQL Server database into Azure SQL Database every hour. During the ingestion, they need to filter out rows where Status = 'Inactive' and convert a date column to a different format. They want a cloud-based, code-free solution that can schedule and orchestrate this task. Which Azure service should they use?

106

A retail company needs to analyze sales transactions as they occur to detect fraud patterns and immediately block suspicious orders. They also need to run daily batch reports on historical sales data. Which combination of Azure services should they use to meet both real-time and batch processing requirements?

107

A data engineering team is designing a modern data warehouse on Azure. They have raw data landing in Azure Data Lake Storage Gen2 (ADLS Gen2) as Parquet files. They need to perform transformations using Apache Spark, and then load the transformed data into Azure Synapse Analytics for high-performance analytical queries. The team wants to use a single orchestration service to schedule, monitor, and manage the entire pipeline. Which Azure service should they choose for orchestration?

108

A company uses Azure Synapse Analytics dedicated SQL pool to store sales data. They frequently run queries that aggregate sales by product and region over the past month. The queries are slow because they scan the entire table. Which index type should they implement on the fact table to improve query performance for these aggregations?

109

A company uses Azure Synapse Analytics dedicated SQL pool to store sales data. The fact table contains billions of rows and is hash-distributed on ProductID. Queries aggregate sales by store and product for the current month and join with a small Store dimension table (10,000 rows) and a medium-sized Product dimension table (500,000 rows). The queries are slow due to data movement during joins. Which design change will most reduce data movement and improve query performance?
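
The reason a small dimension is often replicated: with a full copy on every compute node, each fact shard joins locally and no rows move between nodes. A toy sketch (node count, products, and categories are made up):

```python
# Replicated-table join sketch: every node holds the whole dimension,
# so each node joins its own fact shard without any shuffle.
fact_shards = [  # fact rows already hash-distributed across 4 nodes
    [("ProductA", 10)], [("ProductB", 20)],
    [("ProductA", 5)],  [("ProductC", 7)],
]
dimension = {"ProductA": "Toys", "ProductB": "Games", "ProductC": "Books"}

local_results = []
for shard in fact_shards:
    replica = dict(dimension)  # full local copy on this node
    local_results.extend((replica[p], amt) for p, amt in shard)

print(sorted(local_results))
# [('Books', 7), ('Games', 20), ('Toys', 5), ('Toys', 10)]
```

The trade-off is storage: the dimension is duplicated per node, which is why the pattern fits small tables but not billion-row facts.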

110

A data engineering team is building a batch analytics pipeline. Raw clickstream data is stored as Parquet files in Azure Data Lake Storage Gen2. The team needs to transform the data using Apache Spark (Python code) and then load the results into Azure Synapse Analytics for high-performance reporting. They want to use a serverless compute option for Spark to avoid managing clusters. Which combination of Azure services should they use for the transformation and loading?

111

A data analyst needs to query large datasets stored as Parquet files in Azure Data Lake Storage Gen2. The queries are ad-hoc and infrequent. The analyst wants to run SQL queries directly on the data without creating any storage or compute infrastructure, and only pay for the amount of data processed. They also need to create T-SQL views to simplify queries for Power BI reports. Which Azure service should they use?

112

A logistics company needs to analyze GPS data from delivery trucks in real time to detect delays and reroute deliveries. The GPS data is streamed into Azure Event Hubs. They also need to combine this live data with static route information stored in Azure SQL Database. Which Azure service should they use for the real-time processing?

113

A data engineer needs to transform large datasets stored in Azure Data Lake Storage Gen2 using Python and Apache Spark. They want a serverless compute option that automatically scales and requires no cluster management. Which Azure service should they use?

114

A manufacturing company ingests real-time sensor data from assembly line machines into Azure Event Hubs. The company needs to calculate a 5-minute rolling average of temperature readings for each machine and compare it against a static threshold value stored in a CSV file in Azure Blob Storage. If the average exceeds the threshold, an alert must be triggered. Which Azure service should be used for this real-time data processing?

115

A data analyst needs to run ad-hoc SQL queries on terabytes of CSV files stored in Azure Data Lake Storage Gen2. The queries are infrequent and unpredictable. The analyst wants to pay only for the amount of data processed by each query, and does not want to manage any compute or storage infrastructure. Which Azure service should they use?

116

A retail company collects sales data from multiple stores. Data is ingested into Azure Data Lake Storage Gen2 as CSV files. The data team needs to run ad-hoc SQL queries on this data without moving it, and they want to pay only for the amount of data processed. They also need to integrate with Power BI for visualization. Which Azure service should they use?

117

A company uses Azure Synapse Analytics dedicated SQL pool for a large data warehouse. The fact table contains billions of rows and is hash-distributed on ProductID. Frequent queries join this fact table with a small Store dimension table (10,000 rows) and a medium-sized Product dimension table (500,000 rows). The queries aggregate sales by store and product for recent months, but run slowly due to data movement during joins. Which design change will most reduce data movement and improve query performance?

118

A data engineering team needs to transform large datasets stored in Azure Data Lake Storage Gen2 using Apache Spark with Python code. They want a fully managed service that provides serverless Spark pools, meaning no clusters to manage and automatic scaling. Which Azure service should they use?

119

A data analyst needs to run ad-hoc SQL queries on petabytes of Parquet files stored in Azure Data Lake Storage Gen2. The queries are infrequent and highly selective. The analyst wants to pay only for the data scanned by each query and does not want to provision any compute resources. They also need to create views to simplify future queries for other analysts. Which Azure service should they use?

120

A financial services company stores transaction data in Azure Data Lake Storage Gen2 as Parquet files, partitioned by date. The data volume is 5 TB per day. The analytics team runs ad-hoc SQL queries to detect fraudulent patterns. Queries are highly selective (filtering on AccountID and date range). The team also needs to create external tables and views for use in Power BI. They want to pay only for the data processed by each query and avoid provisioning any compute resources. Which Azure service should they use?

121

A logistics company uses IoT sensors on delivery trucks to transmit GPS location, speed, and engine diagnostics every 10 seconds. The data is ingested into Azure Event Hubs. The company needs to analyze the data in real time to identify speeding trucks and send alerts. The analysis requires joining the live sensor data with a reference table of truck details (e.g., driver name, route number) stored in Azure SQL Database. Which Azure service should they use for the real-time processing?

122

A data analyst needs to run ad-hoc SQL queries on petabytes of data stored as Parquet files in Azure Data Lake Storage Gen2. The queries are infrequent but must return results within seconds. The analyst wants to pay only for the amount of data processed and does not want to manage any compute infrastructure. Additionally, they need to create views to simplify future reporting in Power BI. Which Azure service should they use?

123

A company needs to run complex analytical queries that aggregate terabytes of sales data across multiple years. The queries are used for monthly business reports and are not latency-sensitive. The data is stored in Azure Data Lake Storage Gen2. The company wants a fully managed, petabyte-scale data warehouse solution that supports SQL queries and integrates with Power BI for reporting. Which Azure service should they use?

124

A financial services company has raw transaction data stored in Azure Data Lake Storage Gen2 (ADLS Gen2) as Parquet files, partitioned by date. The analytics team needs to run complex SQL queries that join multiple datasets, including reference data from an Azure SQL Database, to generate risk reports. They require enterprise-grade security features such as row-level security (RLS) and column-level security. They also want to use the same service for data transformation and loading (ETL) into a curated layer. Which Azure service should they choose?

Watch out for

Common Describe an analytics workload on Azure exam traps

  • Azure Stream Analytics answers real-time, windowed queries; Spark-based services such as Azure Databricks suit complex batch transformations.
  • Serverless SQL pools bill per query on data processed; dedicated SQL pools are provisioned, always-on capacity.
  • Azure Data Factory and Synapse pipelines orchestrate and schedule workflows; heavy transformations usually run in Spark or SQL compute.
  • ELT loads raw data into the target first and transforms it there; ETL transforms before loading.

Frequently asked questions

What does the Describe an analytics workload on Azure domain cover on the DP-900 exam?
Questions in this domain usually test batch versus streaming processing, data warehousing and data lake concepts, analytics services such as Azure Synapse Analytics, Databricks, Stream Analytics, Data Factory and Data Lake Storage, and data visualisation with Power BI.
How many questions are in this domain?
This page lists all 124 Describe an analytics workload on Azure questions in the DP-900 question bank. The actual exam draws from this domain proportionally to its weighting in the official exam blueprint.
What is the best way to practise this domain?
Start with a short focused session (10 questions) to identify gaps, then use the interactive practice page to work through explanations. Repeat with a longer session once the weak areas feel solid.
Can I practise only Describe an analytics workload on Azure questions?
Yes — the session launcher on this page filters questions to this domain only. Choose any session length or try the interactive practice page for inline explanations.