Courseiva
Knowledge + Practice
CertificationsVendorsCareer RoadmapsLabs & ToolsStudy GuidesGlossaryPractice Questions
C
Courseiva

Free IT certification practice questions with explained answers for CCNA, CompTIA, AWS, Azure, Google Cloud, and more.

Certification Practice Questions

CCNA practice questionsSecurity+ SY0-701 practice questionsAWS SAA-C03 practice questionsAZ-104 practice questionsAZ-900 practice questionsCLF-C02 practice questionsA+ Core 1 practice questionsGoogle Cloud ACE practice questionsCySA+ CS0-003 practice questionsNetwork+ N10-009 practice questions
View all certifications →

Product

CertificationsCertification PathsExam TopicsPractice TestsExam Dumps vs Practice TestsStudy HubComparisons

Company

AboutContactEditorial PolicyQuestion Writing PolicyTrust Center

Legal

Privacy PolicyTerms of Service

Courseiva is a free IT certification practice platform offering original exam-style practice questions, detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics for Cisco, CompTIA, Microsoft, AWS, and other technology certifications.

© 2026 Courseiva. Courseiva is operated by JTNetSolutions Ltd. All rights reserved.

Courseiva is an independent certification practice platform and is not affiliated with, endorsed by, or sponsored by Cisco, Microsoft, AWS, CompTIA, Google, ISC2, ISACA, or any other certification vendor. Vendor names and certification marks are used only to identify the exams learners are preparing for.

← Describe an analytics workload on Azure practice sets

DP-900 Describe an analytics workload on Azure • Complete Question Bank

DP-900 Describe an analytics workload on Azure — All Questions With Answers

Complete DP-900 Describe an analytics workload on Azure question bank — all 0 questions with answers and detailed explanations.

262
Questions
Free
No signup
Certifications/DP-900/Practice Test/Describe an analytics workload on Azure/All Questions
Question 1hardmultiple choice
Read the full Describe an analytics workload on explanation →

A manufacturer collects sensor data from thousands of IoT devices every second. The data is ingested into Azure Event Hubs and then needs to be stored for historical analysis. The analytics team will run complex aggregations and time-series queries over petabytes of data, expecting fast results even with large scans. Which Azure service should be used as the analytical data store?

Question 2hardmultiple choice
Read the full Describe an analytics workload on explanation →

A manufacturing company has a streaming data pipeline that ingests sensor data from factory equipment into Azure Event Hubs. The data must be prepared for reporting by cleaning invalid records, removing duplicates, and aggregating readings into 5-minute windows. The transformed data needs to be stored in a columnar format in a data lake to support efficient querying by data analysts using SQL. Which Azure service should perform the data transformation and loading?

Question 3mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data analytics team stores sales transaction data in Parquet files in Azure Data Lake Storage Gen2. They want to run complex analytical queries that join this data with dimension tables stored in Azure Synapse Analytics dedicated SQL pool. The team prefers not to move or copy the data from the data lake. Which feature should they use to query the data lake data directly?

Question 4hardmultiple choice
Read the full NAT/PAT explanation →

A healthcare analytics company receives continuous streams of patient monitoring data from IoT devices. The data must be processed in near real-time to detect critical events (e.g., abnormal heart rate). Processed data is then stored in a columnar format for historical analysis and reporting by data analysts using SQL. Which combination of Azure services should they use for ingestion, processing, and storage?

Question 5mediummultiple choice
Read the full Describe an analytics workload on explanation →

A retail chain collects daily sales data from hundreds of stores. The data is stored as CSV files in Azure Data Lake Storage Gen2. The analytics team needs to run complex SQL queries that join sales data with product dimensions and aggregate results across petabytes of data. Queries must return results within seconds. Which Azure service is best suited for this analytical workload?

Question 6hardmultiple choice
Read the full Describe an analytics workload on explanation →

A financial analytics company has petabytes of transaction data stored as Parquet files in Azure Data Lake Storage Gen2. Data analysts need to run complex SQL queries that join multiple tables and return results within seconds. The company wants to query the data directly without moving it to another store. Which Azure service should they use?

Question 7mediummultiple choice
Read the full NAT/PAT explanation →

A retail company analyzes customer purchase patterns. Every night, they run a batch job that aggregates millions of transactions from the past day into summary tables for reporting. Which type of data processing workload best describes this nightly job?

Question 8hardmultiple choice
Read the full Describe an analytics workload on explanation →

A retail chain captures real-time sales data from point-of-sale (POS) systems as a stream of events. The data is ingested into Azure Event Hubs. Additionally, the company receives daily inventory files in CSV format uploaded to Azure Data Lake Storage Gen2. The analytics team needs to combine the streaming sales data with the batch inventory data to generate near real-time dashboards and run historical reports. They want a single analytics platform that can handle both streaming and batch workloads, and allow querying data directly in the data lake using SQL. Which Azure service should they choose?

Question 9mediummultiple choice
Read the full Describe an analytics workload on explanation →

A marketing team wants to analyze social media sentiment in near real-time. They will use Azure Event Hubs to capture tweets and need to aggregate sentiment scores over 5-minute windows. The aggregated results must be stored in Azure Blob Storage for later analysis. Which Azure service should they use to perform the stream processing?

Question 10hardmultiple choice
Read the full NAT/PAT explanation →

A company is migrating their on-premises data warehouse, which is built on a Netezza appliance, to Azure. The data warehouse contains over 10 terabytes of data and supports complex BI queries with multiple joins and aggregations. The company requires a cloud-based solution that provides massively parallel processing (MPP) to handle large-scale queries efficiently. They also need to integrate with existing ETL tools like Azure Data Factory and provide native connectivity to Power BI. Which Azure service should they choose?

Question 11hardmultiple choice
Read the full Describe an analytics workload on explanation →

A manufacturing company collects sensor data from factory equipment as a continuous stream of events ingested into Azure Event Hubs. Additionally, the company receives daily inventory CSV files uploaded to Azure Data Lake Storage Gen2. The analytics team needs to build near real-time dashboards that combine streaming sensor data with batch inventory data, and also support historical reporting by querying data directly in the data lake using SQL without moving it. Which Azure service should they choose as the primary analytics platform?

Question 12easymultiple choice
Read the full Describe an analytics workload on explanation →

A retail company runs a nightly process that reads all sales transactions from the previous day, aggregates them by product category and store location, and writes the summary results into a data warehouse for reporting. Which type of data processing workload best describes this nightly process?

Question 13hardmultiple choice
Read the full Describe an analytics workload on explanation →

A financial analytics company stores petabytes of transaction data in Parquet files in Azure Data Lake Storage Gen2. Data analysts need to run complex SQL queries that join multiple large tables and return results within seconds. The company also wants to integrate with Power BI for visualization and Azure Data Factory for ETL orchestration. They require a massively parallel processing (MPP) engine to handle the scale. Which Azure service should they choose?

Question 14easymultiple choice
Read the full Describe an analytics workload on explanation →

A retail company runs a nightly job that reads all sales transactions from the previous day from an operational database, aggregates them by product category and store location, and writes the summary results into a data warehouse for reporting. Which type of data processing workload does this nightly job represent?

Question 15hardmultiple choice
Read the full Describe an analytics workload on explanation →

A financial services company stores years of market trade data as Parquet files in Azure Data Lake Storage Gen2. The data volume is terabytes and growing rapidly. Data analysts need to run complex SQL queries that join multiple tables (e.g., trades, instruments, counterparties) and return results within seconds. The company also wants to integrate with Power BI for visualization and Azure Data Factory for orchestration of ETL pipelines. Which Azure service should they choose as the primary analytics platform?

Question 16mediummultiple choice
Review the full routing breakdown →

A logistics company receives real-time GPS tracking data from its delivery fleet via Azure Event Hubs. The data is a continuous stream of location updates (vehicle ID, latitude, longitude, timestamp). Additionally, the company has daily static route plan files in CSV format stored in Azure Data Lake Storage Gen2. The operations team needs to combine the live GPS stream with the route plans to create a near real-time dashboard showing if delivery vehicles are on schedule. They also want to run historical queries on both the stream data and route plans using T-SQL, without moving the data to another store. Which Azure service should they use as the primary analytics platform?

Question 17hardmultiple choice
Read the full Describe an analytics workload on explanation →

A manufacturing company ingests a continuous stream of sensor data from factory equipment into Azure Event Hubs. Additionally, historical maintenance data in CSV format is stored in Azure Data Lake Storage Gen2. The analytics team needs to join the streaming sensor data with the historical data in near real-time and enable analysts to query the combined dataset using standard T-SQL without moving the data. Which Azure service should they use as the primary analytics platform?

Question 18hardmultiple choice
Read the full Describe an analytics workload on explanation →

A retail company has an Azure SQL Database that handles OLTP transactions for its e-commerce platform. The analytics team needs to run complex reporting queries that join multiple tables (e.g., orders, products, customers) and aggregate millions of rows. These queries are long-running and would negatively impact the performance of the OLTP database if run directly. The company wants to use a separate analytics service that supports T-SQL queries, can scale compute independently, and provides a serverless option to avoid provisioning fixed resources. Which Azure service should they choose?

Question 19hardmultiple choice
Read the full network assurance explanation →

A manufacturing company connects thousands of IoT sensors on an assembly line, each sending telemetry data every second. The data volume is terabyte-scale per day. The company needs to analyze the sensor data in near real-time to detect anomalies (e.g., temperature spikes) and also allow data scientists to run interactive ad-hoc queries on the historical data to find patterns. They prefer using a query language similar to SQL. Which Azure service should they choose?

Question 20mediummultiple choice
Read the full NAT/PAT explanation →

A financial services company stores petabytes of transaction data in Parquet format in Azure Data Lake Storage Gen2. Data analysts need to run complex SQL queries that join multiple large tables and aggregate billions of rows, with results expected within seconds. The company wants to use a massively parallel processing (MPP) engine that supports T-SQL and can be paused to reduce costs during off-hours. They also need native integration with Azure Data Factory and Power BI. Which Azure service should they use?

Question 21mediummultiple choice
Read the full NAT/PAT explanation →

A company stores terabytes of web server log data in CSV files in Azure Data Lake Storage Gen2. Data analysts need to run ad-hoc SQL queries on this data to analyze user behavior patterns. The queries are complex, involve joins across multiple files, and the analysts prefer not to move the data into a separate store. Which Azure service should they use?

Question 22hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company ingests streaming data from IoT devices into Azure Event Hubs. They need to perform real-time analytics on the data, such as aggregating temperature readings over 5-minute windows and triggering alerts when thresholds are exceeded. They also want to store the processed data in a data warehouse for historical analysis. Which Azure service should they use for the real-time processing?

Question 23hardmultiple choice
Read the full Describe an analytics workload on explanation →

A manufacturing company ingests real-time sensor data from factory equipment via Azure Event Hubs. The data is a continuous stream of measurements (sensorId, timestamp, value). Additionally, historical maintenance records are stored as CSV files in Azure Data Lake Storage Gen2. The operations team needs to join the streaming data with the historical records in near real-time to detect anomalies. They also need to run complex T-SQL queries on the combined dataset for ad-hoc analysis. Which Azure service should they use as the primary analytics platform?

Question 24hardmultiple choice
Review the full routing breakdown →

A logistics company ingests real-time GPS data from delivery vehicles via Azure Event Hubs. The data includes vehicle ID, latitude, longitude, and timestamp. The company also has historical route plan data stored as CSV files in Azure Data Lake Storage Gen2. Data analysts need to combine the live stream with the historical data in near real-time to create a dashboard showing if vehicles are on schedule. They also need to run complex T-SQL queries on the combined dataset for ad-hoc reporting. Which Azure service should they use as the primary analytics platform?

Question 25mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Data Factory to run a pipeline that copies new orders from an on-premises SQL Server database to Azure Data Lake Storage every hour. After the data is in the data lake, an Azure Databricks notebook transforms it and loads it into Azure Synapse Analytics for reporting. Which type of data processing does the hourly copy operation represent?

Question 26hardmultiple choice
Read the full Describe an analytics workload on explanation →

A manufacturing company ingests a continuous stream of sensor data from thousands of IoT devices into Azure Event Hubs. The company also stores historical equipment maintenance records in Azure SQL Database. The operations team needs to join the streaming sensor data with the historical maintenance records in near real-time to detect anomalies, and data scientists need to run ad-hoc T-SQL queries on the combined dataset for analysis. Which Azure service should they use as the primary analytics platform to meet both requirements?

Question 27hardmultiple choice
Study the full Python automation breakdown →

A company ingests raw clickstream data as JSON files into Azure Data Lake Storage Gen2. Data scientists need to explore the data interactively using Python notebooks, and the BI team needs to create reports from aggregated datasets derived from this data. The solution must be serverless, scale automatically, and minimize administration. Which Azure service should they choose?

Question 28mediummultiple choice
Read the full NAT/PAT explanation →

A retail company wants to analyze customer clickstream data in real-time to detect patterns and trigger personalized offers. They also store the raw clickstream data in Azure Data Lake Storage for later batch analysis. Which Azure service should they use for the real-time processing component?

Question 29mediummultiple choice
Read the full Describe an analytics workload on explanation →

A smart building monitoring company ingests real-time sensor data (temperature, humidity, occupancy) from thousands of IoT devices into Azure Event Hubs. The company also stores historical building blueprints and maintenance records as CSV files in Azure Data Lake Storage Gen2. The engineering team needs to build a dashboard that displays live sensor readings overlaid on building floor plans, and also allows facility managers to run ad-hoc T-SQL queries that combine live sensor data with historical maintenance records. Which Azure service should they use as the primary analytics platform to meet both requirements?

Question 30hardmultiple choice
Read the full Describe an analytics workload on explanation →

A financial services company uses Azure Synapse Analytics to process large volumes of transaction data. They have a dedicated SQL pool (formerly SQL DW) that ingests curated, aggregated data nightly from a data lake. Data analysts need to run ad-hoc, exploratory T-SQL queries on raw transaction data stored as Parquet files in Azure Data Lake Storage Gen2. These queries vary widely in complexity and frequency. The company wants to minimize costs for these ad-hoc queries while still using full T-SQL capabilities. Which approach should they recommend?

Question 31mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company receives daily sales data from multiple retail stores as CSV files that are uploaded to Azure Blob Storage. The data must be cleansed, validated, and aggregated before being loaded into Azure Synapse Analytics for reporting. The transformations involve complex business logic and must run reliably every night. The company wants a service that can orchestrate and execute the entire pipeline with minimal development effort. Which Azure service should they use?

Question 32mediummultiple choice
Read the full NAT/PAT explanation →

A marketing company ingests streaming data from social media feeds into Azure Event Hubs. They want to perform real-time sentiment analysis on the data and store the results in Azure SQL Database for immediate dashboarding. They also need to aggregate the raw data over longer time windows and store it in Azure Data Lake Storage for historical trend analysis. Which combination of Azure services should they use for the two processing paths?

Question 33mediummultiple choice
Read the full NAT/PAT explanation →

A company receives real-time clickstream data from its website via Azure Event Hubs. They need to detect fraudulent clicks within seconds and also produce daily aggregate reports of visitor statistics for historical analysis. Which combination of Azure services should they use for the real-time detection and the daily aggregation, respectively?

Question 34easymultiple choice
Read the full Describe an analytics workload on explanation →

A retail company receives a continuous stream of customer orders from their website via Azure Event Hubs. They also receive daily inventory updates from suppliers as CSV files uploaded to Azure Blob Storage. The company needs to calculate real-time order fulfillment availability by joining the streaming orders with the latest inventory snapshot. Additionally, they generate nightly sales reports from historical order data. Which Azure service should they use for the real-time processing component?

Question 35hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company is designing an enterprise analytics solution. They store raw data in its original format in a scalable repository, apply schema and transformations at read time, and also maintain a curated layer that enforces ACID transactions for data reliability. This architecture combines the flexibility of a data lake with the reliability of a data warehouse. Which term best describes this modern data architecture?

Question 36easymultiple choice
Read the full Describe an analytics workload on explanation →

A retail company stores years of historical sales data in Azure Data Lake Storage Gen2 as Parquet files. Business analysts need to run complex SQL queries over this data to identify sales trends, and they want to visualize the results in Power BI dashboards. They prefer to avoid moving data into a separate database to minimize storage costs and latency. Which Azure service should they use to query the data directly in the lake?

Question 37hardmultiple choice
Read the full Describe an analytics workload on explanation →

A financial services company needs to build a data pipeline that ingests daily transaction files from multiple sources. The pipeline must perform data quality checks, transform data using complex business logic, and load it into Azure Synapse Analytics. The transformations involve conditional branching (e.g., if a transaction amount exceeds a threshold, apply additional validation). The company wants to minimize coding effort and prefers a visual, configuration-based approach. Which Azure service should they use as the primary orchestration and transformation engine?

Question 38hardmultiple choice
Read the full NAT/PAT explanation →

A retail company ingests clickstream data from its e-commerce website into Azure Event Hubs. They need to detect customer journey patterns in real time within seconds and also prepare aggregated data for daily trend reports stored in Azure Data Lake Storage Gen2. The real-time processing must handle high throughput and support complex temporal queries like sessionization. The daily aggregation should be cost-effective and use serverless compute. Which combination of Azure services should they use?

Question 39hardmultiple choice
Read the full Describe an analytics workload on explanation →

A marketing company stores years of historical campaign data in Azure Data Lake Storage Gen2 as Parquet files. Data analysts need to run complex SQL queries over this data to identify trends, and they want to visualize results in Power BI dashboards. The company wants to avoid moving data into a separate database to minimize duplication and latency. Which Azure service should they use to query the data directly in the data lake?

Question 40mediummultiple choice
Read the full NAT/PAT explanation →

A marketing company collects real-time clickstream data from their website using Azure Event Hubs. They need to perform two tasks: (1) aggregate the number of clicks per advertising campaign every 5 minutes and display the results in a live dashboard, and (2) run complex historical queries on months of aggregated click data to identify trends. They want to minimize data movement and use serverless compute where possible. Which combination of Azure services should they use?

Question 41mediummultiple choice
Read the full NAT/PAT explanation →

A manufacturing company deploys IoT sensors on equipment in a factory. They need to monitor sensor data in real time to detect anomalies and trigger immediate alerts. They also need to store years of historical sensor data for monthly capacity planning reports that involve complex aggregations. The company wants a cost-effective solution that minimizes data movement between storage and compute. Which combination of Azure services should they use for real-time processing and historical batch analytics?

Question 42harddrag order
Read the full Describe an analytics workload on explanation →

A retail company needs to build an analytics pipeline on Azure. They ingest sales data from multiple store systems and an online e-commerce platform. The data must be cleaned, transformed, and loaded into a data warehouse for reporting. The company wants to use a modern ELT (Extract, Load, Transform) approach where raw data is stored first and then transformed. Order the following steps in the correct sequence for this pipeline. (Drag the steps into the correct order.)

Question 43hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company stores terabytes of historical sales data as Parquet files in Azure Data Lake Storage Gen2. Business analysts need to run ad-hoc SQL queries that involve complex joins and aggregations over this data. They want to avoid provisioning a dedicated cluster or moving data into a separate database. The queries must be executed using standard T-SQL syntax. Which Azure service should they use?

Question 44mediummultiple choice
Read the full NAT/PAT explanation →

A telecommunications company needs to analyze call detail records (CDRs) to detect fraud patterns and minimize revenue leakage. The data arrives as a continuous stream from network switches and must be queried within seconds of ingestion to flag suspicious activity. The analysts also need to run interactive ad-hoc queries over the last 90 days of CDR data using a Kusto query language. Which Azure service should they use as the primary data store and analytics engine?

Question 45hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics dedicated SQL pool for large-scale data warehousing. They have a fact table with billions of rows and frequently run queries that filter by a date range and join with a product dimension table. Which table distribution and partitioning strategy will minimize data movement and improve query performance?

Question 46hardmultiple choice
Read the full NAT/PAT explanation →

A retail company ingests daily sales data from multiple stores as CSV files stored in Azure Blob Storage. The data must be cleaned and transformed using Spark, then loaded into Azure Synapse Analytics for large-scale reporting. The pipeline must run on a schedule, handle failures with retries, and minimize manual intervention. Which combination of Azure services should they use to orchestrate and execute this pipeline?

Question 47mediummultiple choice
Read the full Describe an analytics workload on explanation →

A retail company needs to analyze clickstream data from their website in real time to detect fraudulent activity and also run complex historical queries on months of data to identify shopping trends. They want a single service that can handle both streaming and batch analytics using a unified query language, minimizing data movement. Which Azure service should they use?

Question 48mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data engineering team needs to transform raw clickstream data stored as Parquet files in Azure Data Lake Storage Gen2. They want to use standard T-SQL queries to perform transformations and aggregations. The team prefers a serverless option to avoid provisioning and managing dedicated compute resources. Which Azure service should they use?

Question 49mediummultiple choice
Read the full NAT/PAT explanation →

A manufacturing company uses IoT sensors to collect temperature and vibration data from machinery. They need to analyze the streaming data in real time to detect anomalies and trigger alerts. Additionally, they need to run complex historical queries on months of sensor data to identify equipment failure patterns. They want a single Azure service that can handle both real-time stream processing and large-scale batch analytics using a unified query language, minimizing the need for separate technologies. Which Azure service should they use?

Question 50mediummultiple choice
Read the full NAT/PAT explanation →

A company wants to build a modern data warehouse using a lakehouse architecture. They need to store raw data in its native format (e.g., CSV, JSON, Parquet) and also support BI reporting on curated, transformed data. They want to use a single storage layer for both raw and curated data. Which Azure service should they use as the core storage layer?

Question 51hardmultiple choice
Read the full Describe an analytics workload on explanation →

A retail company processes petabytes of sales transaction data stored in Azure Data Lake Storage Gen2. They need to run recurring complex queries that involve large joins and aggregations. The queries must consistently complete within a fixed time window overnight. The company wants predictable performance and costs. Which Azure service should they use?

Question 52mediummultiple choice
Read the full NAT/PAT explanation →

A data engineering team is designing a modern data warehouse using Azure Synapse Analytics. They want to follow a lakehouse architecture where raw data is stored in its native format and then processed and curated for reporting. Which component in Azure Synapse Analytics is primarily used to store raw data in its original format without requiring a schema?

Question 53hardmultiple choice
Read the full Describe an analytics workload on explanation →

A financial services company runs large-scale analytical queries on a dedicated SQL pool in Azure Synapse Analytics. They notice that during peak hours, complex aggregations consume excessive resources, causing slower queries from other users. They need to ensure that critical management reports always get enough resources and complete within a guaranteed time, while other less important queries do not starve them. Which feature should they implement?

Question 54hardmultiple choice
Read the full Describe an analytics workload on explanation →

A financial services company uses a dedicated SQL pool in Azure Synapse Analytics to run large-scale analytical queries. During peak hours, complex aggregations consume excessive resources, causing slower performance for other users. The company needs to ensure that critical scheduled management reports always receive guaranteed resources and complete within a predictable timeframe, while less important ad-hoc queries do not interfere. Which feature should they implement to manage query resource allocation?

Question 55hardmultiple choice
Read the full Describe an analytics workload on explanation →

A financial services company runs critical end-of-day reports in an Azure Synapse Analytics dedicated SQL pool. These reports require guaranteed resource allocation and must complete within a fixed time window. However, ad-hoc analytical queries from data scientists often consume resources, causing contention and delaying the critical reports. Which feature should the company implement to ensure the critical reports always receive sufficient resources?

Question 56hardmultiple choice
Read the full Describe an analytics workload on explanation →

A large e-commerce company needs to build an analytics solution. They have streaming clickstream data from their website (JSON) and daily sales data from their transactional database (CSV). They need to perform real-time dashboards on clickstream for the current hour, and also run complex historical queries that join sales data with aggregated clickstream data over the past year. They want a single Azure service that can handle both stream processing and batch processing using a unified experience, without moving data between separate systems. Which Azure service should they use?

Question 57easymultiple choice
Study the full Python automation breakdown →

A company stores weather sensor data in Azure Data Lake Storage Gen2. Data scientists need to run large-scale transformations and machine learning experiments on this data using Python and Apache Spark. They want to collaborate using shared Jupyter notebooks. Which Azure service should they use for this analytical workload?

Question 58mediummultiple choice
Read the full Describe an analytics workload on explanation →

A retail company wants to analyze years of historical sales data stored as CSV files in Azure Blob Storage. The analytics solution must be serverless, allow T-SQL queries without managing infrastructure, and integrate directly with Power BI. Which Azure service should the company use?

Question 59easymultiple choice
Read the full Describe an analytics workload on explanation →

A business analyst needs to explore and create interactive visualizations of sales data stored in Azure Data Lake Storage Gen2 without writing SQL code. Which Azure service is best suited for this drag-and-drop data exploration?

Question 60hardmultiple choice
Read the full Describe an analytics workload on explanation →

A financial institution needs to run complex queries against petabytes of historical trading data stored in Azure Data Lake Storage. The queries must be efficient and use columnar storage format. Which technology should they use to process this data?

Question 61mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data engineer needs to query data stored in CSV files in Azure Data Lake Storage Gen2 using T-SQL in Azure Synapse Analytics, without loading the data into the database. Which feature should they use?

Question 62mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to run interactive SQL queries against petabytes of sales data stored in Parquet format in Azure Data Lake Storage Gen2. The analyst wants the fastest query performance for ad-hoc exploration without provisioning or managing any infrastructure. Which Azure service should they use?

Question 63mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data engineer needs to process raw clickstream data from multiple websites that is stored in Azure Blob Storage as JSON files. The processing must run automatically every hour, transform the data into a structured format for reporting, and handle schema changes in the source data without manual intervention. Which Azure service should be used?

Question 64mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data engineer needs to process streaming data from IoT devices and store the results in Azure Data Lake Storage for long-term analytics. The data must be processed in near real-time to detect anomalies and trigger alerts. Which Azure service should the engineer use for stream processing?

Question 65hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics to run complex queries against large datasets stored in Parquet files in Azure Data Lake Storage Gen2. They notice that queries scanning entire partitions are slow due to high I/O overhead on the compute nodes. Investigation shows each daily partition contains thousands of small files (under 1 MB each). Which optimization should be implemented first to improve query performance?

Question 66mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data engineer needs to build a pipeline that runs every hour, copies new sales data from an on-premises SQL Server to Azure Data Lake Storage Gen2, transforms the data using PySpark, and then loads it into Azure Synapse Analytics dedicated SQL pool. Which Azure service should be used to orchestrate the entire pipeline?

Question 67mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to run complex SQL queries against petabytes of historical sales data stored in Azure Data Lake Storage Gen2. The solution must be serverless with pay-per-query pricing. Which Azure service should they use?

Question 68hardmultiple choice
Read the full NAT/PAT explanation →

A retail chain needs to blend two data sources for a near real-time dashboard: daily batch files from store systems (CSV files on Azure Blob Storage updated once per day) and live web clickstream data from Azure Event Hubs. The dashboard must refresh every 5 minutes with combined data. Which combination of Azure services should be used to ingest and process both data types most efficiently?

Question 69mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data engineering team needs to analyze petabytes of historical sales data stored in Azure Data Lake Storage Gen2. They require the ability to run complex SQL queries that join multiple tables and need high performance. The solution must separate compute from storage to allow independent scaling of resources. Which Azure service should they use?

Question 70hardmultiple choice
Read the full Describe an analytics workload on explanation →

A financial institution runs complex analytical queries on trading data stored in Parquet files in Azure Data Lake Storage Gen2. The data is partitioned by date and contains billions of rows. Analysts frequently query within a specific date range, and the queries must return results in under 5 seconds. The current solution uses Azure Synapse Serverless SQL pool, but queries are slow because the serverless pool scans all partitions even when the WHERE clause filters on the date column. Which optimization should be implemented to improve query performance?

Question 71mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to run interactive SQL queries on a large dataset stored as CSV files in Azure Blob Storage. The analyst wants to explore the data using T-SQL without loading the data into a database. Which Azure service should they use?

Question 72mediummultiple choice
Read the full Describe an analytics workload on explanation →

A retail company needs to run complex SQL queries on petabytes of historical sales data stored in Parquet files in Azure Data Lake Storage Gen2. They want a solution that provides fast query performance without managing infrastructure, and they prefer a pay-per-query pricing model. Which Azure service should they use?

Question 73hardmultiple choice
Read the full Describe an analytics workload on explanation →

A logistics company uses Azure Synapse Analytics dedicated SQL pool to analyze billions of shipment records. The table 'Shipments' is 10 TB and hash-distributed on 'ShipmentID'. Analysts frequently run queries that filter on 'WarehouseID' and aggregate by 'Region'. These queries are slow because they cause data movement (shuffle) across distributions. Which table design change will most improve query performance for these analytical workloads?

Question 74mediummultiple choice
Study the full Python automation breakdown →

A data engineer needs to build an analytics solution to transform large volumes of streaming data from IoT devices. The transformations involve complex Python and Spark code, and the results will be stored in Azure Data Lake Storage Gen2 for further analysis. Which Azure service is best suited for executing these transformations?

Question 75mediummultiple choice
Read the full Describe an analytics workload on explanation →

A retail company collects streaming clickstream data from its website into Azure Event Hubs. They need to aggregate the data in real-time to count page views per product every minute and store the results in Azure SQL Database for a live dashboard. Which Azure service should they use to perform this real-time aggregation?

Question 76hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics dedicated SQL pool to store a large fact table containing 5 TB of sales transactions. New data arrives continuously and is loaded daily. The company needs to load 500 GB of new data each day while allowing concurrent read queries on the most recent data without performance degradation. Which loading strategy optimizes both load speed and query performance?

Question 77easymultiple choice
Read the full Describe an analytics workload on explanation →

A business analyst needs to create interactive visualizations and share dashboards with colleagues using data stored in an Azure Synapse Analytics dedicated SQL pool. Which tool should the analyst use?

Question 78hardmultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to run ad-hoc SQL queries on petabytes of log data stored as Parquet files in Azure Data Lake Storage Gen2. The queries join multiple tables and require high concurrency from multiple analysts. The solution should minimize cost by only paying for queries executed. Which Azure service should they use?

Question 79mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company has a data warehouse in Azure Synapse Analytics dedicated SQL pool. They need to load new sales data every night from a CSV file stored in Azure Data Lake Storage Gen2. The load process must be automated, scheduled, and have error handling for failed loads. Which Azure service should they use to orchestrate this process?

Question 80hardmultiple choice
Read the full Describe an analytics workload on explanation →

A manufacturing company collects sensor data from thousands of IoT devices. The data arrives as a stream of time-stamped readings with a fixed schema (DeviceID, Timestamp, Temperature, Pressure, Vibration). They need to store this data and support both real-time dashboards showing the last hour of data and complex analytical queries over years of historical data. The solution must minimize storage costs and provide sub-second response for real-time queries. Which Azure service is best suited for this workload?

Question 81mediummultiple choice
Read the full Describe an analytics workload on explanation →

A retail company needs to analyze streaming clickstream data from their website to detect shopping cart abandonment in real-time. They want to use Azure Stream Analytics to output results that can be visualized on a live dashboard. Which output sink allows the fastest data visualization for a real-time dashboard in Power BI?

Question 82mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data engineer is designing a data lake architecture in Azure. They plan to first ingest raw data from various sources into a landing zone in Azure Data Lake Storage Gen2. Then they will clean, validate, and deduplicate that data in a second zone. Finally, they will create aggregated, business-ready datasets in a third zone for analysts. This layered approach is known as which architecture?

Question 83harddrag order
Read the full Describe an analytics workload on explanation →

A company is building a modern data warehouse on Azure using a lakehouse approach. Arrange the following steps in the correct order to implement a typical pipeline that starts with raw data ingestion and ends with business reporting.

Question 84easymultiple choice
Read the full Describe an analytics workload on explanation →

A company needs to run complex SQL queries on petabytes of data stored in Azure Data Lake Storage Gen2. They want to pay only for the queries they run and do not want to manage any infrastructure. Which Azure service should they use?

Question 85mediummultiple choice
Read the full Describe an analytics workload on explanation →

A retail company receives daily sales data as CSV files in Azure Data Lake Storage Gen2. They need to load this data into an Azure Synapse Analytics dedicated SQL pool every night. The process must be automated, scheduled, and include error handling for failed loads. Which Azure service should they use to orchestrate this pipeline?

Question 86mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. Every night, they need to load 500 GB of new sales data from CSV files stored in Azure Data Lake Storage Gen2. The loading process must be automated, scheduled, and include error handling (e.g., skip corrupt rows and log them). Which Azure service should be used to orchestrate this load pipeline?

Question 87hardmultiple choice
Read the full Describe an analytics workload on explanation →

A data warehouse team uses Azure Synapse Analytics dedicated SQL pool to serve both business executives running weekly reports and data scientists running complex ad-hoc queries on large fact tables. The ad-hoc queries often consume excessive resources and degrade performance for the weekly reports. The team needs to ensure that the weekly reports always get guaranteed resources regardless of other concurrent queries. Which Synapse feature should they use?

Question 88mediummultiple choice
Read the full NAT/PAT explanation →

A manufacturing company needs to build an analytics solution for IoT sensor data. Thousands of devices send real-time temperature and vibration readings. The solution must: (1) ingest the streaming data reliably, (2) perform real-time aggregations (e.g., average temperature per device every minute), and (3) store the aggregated results in Azure Synapse Analytics for historical reporting and dashboards. Which combination of Azure services should be used?

Question 89mediummultiple choice
Read the full Describe an analytics workload on explanation →

A manufacturing company collects real-time temperature data from thousands of IoT sensors. They need to build an analytics solution that processes the streaming data, computes the average temperature per device every minute, and outputs the results to a Power BI dashboard for near real-time visualization. Which Azure service should they use for the real-time stream processing?

Question 90mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company needs to build a centralized analytics platform that can query both structured data in a relational data warehouse and unstructured data in a data lake using a single SQL-based interface. They want to minimize data movement and use a serverless, on-demand compute model for ad-hoc queries. Which Azure service should they use?

Question 91easymultiple choice
Read the full Describe an analytics workload on explanation →

A transportation company collects real-time GPS data from thousands of delivery vehicles. They need to process this streaming data to detect delays and generate alerts when a vehicle is behind schedule. Which Azure service should they use for the stream processing?

Question 92hardmultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to run ad-hoc SQL queries on large volumes of data stored as Parquet files in Azure Data Lake Storage Gen2. The queries are unpredictable, and the analyst wants to pay only for the compute resources consumed by each query. Which Azure Synapse Analytics compute model should be used?

Question 93mediummultiple choice
Read the full Describe an analytics workload on explanation →

A manufacturing company installs temperature sensors in a factory. Sensor data is streamed to Azure Event Hubs. The company needs to detect when the average temperature of any sensor exceeds 100°F over a 5-minute sliding window and then send an alert. Which Azure service should be used for this real-time stream processing?

Question 94mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data engineering team needs to build a pipeline that ingests streaming data from IoT devices into Azure Data Lake Storage Gen2. The data arrives as JSON messages. They want to use a service that can capture the streaming data in near real-time and store it as files in the data lake without writing custom code for the ingestion. Which Azure service should they use?

Question 95mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data engineering team needs to build a batch processing pipeline that transforms large volumes of sales data stored in Azure Data Lake Storage Gen2. The transformations include aggregations and joins, and the output should be stored back in the data lake as Parquet files. The team wants a serverless compute option that automatically scales and charges per second. Which Azure service should they use?

Question 96mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. Every day, they need to incrementally load 100 GB of new sales data from CSV files stored in Azure Data Lake Storage Gen2 (ADLS Gen2). The load should use PolyBase for efficient parallel data transfer and must be orchestrated on a recurring schedule. Which Azure service should they use to create and manage this pipeline?

Question 97mediumdrag order
Read the full Describe an analytics workload on explanation →

A data engineering team wants to build a batch analytics pipeline. The raw data is stored in Azure Data Lake Storage Gen2 (ADLS Gen2). The final output will be a set of tables in Azure Synapse Analytics (dedicated SQL pool) that will be used to create reports in Power BI. Arrange the following steps in the correct order for a typical ETL process.

Question 98mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data engineering team needs to build a real-time dashboard showing sales totals by region. Sales transactions are streamed from point-of-sale systems into Azure Event Hubs. The team wants to aggregate the data in near real-time (e.g., every minute) and store the results in Azure SQL Database for visualization in Power BI. Which Azure service should they use for the aggregation step?

Question 99mediummultiple choice
Read the full Describe an analytics workload on explanation →

A manufacturing company collects temperature and vibration data from thousands of sensors. The data is streamed to Azure Event Hubs. The company wants to store all this raw data in Azure Data Lake Storage Gen2 for future batch analytics. They need a solution that automatically writes the streaming data to the data lake in near real-time, without requiring any custom code for the write operation. Which Azure feature should they use?

Question 100mediummultiple choice
Study the full Python automation breakdown →

A data engineering team needs to build a batch ETL pipeline that transforms large volumes of clickstream data stored as CSV files in Azure Data Lake Storage Gen2. The transformations require running distributed Python and Scala code using Apache Spark. The transformed data will be loaded into a data warehouse for reporting. The team wants a serverless compute environment that automatically scales and charges per second. Which Azure service should they use to run the Spark transformations?

Question 101mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics dedicated SQL pool as its data warehouse. New data is loaded into the warehouse every few minutes. The company wants to visualize the data with near real-time updates in a dashboard that can be refreshed automatically. Which tool and connection mode should they use?

Question 102mediummultiple choice
Read the full Describe an analytics workload on explanation →

A retail company stores historical sales data from multiple stores in Azure Data Lake Storage Gen2 as CSV files. They need to run complex SQL queries that join and aggregate data across multiple files to generate weekly sales reports. They want a serverless query service that can directly query the data in the lake without loading it into a separate database. Which Azure service should they use?

Question 103hardmultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to run ad-hoc SQL queries on large datasets stored as Parquet files in Azure Data Lake Storage Gen2. The queries are infrequent and the data volume varies. The analyst wants to pay only for the amount of data processed per query and does not want to manage any infrastructure. They also need to create views in T-SQL to simplify queries for Power BI reports. Which Azure service should they use?

Question 104hardmultiple choice
Read the full Describe an analytics workload on explanation →

A financial services company processes real-time stock trade data from multiple exchanges. Trades are ingested into Azure Event Hubs. The company needs to compute a 5-minute sliding window average of trade prices per stock symbol and ensure that each trade is processed exactly once within the window. The aggregated results must be stored in Azure SQL Database for historical reporting and also sent to a Power BI dashboard for near real-time visualization. Which Azure service should be used for the real-time processing?

Question 105mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company needs to ingest data from an on-premises SQL Server database into Azure SQL Database every hour. During the ingestion, they need to filter out rows where Status = 'Inactive' and convert a date column to a different format. They want a cloud-based, code-free solution that can schedule and orchestrate this task. Which Azure service should they use?

Question 106mediummultiple choice
Read the full NAT/PAT explanation →

A retail company needs to analyze sales transactions as they occur to detect fraud patterns and immediately block suspicious orders. They also need to run daily batch reports on historical sales data. Which combination of Azure services should they use to meet both real-time and batch processing requirements?

Question 107hardmultiple choice
Read the full Describe an analytics workload on explanation →

A data engineering team is designing a modern data warehouse on Azure. They have raw data landing in Azure Data Lake Storage Gen2 (ADLS Gen2) as Parquet files. They need to perform transformations using Apache Spark, and then load the transformed data into Azure Synapse Analytics for high-performance analytical queries. The team wants to use a single orchestration service to schedule, monitor, and manage the entire pipeline. Which Azure service should they choose for orchestration?

Question 108mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics dedicated SQL pool to store sales data. They frequently run queries that aggregate sales by product and region over the past month. The queries are slow because they scan the entire table. Which index type should they implement on the fact table to improve query performance for these aggregations?

Question 109hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics dedicated SQL pool to store sales data. The fact table contains billions of rows and is hash-distributed on ProductID. Queries aggregate sales by store and product for the current month and join with a small Store dimension table (10,000 rows) and a medium-sized Product dimension table (500,000 rows). The queries are slow due to data movement during joins. Which design change will most reduce data movement and improve query performance?

Question 110hardmultiple choice
Study the full Python automation breakdown →

A data engineering team is building a batch analytics pipeline. Raw clickstream data is stored as Parquet files in Azure Data Lake Storage Gen2. The team needs to transform the data using Apache Spark (Python code) and then load the results into Azure Synapse Analytics for high-performance reporting. They want to use a serverless compute option for Spark to avoid managing clusters. Which combination of Azure services should they use for the transformation and loading?

Question 111mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to query large datasets stored as Parquet files in Azure Data Lake Storage Gen2. The queries are ad-hoc and infrequent. The analyst wants to run SQL queries directly on the data without creating any storage or compute infrastructure, and only pay for the amount of data processed. They also need to create T-SQL views to simplify queries for Power BI reports. Which Azure service should they use?

Question 112mediummultiple choice
Review the full routing breakdown →

A logistics company needs to analyze GPS data from delivery trucks in real time to detect delays and reroute deliveries. The GPS data is streamed into Azure Event Hubs. They also need to combine this live data with static route information stored in Azure SQL Database. Which Azure service should they use for the real-time processing?

Question 113mediummultiple choice
Study the full Python automation breakdown →

A data engineer needs to transform large datasets stored in Azure Data Lake Storage Gen2 using Python and Apache Spark. They want a serverless compute option that automatically scales and requires no cluster management. Which Azure service should they use?

Question 114mediummultiple choice
Read the full Describe an analytics workload on explanation →

A manufacturing company ingests real-time sensor data from assembly line machines into Azure Event Hubs. The company needs to calculate a 5-minute rolling average of temperature readings for each machine and compare it against a static threshold value stored in a CSV file in Azure Blob Storage. If the average exceeds the threshold, an alert must be triggered. Which Azure service should be used for this real-time data processing?

Question 115mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to run ad-hoc SQL queries on terabytes of CSV files stored in Azure Data Lake Storage Gen2. The queries are infrequent and unpredictable. The analyst wants to pay only for the amount of data processed by each query, and does not want to manage any compute or storage infrastructure. Which Azure service should they use?

Question 116mediummultiple choice
Read the full Describe an analytics workload on explanation →

A retail company collects sales data from multiple stores. Data is ingested into Azure Data Lake Storage Gen2 as CSV files. The data team needs to run ad-hoc SQL queries on this data without moving it, and they want to pay only for the amount of data processed. They also need to integrate with Power BI for visualization. Which Azure service should they use?

Question 117hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics dedicated SQL pool for a large data warehouse. The fact table contains billions of rows and is hash-distributed on ProductID. Frequent queries join this fact table with a small Store dimension table (10,000 rows) and a medium-sized Product dimension table (500,000 rows). The queries aggregate sales by store and product for recent months, but run slowly due to data movement during joins. Which design change will most reduce data movement and improve query performance?

Question 118mediummultiple choice
Study the full Python automation breakdown →

A data engineering team needs to transform large datasets stored in Azure Data Lake Storage Gen2 using Apache Spark with Python code. They want a fully managed service that provides serverless Spark pools, meaning no clusters to manage and automatic scaling. Which Azure service should they use?

Question 119mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to run ad-hoc SQL queries on petabytes of Parquet files stored in Azure Data Lake Storage Gen2. The queries are infrequent and highly selective. The analyst wants to pay only for the data scanned by each query and does not want to provision any compute resources. They also need to create views to simplify future queries for other analysts. Which Azure service should they use?

Question 120hardmultiple choice
Read the full NAT/PAT explanation →

A financial services company stores transaction data in Azure Data Lake Storage Gen2 as Parquet files, partitioned by date. The data volume is 5 TB per day. The analytics team runs ad-hoc SQL queries to detect fraudulent patterns. Queries are highly selective (filtering on AccountID and date range). The team also needs to create external tables and views for use in Power BI. They want to pay only for the data processed by each query and avoid provisioning any compute resources. Which Azure service should they use?

Question 121easymultiple choice
Review the full routing breakdown →

A logistics company uses IoT sensors on delivery trucks to transmit GPS location, speed, and engine diagnostics every 10 seconds. The data is ingested into Azure Event Hubs. The company needs to analyze the data in real time to identify speeding trucks and send alerts. The analysis requires joining the live sensor data with a reference table of truck details (e.g., driver name, route number) stored in Azure SQL Database. Which Azure service should they use for the real-time processing?

Question 122mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to run ad-hoc SQL queries on petabytes of data stored as Parquet files in Azure Data Lake Storage Gen2. The queries are infrequent but must return results within seconds. The analyst wants to pay only for the amount of data processed and does not want to manage any compute infrastructure. Additionally, they need to create views to simplify future reporting in Power BI. Which Azure service should they use?

Question 123easymultiple choice
Read the full Describe an analytics workload on explanation →

A company needs to run complex analytical queries that aggregate terabytes of sales data across multiple years. The queries are used for monthly business reports and are not latency-sensitive. The data is stored in Azure Data Lake Storage Gen2. The company wants a fully managed, petabyte-scale data warehouse solution that supports SQL queries and integrates with Power BI for reporting. Which Azure service should they use?

Question 124hardmultiple choice
Read the full Describe an analytics workload on explanation →

A financial services company has raw transaction data stored in Azure Data Lake Storage Gen2 (ADLS Gen2) as Parquet files, partitioned by date. The analytics team needs to run complex SQL queries that join multiple datasets, including reference data from an Azure SQL Database, to generate risk reports. They require enterprise-grade security features such as row-level security (RLS) and column-level security. They also want to use the same service for data transformation and loading (ETL) into a curated layer. Which Azure service should they choose?

Question 125mediumdrag order
Read the full Describe an analytics workload on explanation →

Drag and drop the steps to perform a point-in-time restore of an Azure SQL Database in the correct order.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps
Order
1Step 1
2Step 2
3Step 3
4Step 4
5Step 5
Question 126mediumdrag order
Read the full Describe an analytics workload on explanation →

Drag and drop the steps to create an Azure Stream Analytics job in the correct order.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps
Order
1Step 1
2Step 2
3Step 3
4Step 4
5Step 5
Question 127mediummatching
Read the full Describe an analytics workload on explanation →

Match each Azure data tool to its purpose.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts
Matches

Data integration and orchestration

Apache Spark-based analytics platform

Real-time stream processing

Distributed analytics (legacy)

Managed open-source analytics service

Question 128mediummatching
Read the full Describe an analytics workload on explanation →

Match each data type to its category in Azure.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts
Matches

Relational tables with fixed schema

JSON, XML, or key-value pairs

Blobs, files, and media

Data in tables with relationships

NoSQL data like documents or graphs

Question 129easymultiple choice
Read the full Describe an analytics workload on explanation →

A company runs a real-time dashboard in Power BI that displays sales data from Azure Synapse Analytics. The dashboard must show data with less than 5 seconds of latency. Which Azure service should be used to ingest streaming sales events into Azure Synapse Analytics?

Question 130mediummultiple choice
Read the full NAT/PAT explanation →

A data engineer needs to design a solution for a healthcare organization that must store patient records for 7 years to comply with regulatory requirements. The data will be accessed infrequently after the first year. Which Azure storage tier should be used for data older than one year?

Question 131hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics dedicated SQL pool to run large-scale analytics. The data engineering team notices that queries are slow due to excessive data movement between distributions. Which index type should be recommended to minimize data movement for fact tables that are frequently joined on a specific column?

Question 132easymultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to create a report in Power BI that combines sales data from Azure SQL Database and inventory data from Azure Cosmos DB. The report should refresh daily. Which Power BI feature should be used to combine these data sources?

Question 133mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Data Lake Storage Gen2 to store IoT sensor data. The data is partitioned by date and sensor ID. A data scientist needs to efficiently query only the last 7 days of data for a specific sensor. Which strategy minimizes the amount of data scanned?

Question 134hardmultiple choice
Read the full NAT/PAT explanation →

A multinational corporation uses Azure Synapse Analytics serverless SQL pool to query data in Azure Data Lake Storage. The security team requires that access to specific columns containing personally identifiable information (PII) be restricted based on the user's role. Which feature should be implemented?

Question 135easymultiple choice
Read the full network assurance explanation →

A company wants to build a near-real-time analytics solution on Azure. IoT devices send telemetry data to Azure Event Hubs. The data must be processed and stored in Azure Cosmos DB for low-latency queries. Which Azure service should be used to process the streaming data?

Question 136mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst uses Power BI to create a report that combines data from Azure Synapse Analytics and an on-premises SQL Server database. The on-premises data must be refreshed every hour. Which component is required to connect to the on-premises data source?

Question 137hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Databricks for data engineering. The team wants to implement a medallion architecture (bronze, silver, gold) to organize data quality layers. In which layer should data be stored in a format optimized for analytics and reporting?

Question 138mediummulti select
Read the full Describe an analytics workload on explanation →

Which TWO Azure services can be used to orchestrate and automate data pipelines? (Choose two.)

Question 139hardmulti select
Read the full Describe an analytics workload on explanation →

Which THREE components are required to implement a real-time analytics solution using Azure Stream Analytics? (Choose three.)

Question 140easymulti select
Read the full Describe an analytics workload on explanation →

Which TWO Azure services can be used to store semi-structured data? (Choose two.)

Question 141mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics for its data warehouse. They notice that queries against a large fact table are slow. The table is partitioned by month and uses clustered columnstore index. Which action would most likely improve query performance?

Question 142easymultiple choice
Read the full Describe an analytics workload on explanation →

Your company uses Azure Data Lake Storage Gen2 and wants to grant a data scientist read-only access to a specific container. Which built-in RBAC role should you assign?

Question 143hardmultiple choice
Read the full Describe an analytics workload on explanation →

An organization uses Azure Stream Analytics to process real-time IoT data from millions of devices. They need to ensure that the output is exactly once delivery semantics to a Power BI dataset. Which output configuration should they use?

Question 144mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data engineer needs to load data from an on-premises SQL Server database to Azure Synapse Analytics. The data volume is approximately 2 TB and the network bandwidth is limited. Which approach minimizes data transfer time?

Question 145easymultiple choice
Read the full Describe an analytics workload on explanation →

Your company wants to use Microsoft Fabric to create a unified analytics platform. Which component in Microsoft Fabric provides a lake-centric, collaborative, and governed data foundation?

Question 146hardmultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst is using Azure Databricks to transform streaming data from Event Hubs. They need to ensure that if a failure occurs, the streaming job can resume processing from the last committed offset. Which checkpointing mechanism should they configure?

Question 147mediummultiple choice
Read the full Describe an analytics workload on explanation →

Your organization uses Azure Purview to scan data sources. You need to set up a scan rule set that automatically classifies credit card numbers in Azure SQL Database. Which built-in classification rule should you enable?

Question 148hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company runs a critical workload in Azure Synapse Analytics. They need to ensure that if a single node fails, the data in the control node and compute nodes is not lost. Which configuration should they use?

Question 149easymultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to create a real-time dashboard in Power BI that refreshes every second from an Azure Stream Analytics job. Which Power BI feature should they use?

Question 150mediummulti select
Read the full Describe an analytics workload on explanation →

Which TWO Azure services can be used to perform interactive data analytics on large datasets without managing infrastructure? (Choose two.)

Question 151hardmulti select
Read the full Describe an analytics workload on explanation →

Which THREE components are part of Microsoft Fabric's end-to-end analytics platform? (Choose three.)

Question 152easymulti select
Read the full Describe an analytics workload on explanation →

Which TWO of the following are benefits of using a data lake architecture? (Choose two.)

Question 153mediummultiple choice
Read the full Describe an analytics workload on explanation →

You are deploying the above ARM template snippet for a storage account. What is the effect of setting 'isHnsEnabled' to true?

Exhibit

Refer to the exhibit.

```json
{
  "type": "Microsoft.Storage/storageAccounts",
  "apiVersion": "2023-01-01",
  "name": "[parameters('storageAccountName')]",
  "kind": "StorageV2",
  "properties": {
    "isHnsEnabled": true
  }
}
```
Question 154hardmultiple choice
Read the full Describe an analytics workload on explanation →

You run the above Kusto query in Azure Data Explorer. What does the query return?

Exhibit

Refer to the exhibit.

```kusto
StormEvents
| where EventType == "Tornado"
| summarize TornadoCount = count() by State
| order by TornadoCount desc
| take 5
```
Question 155mediummultiple choice
Read the full Describe an analytics workload on explanation →

You are reviewing the Azure Data Factory mapping data flow configuration above. Which transformation is missing to ensure that only sales from the current year are loaded?

Exhibit

Refer to the exhibit.

```json
{
  "dataflows": [
    {
      "name": "ProcessSales",
      "source": {
        "type": "AzureSqlDatabase",
        "connection": "SalesDB",
        "query": "SELECT * FROM Sales"
      },
      "sink": {
        "type": "AzureBlobStorage",
        "folderPath": "salesdata",
        "fileType": "Parquet"
      }
    }
  ]
}
```
Question 156easymultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics to run large-scale batch processing jobs every night. The jobs currently take 6 hours to complete, but the business requires completion within 4 hours. Which action should the company take to improve job performance?

Question 157mediummultiple choice
Read the full NAT/PAT explanation →

A healthcare organization must build an analytics solution that processes streaming patient vitals data and provides real-time dashboards. The solution must also store historical data for compliance audits. Which combination of Azure services should the organization use?

Question 158hardmultiple choice
Read the full NAT/PAT explanation →

A multinational corporation uses Azure Data Factory to orchestrate data pipelines across multiple regions. The company notices that pipeline runs in the West Europe region consistently fail due to throttling errors from the source database. The source database is an Azure SQL Database in the same region. The company needs to reduce throttling while maintaining pipeline throughput. What should the company do?

Question 159easymultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Data Lake Storage Gen2 to store raw data files. Data engineers need to transform this data using a serverless approach without managing infrastructure. Which Azure service should they use?

Question 160mediummultiple choice
Read the full Describe an analytics workload on explanation →

A retail company uses Power BI to create sales reports. The data source is an Azure SQL Database that updates every 15 minutes. The reports must reflect near real-time data without manual refresh. Which Power BI feature should the company use?

Question 161hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Stream Analytics to process IoT data from thousands of devices. The output is written to Azure SQL Database for reporting. Recently, the job latency increased significantly. The company suspects that the SQL Database is throttling writes. Which action should the company take to reduce latency?

Question 162easymultiple choice
Study the full Python automation breakdown →

A data scientist needs to perform exploratory data analysis on a large dataset stored in Azure Data Lake Storage Gen2 using Python notebooks. The solution must minimize infrastructure management. Which Azure service should the data scientist use?

Question 163mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Data Factory to copy data from an on-premises SQL Server to Azure Data Lake Storage Gen2. The transfer must be accelerated using WAN optimization. Which Data Factory feature should the company enable?

Question 164hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics to run both interactive queries and large batch loads. The interactive queries must have consistent performance regardless of batch load activity. Which Synapse feature should the company use?

Question 165mediummulti select
Read the full Describe an analytics workload on explanation →

Which TWO Azure services can be used to build a real-time analytics solution that ingests streaming data and provides dashboards with low latency? (Choose two.)

Question 166hardmulti select
Read the full Describe an analytics workload on explanation →

Which THREE components are typically part of a modern data warehouse architecture on Azure? (Choose three.)

Question 167easymulti select
Read the full Describe an analytics workload on explanation →

Which TWO Azure services can be used to perform data transformation in a serverless manner? (Choose two.)

Question 168easymultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics to run large-scale analytics on sales data. They need to ensure that the workload can automatically scale based on demand without manual intervention. What feature should they configure?

Question 169mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to create a real-time dashboard in Power BI that displays streaming data from Azure Event Hubs. The data must be refreshed every second. Which Power BI feature should they use?

Question 170hardmulti select
Read the full Describe an analytics workload on explanation →

Which TWO options are valid ways to ingest data into Azure Data Lake Storage Gen2?

Question 171mediummultiple choice
Read the full Describe an analytics workload on explanation →

Your company has a data pipeline in Azure Data Factory that runs daily. Recently, the pipeline started failing with timeouts. You suspect a downstream database is slow. What should you do to monitor and alert on pipeline run duration?

Question 172hardmultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. An administrator runs an Azure CLI command to show the status of a Synapse SQL pool. The output shown is returned. What does this output indicate about the SQL pool?

Exhibit

Refer to the exhibit.

{
  "type": "Microsoft.Synapse/workspaces/sqlPools",
  "apiVersion": "2021-06-01",
  "properties": {
    "collation": "SQL_Latin1_General_CP1_CI_AS",
    "maxSizeBytes": 263882790666240,
    "provisioningState": "Succeeded",
    "status": "Online",
    "restorePointInTime": "2023-01-15T08:00:00Z"
  }
}
Question 173mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company wants to analyze customer feedback from surveys and social media. The data includes both structured (ratings) and unstructured (comments) text. They plan to use Azure Cognitive Services for sentiment analysis. Which service should they use for text analytics?

Question 174mediummulti select
Read the full Describe an analytics workload on explanation →

Which THREE components are part of an end-to-end analytics solution on Azure?

Question 175easymultiple choice
Read the full Describe an analytics workload on explanation →

An organization needs to run complex queries on petabytes of data stored in Azure Data Lake Storage. They want to use serverless compute to avoid managing infrastructure. Which Azure service should they use?

Question 176hardmultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. A data engineer runs the PowerShell script shown. What is the purpose of this script?

Exhibit

Refer to the exhibit.

$storageAccount = Get-AzStorageAccount -ResourceGroupName 'rg-analytics' -Name 'salesdatalake'
$ctx = $storageAccount.Context
$files = Get-AzStorageBlob -Container 'raw' -Context $ctx -Prefix 'sales/2023/'
$files | Where-Object {$_.LastModified -gt (Get-Date).AddDays(-7)}
Question 177mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Stream Analytics to process IoT data from thousands of devices. They need to store the results in a way that supports fast querying for historical analysis. Which output sink should they use?

Question 178hardmulti select
Read the full Describe an analytics workload on explanation →

Which TWO tools can be used to transform data in an Azure data pipeline?

Question 179easymultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to create interactive reports from data stored in an Azure SQL Database. They want to use a self-service tool that requires minimal IT support. Which tool should they use?

Question 180mediummultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. An Azure Data Factory pipeline JSON is shown. What does this pipeline do?

Exhibit

Refer to the exhibit.

{
  "name": "sales-pipeline",
  "properties": {
    "activities": [
      {
        "name": "CopySalesData",
        "type": "Copy",
        "inputs": [{"referenceName": "SalesOnPrem", "type": "DatasetReference"}],
        "outputs": [{"referenceName": "SalesAzureSQL", "type": "DatasetReference"}],
        "typeProperties": {
          "source": {
            "type": "SqlServerSource",
            "sqlReaderQuery": "SELECT * FROM Sales WHERE OrderDate >= '2023-01-01'"
          },
          "sink": {
            "type": "SqlSink",
            "writeBatchSize": 10000
          }
        }
      }
    ]
  }
}
Question 181mediummulti select
Read the full Describe an analytics workload on explanation →

Which THREE are benefits of using a data warehouse in Azure?

Question 182hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Databricks for data engineering. They need to ensure that only authorized users can access the workspace, and they want to use single sign-on (SSO) with their existing identity provider. Which integration should they configure?

Question 183easymultiple choice
Read the full Describe an analytics workload on explanation →

A company plans to implement a near-real-time analytics solution for streaming IoT sensor data. Which Azure service should they use to ingest and process the data streams?

Question 184mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data engineer needs to build a data pipeline that runs daily to copy sales data from an on-premises SQL Server to Azure Synapse Analytics. Which Azure service should they use to orchestrate the pipeline?

Question 185hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics for its data warehouse. They notice that query performance is degrading over time as data grows. Which action would most likely improve performance without requiring additional compute resources?

Question 186mediummulti select
Read the full Describe an analytics workload on explanation →

Which TWO Azure services can be used to perform interactive ad-hoc analytics on large datasets using Apache Spark?

Question 187hardmulti select
Read the full Describe an analytics workload on explanation →

Which THREE components are essential for building a modern data warehouse architecture on Azure?

Question 188easymulti select
Read the full Describe an analytics workload on explanation →

Which TWO Azure services are designed for big data batch processing?

Question 189easymultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. A data engineer is reviewing an ARM template for a storage account. What does the property 'isHnsEnabled' set to true indicate?

Exhibit

Refer to the exhibit.

{
  "type": "Microsoft.Storage/storageAccounts",
  "apiVersion": "2023-01-01",
  "properties": {
    "isHnsEnabled": true,
    "accessTier": "Hot",
    "supportsHttpsTrafficOnly": true
  }
}
Question 190mediummultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. An analyst runs this Kusto Query Language (KQL) query in Azure Data Explorer. What is the primary purpose of this query?

Exhibit

Refer to the exhibit.

Query: 
// KQL query executed in Azure Data Explorer
StormEvents
| where State == "TEXAS"
| summarize Count = count() by EventType
| top 5 by Count
Question 191hardmultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. A developer is creating an ARM template for an Azure Synapse workspace. What is the purpose of the 'defaultDataLakeStorage' property?

Exhibit

Refer to the exhibit.

{
  "type": "Microsoft.Synapse/workspaces",
  "apiVersion": "2021-06-01-preview",
  "properties": {
    "defaultDataLakeStorage": {
      "accountUrl": "https://datalake123.dfs.core.windows.net",
      "filesystem": "synapse-workspace"
    },
    "sqlAdministratorLogin": "adminuser",
    "sqlAdministratorLoginPassword": "P@ssw0rd123!"
  }
}
Question 192easymultiple choice
Read the full NAT/PAT explanation →

A company wants to build a real-time dashboard that visualizes sales data as transactions occur. Which combination of Azure services should they use?

Question 193mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data scientist needs to train a machine learning model using data stored in Azure Data Lake Storage. They want to use a collaborative notebook environment with built-in experiment tracking. Which Azure service should they use?

Question 194hardmultiple choice
Read the full NAT/PAT explanation →

An organization has a large dataset stored in Azure Blob Storage. They need to run complex analytics using SQL queries and also want to use the same data for machine learning models. Which Azure service provides both SQL-based analytics and native integration with ML frameworks?

Question 195easymultiple choice
Read the full Describe an analytics workload on explanation →

A company wants to provide self-service analytics to business users, allowing them to create reports and dashboards from data stored in Azure Data Lake Storage. Which tool should they use?

Question 196mediummultiple choice
Read the full Describe an analytics workload on explanation →

An organization has a data lake that contains both structured and unstructured data. They need to catalog the data assets and enable data discovery for users. Which Azure service should they use?

Question 197hardmultiple choice
Read the full Describe an analytics workload on explanation →

A financial services company needs to run ad-hoc SQL queries on petabytes of data stored in Azure Data Lake Storage without provisioning a dedicated data warehouse. Which Azure service should they use?

Question 198mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company runs a SQL Server database on an Azure virtual machine. They need to offload reporting queries to a read-only copy without modifying the application. Which Azure service should they use?

Question 199easymultiple choice
Read the full Describe an analytics workload on explanation →

A data engineer needs to transform and clean data from multiple sources before loading it into Azure Synapse Analytics. Which Azure service should they use for this ETL process?

Question 200hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Data Lake Storage Gen2 as a data lake. They need to enforce row-level security for sensitive data so that sales representatives can only see rows for their assigned region. Which approach should they use?

Question 201easymultiple choice
Read the full Describe an analytics workload on explanation →

An organization wants to build a real-time dashboard that visualizes IoT sensor data as it arrives. Which Azure service should they use for processing the streaming data?

Question 202mediummultiple choice
Read the full NAT/PAT explanation →

A company is designing a data analytics solution. They need to store large volumes of raw data in its native format and support schema-on-read for data science exploration. Which storage technology should they use?

Question 203hardmultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to create an interactive report that combines sales data from Azure SQL Database and Azure Cosmos DB. The report must refresh daily. Which tool should they use?

Question 204mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics for their data warehouse. They notice that queries against the fact table are slow. The fact table is hash-distributed on OrderID. Most queries filter by CustomerID. What should they do to improve performance?

Question 205easymultiple choice
Read the full NAT/PAT explanation →

A business user wants to ask natural language questions about their data in Power BI and get answers without writing DAX. Which Power BI feature should they use?

Question 206hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company ingests streaming data from thousands of devices into Azure Event Hubs. They need to transform and aggregate the data in real time before storing it in Azure Data Lake Storage Gen2. Which Azure service should they use between Event Hubs and ADLS Gen2?

Question 207mediummulti select
Read the full Describe an analytics workload on explanation →

Which TWO Azure services can be used to build a data pipeline that moves data from on-premises SQL Server to Azure Synapse Analytics?

Question 208hardmulti select
Read the full Describe an analytics workload on explanation →

Which THREE components are part of a typical modern data warehouse architecture on Azure?

Question 209mediummulti select
Read the full Describe an analytics workload on explanation →

Which TWO are valid use cases for Azure Stream Analytics?

Question 210mediummultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. You deploy this Azure Stream Analytics job. The job runs but no data is written to the Azure SQL Database table. What is the most likely cause?

Exhibit

Refer to the exhibit.

```json
{
  "type": "Microsoft.StreamAnalytics/streamingjobs",
  "apiVersion": "2020-03-01",
  "properties": {
    "sku": {
      "name": "StandardV2"
    },
    "inputs": [
      {
        "name": "input",
        "properties": {
          "type": "Stream",
          "datasource": {
            "type": "Microsoft.EventHub/EventHub",
            "properties": {
              "eventHubName": "telemetry-events",
              "consumerGroupName": "$Default"
            }
          }
        }
      }
    ],
    "outputs": [
      {
        "name": "output",
        "properties": {
          "datasource": {
            "type": "Microsoft.Sql/Server/Database",
            "properties": {
              "server": "sqlserver123.database.windows.net",
              "database": "sqldb",
              "table": "SensorData"
            }
          }
        }
      }
    ],
    "transformation": {
      "name": "TransformTelemetry",
      "properties": {
        "streamingUnits": 6,
        "query": "SELECT SensorId, AVG(Temperature) AS AvgTemp INTO output FROM input GROUP BY SensorId, TumblingWindow(minute, 5)"
      }
    }
  }
}
```
Question 211easymultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. You have created this Azure Data Factory pipeline. When you run it, the copy activity fails with a connectivity error. What is the most likely missing component?

Exhibit

Refer to the exhibit.

```json
{
  "name": "CopyFromOnPrem",
  "properties": {
    "activities": [
      {
        "name": "CopyData",
        "type": "Copy",
        "inputs": [
          {
            "referenceName": "OnPremSqlServerDataset",
            "type": "DatasetReference"
          }
        ],
        "outputs": [
          {
            "referenceName": "AzureSqlDatabaseDataset",
            "type": "DatasetReference"
          }
        ],
        "typeProperties": {
          "source": {
            "type": "SqlSource",
            "sqlReaderQuery": "SELECT * FROM Sales.Orders WHERE OrderDate > '2025-01-01'"
          },
          "sink": {
            "type": "SqlSink",
            "writeBatchSize": 10000
          }
        }
      }
    ],
    "integrationRuntime": {
      "referenceName": "SelfHostedIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}
```
Question 212hardmultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. A database administrator runs this KQL query in Azure Monitor Log Analytics. The query returns no results. What is the most likely reason?

Exhibit

Refer to the exhibit.

```kusto
let startTime = datetime(2025-01-01T00:00:00Z);
let endTime = datetime(2025-01-02T00:00:00Z);
AzureDiagnostics
| where ResourceType == "AZURESQLDB"
| where TimeGenerated between (startTime .. endTime)
| where OperationName == "QueryThrottled"
| summarize Count = count() by DatabaseName, bin(TimeGenerated, 1h)
| render timechart
```
Question 213mediummultiple choice
Read the full Describe an analytics workload on explanation →

Your company runs a sales analytics dashboard on Power BI that refreshes every hour from Azure Synapse Analytics. During peak hours, the dashboard refresh fails with a 'timeout' error. Which action should you take FIRST to resolve the issue?

Question 214hardmultiple choice
Read the full Describe an analytics workload on explanation →

A retail company uses Azure Data Lake Storage Gen2 as a data lake and Azure Databricks for ETL. They notice that a Spark job reading Parquet files from the data lake fails with an 'Access Denied' error when the job runs as a service principal. The service principal has Storage Blob Data Contributor role on the storage account. What is the most likely cause?

Question 215easymultiple choice
Read the full Describe an analytics workload on explanation →

Your company needs to build an analytics solution that can handle both batch and streaming data from IoT devices. The solution must allow complex event processing and real-time dashboards. Which Azure service should you use as the primary data ingestion and processing layer?

Question 216mediummultiple choice
Read the full Describe an analytics workload on explanation →

Your company has a Power BI dashboard that uses a data model with a single large fact table and several dimension tables. The dashboard loads slowly when users filter by multiple dimensions. Which design change would MOST improve performance?

Question 217hardmultiple choice
Read the full Describe an analytics workload on explanation →

Your data engineering team is designing a data pipeline that ingests data from multiple sources into Azure Data Lake Storage Gen2. The data must be cataloged in Azure Purview for discoverability. Which approach ensures that the data lineage is automatically captured?

Question 218easymultiple choice
Read the full Describe an analytics workload on explanation →

Your company uses Azure Synapse Analytics to run analytical queries on large datasets. You need to ensure that queries against a frequently accessed fact table perform well without impacting other workloads. Which feature should you use?

Question 219mediummultiple choice
Read the full Describe an analytics workload on explanation →

Your company has a Power BI report that uses DirectQuery to Azure SQL Database. Users report that the report is slow when multiple users access it simultaneously. The database is underprovisioned. Which action should you take to improve performance without changing the report design?

Question 220easymultiple choice
Read the full Describe an analytics workload on explanation →

Your company is migrating an on-premises SQL Server data warehouse to Azure. The solution must support both historical analytics and real-time reporting. Which Azure service should you recommend as the primary data store?

Question 221hardmultiple choice
Read the full Describe an analytics workload on explanation →

Your company uses Azure Databricks to process streaming data from Event Hubs. The data is transformed and written to Azure Data Lake Storage Gen2 as Delta tables. You notice that some records are duplicated in the Delta tables. Which configuration change should you make to prevent duplicates?

Question 222mediummulti select
Read the full Describe an analytics workload on explanation →

Which TWO Azure services can be used to perform data transformation in an analytics pipeline? (Choose two.)

Question 223mediummulti select
Read the full Describe an analytics workload on explanation →

Which THREE components are part of a typical modern data warehouse architecture on Azure? (Choose three.)

Question 224easymulti select
Read the full Describe an analytics workload on explanation →

Which TWO Azure services can be used to store semi-structured data like JSON or Parquet files for analytics? (Choose two.)

Question 225mediummultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. You execute the above T-SQL statements in Azure Synapse Analytics. What is the purpose of this code?

Exhibit

Refer to the exhibit.

```sql
CREATE EXTERNAL DATA SOURCE MyDataSource WITH (
    LOCATION = 'https://mystorageaccount.dfs.core.windows.net/container',
    CREDENTIAL = MyCredential
);

CREATE EXTERNAL FILE FORMAT MyFileFormat WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);

CREATE EXTERNAL TABLE dbo.Sales (
    SaleID INT,
    ProductName VARCHAR(100),
    SaleAmount DECIMAL(10,2),
    SaleDate DATE
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = MyDataSource,
    FILE_FORMAT = MyFileFormat
);
```
Question 226hardmultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. You have an Azure Data Factory pipeline definition as shown. The pipeline fails with a 'Source not found' error. The BlobInputDataset points to a container that exists. What is the most likely cause?

Exhibit

Refer to the exhibit.

```json
{
  "name": "CopyDataPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyFromBlobToSQL",
        "type": "Copy",
        "inputs": [
          {
            "referenceName": "BlobInputDataset",
            "type": "DatasetReference"
          }
        ],
        "outputs": [
          {
            "referenceName": "SQLOutputDataset",
            "type": "DatasetReference"
          }
        ],
        "typeProperties": {
          "source": {
            "type": "DelimitedTextSource",
            "storeSettings": {
              "type": "AzureBlobStorageReadSettings",
              "recursive": true
            }
          },
          "sink": {
            "type": "SqlSink",
            "writeBatchSize": 10000
          }
        }
      }
    ]
  }
}
Question 227easymultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. You have a Power BI measure defined as shown. What does this measure return?

Exhibit

Refer to the exhibit.

Power BI DAX expression:
```
Total Sales = 
CALCULATE(
    SUM(Sales[Amount]),
    Sales[Channel] = "Online"
)
```
Question 228easymultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics to run large-scale data transformations. They need to optimize costs for predictable workloads that run every night. Which Azure feature should they configure?

Question 229mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to create a real-time dashboard in Power BI that displays sales data from an Azure SQL Database. The dashboard must update every 10 minutes without manual refresh. Which Power BI feature should they use?

Question 230hardmultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. A team is deploying an Azure Storage container using an ARM template. The template sets publicAccess to 'None'. However, after deployment, users report they cannot access data even with a valid SAS token. What is the most likely cause?

Exhibit

{
  "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
  "apiVersion": "2023-01-01",
  "name": "[concat(parameters('storageAccountName'), '/default/', parameters('containerName'))]",
  "dependsOn": [
    "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName'))]"
  ],
  "properties": {
    "publicAccess": "None"
  }
}
Question 231easymultiple choice
Read the full Describe an analytics workload on explanation →

A company wants to build a data lake on Azure for storing structured, semi-structured, and unstructured data. The solution must support fast queries on structured data without moving data to a separate store. Which Azure service should they use?

Question 232mediummultiple choice
Read the full Describe an analytics workload on explanation →

An organization runs a mission-critical analytics workload on Azure Synapse Analytics. They need to ensure high availability and automatic failover in case of a regional outage. Which configuration should they implement?

Question 233hardmultiple choice
Read the full Describe an analytics workload on explanation →

A data engineering team uses Azure Data Factory to orchestrate an ETL pipeline that loads data from an on-premises SQL Server to Azure Synapse Analytics. The pipeline fails intermittently with timeout errors during the copy activity. The network is stable. What should they do first to resolve the issue?

Question 234easymultiple choice
Read the full Describe an analytics workload on explanation →

A company needs to analyze streaming data from IoT devices in real time. They want to identify anomalies and trigger alerts. Which Azure service should they use as the core processing engine?

Question 235mediummultiple choice
Read the full Describe an analytics workload on explanation →

A data analyst needs to create a Power BI report that combines sales data from Azure SQL Database and marketing data from a CSV file stored in Azure Blob Storage. The report should refresh automatically. What is the recommended approach?

Question 236hardmultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. An administrator is configuring aggregations in Power BI Premium to improve performance on a large dataset. The aggregation is defined on the Sales table with SUM(Amount) grouped by ProductCategory, Region, and Date at the monthly level. However, some reports that query daily data are still slow. What is the most likely reason?

Exhibit

{
  "version": "1.0",
  "aggregations": [
    {
      "table": "Sales",
      "measure": "SUM(Amount)",
      "dimensions": ["ProductCategory", "Region", "Date"],
      "aggregationLevel": "Monthly"
    }
  ]
}
Question 237mediummulti select
Read the full Describe an analytics workload on explanation →

Which TWO Azure services can be used to perform large-scale data transformation and processing in a serverless manner?

Question 238hardmulti select
Read the full Describe an analytics workload on explanation →

Which THREE components are essential for building a real-time analytics solution on Azure?

Question 239easymulti select
Read the full Describe an analytics workload on explanation →

Which TWO are benefits of using Azure Synapse Analytics for a data warehouse workload?

Question 240hardmultiple choice
Read the full Describe an analytics workload on explanation →

You are the data engineer for a large retail company. The company has an existing on-premises SQL Server database with 10 years of transactional data. They want to move this data to Azure to enable advanced analytics using Azure Synapse Analytics. The data includes customer orders, product details, and inventory. The solution must minimize data movement and support both batch and real-time analytics. The company also wants to use Power BI for reporting. They have a limited budget and prefer a serverless option for compute. You are evaluating the following approaches: A) Use Azure Data Factory to copy all data to Azure Data Lake Storage Gen2, then use Azure Synapse Serverless SQL pool to query the data, and finally connect Power BI to the serverless SQL endpoint. B) Use Azure Database Migration Service to migrate the SQL Server database to Azure SQL Database, then use Azure Synapse Analytics with a dedicated SQL pool to perform analytics, and connect Power BI to the dedicated pool. C) Use Azure Data Factory to copy all data to Azure Blob Storage, then use Azure Stream Analytics to perform real-time analytics, and connect Power BI directly to Stream Analytics output. D) Use Azure Data Factory to copy historical data to Azure Data Lake Storage Gen2, use Azure Synapse Serverless SQL pool for batch analytics, and use Azure Event Hubs and Stream Analytics for real-time data, with Power BI connecting to both serverless SQL and Stream Analytics. Which approach best meets the requirements?

Question 241mediummultiple choice
Read the full Describe an analytics workload on explanation →

Your company is developing a new analytics solution to track customer sentiment from social media feeds. The data arrives as a continuous stream of JSON messages. The solution must process the data in near real-time, enrich it with customer profile data stored in Azure Cosmos DB, and then store the results in a data lake for historical analysis. The team wants to use a low-code approach for the data processing logic. You are considering the following architectures: A) Use Azure Event Hubs to ingest the stream, Azure Stream Analytics to process and enrich the data using Cosmos DB as a reference data source, and output to Azure Data Lake Storage Gen2. B) Use Azure IoT Hub to ingest the stream, Azure Databricks to process the data, and write to Azure Blob Storage. C) Use Azure Event Hubs to ingest the stream, Azure Functions to process each message, query Cosmos DB for enrichment, and write to Azure Data Lake Storage Gen2. D) Use Azure Event Hubs to ingest the stream, Azure Data Factory to execute a mapping data flow for enrichment, and write to Azure Data Lake Storage Gen2. Which architecture best meets the requirements of near real-time processing, enrichment, and low-code?

Question 242easymultiple choice
Read the full Describe an analytics workload on explanation →

A small business wants to start using Azure for analytics. They have a few CSV files stored on-premises that they want to analyze. They have no budget for complex infrastructure and prefer a fully managed, serverless solution. They need to create interactive visualizations and share them with their team. The data does not change frequently, so they are okay with daily refreshes. Which of the following options should they choose? A) Upload the CSV files to Azure Data Lake Storage Gen2, use Azure Databricks to create a data processing pipeline, and then use Power BI to visualize the results. B) Upload the CSV files to Azure Blob Storage, use Azure Data Factory to load the data into Azure SQL Database, and then use Power BI to connect and visualize. C) Upload the CSV files to OneDrive for Business, use Power BI Desktop to import the data, and publish to Power BI Service with scheduled refresh. D) Upload the CSV files to Azure Data Lake Storage Gen2, use Azure Synapse Serverless SQL pool to query the data, and then use Power BI to connect. Which option is the simplest and most cost-effective?

Question 243mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company runs a sales analytics workload on Azure Synapse Analytics. They notice that queries against the fact table are slow. The fact table is hash-distributed on the SalesDate column. Which design change would most likely improve query performance?

Question 244hardmultiple choice
Read the full Describe an analytics workload on explanation →

A retail company uses Azure Data Lake Storage Gen2 to store raw clickstream data. They need to process this data using Azure Databricks to create hourly aggregated reports. The data pipeline must minimize costs while meeting a five-minute processing SLA. What is the most cost-effective compute option?

Question 245easymultiple choice
Read the full NAT/PAT explanation →

A company wants to build a real-time analytics dashboard for IoT sensor data. Which combination of Azure services should they use?

Question 246mediummultiple choice
Read the full Describe an analytics workload on explanation →

A company uses Azure Synapse Analytics to run a data warehouse. They need to load 500 GB of historical data from Azure Blob Storage into a staging table. They want the fastest load performance with minimal administrative overhead. Which method should they use?

Question 247hardmultiple choice
Read the full Describe an analytics workload on explanation →

A company has a Power BI dashboard that refreshes daily from an Azure SQL Database. During refresh, the database experiences high CPU usage that impacts transactional applications. They need to minimize impact while keeping the dashboard up-to-date. What should they do?

Question 248easymulti select
Read the full Describe an analytics workload on explanation →

Which TWO Azure services are primarily used for batch processing of large data volumes?

Question 249mediummulti select
Read the full Describe an analytics workload on explanation →

Which THREE components are required to implement a modern data warehouse architecture (medallion architecture) on Azure?

Question 250hardmulti select
Read the full Describe an analytics workload on explanation →

Which TWO options are valid ways to secure data at rest in Azure Synapse Analytics?

Question 251mediummultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. You are reviewing an ARM template that deploys a SQL database in Azure Synapse. The template sets the storageAccountType to GRS. What is a valid concern regarding cost and performance?

Exhibit

{
  "type": "Microsoft.Synapse/workspaces/sqlDatabases",
  "apiVersion": "2021-06-01",
  "properties": {
    "collation": "SQL_Latin1_General_CP1_CI_AS",
    "maxSizeBytes": 268435456000,
    "storageAccountType": "GRS"
  }
}
Question 252hardmultiple choice
Read the full Describe an analytics workload on explanation →

Refer to the exhibit. A data engineer wants to ensure that all Azure Storage accounts used for analytics use customer-managed keys. They apply this Azure Policy. What is the outcome?

Exhibit

{
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "type",
          "equals": "Microsoft.Storage/storageAccounts"
        },
        {
          "field": "Microsoft.Storage/storageAccounts/kind",
          "equals": "StorageV2"
        },
        {
          "field": "Microsoft.Storage/storageAccounts/encryption.keySource",
          "notEquals": "Microsoft.Keyvault"
        }
      ]
    },
    "then": {
      "effect": "deny"
    }
  }
}
Question 253mediummultiple choice
Read the full Describe an analytics workload on explanation →

You are a data engineer for a large e-commerce company. The company uses Azure Data Lake Storage Gen2 to store customer transaction data. They also use Azure Databricks for data transformation and Azure Synapse Serverless SQL pool for ad-hoc queries. Recently, the data lake has grown to 10 TB, and query performance in Synapse Serverless has degraded significantly. Users complain that queries that used to take seconds now take minutes. You need to improve query performance without moving data to a dedicated SQL pool. The data is stored in Parquet format, partitioned by date. You notice that the queries often filter on CustomerID and Date. Current queries scan all partitions even when only a few days are needed. What is the most effective solution to improve performance?

Question 254hardmultiple choice
Read the full NAT/PAT explanation →

You are a data architect for a healthcare organization. The organization needs to build a real-time analytics solution to monitor patient vital signs from IoT devices. The data arrives at a rate of 10,000 events per second. Each event contains patient ID, timestamp, heart rate, blood pressure, and oxygen saturation. The solution must alert clinicians within 10 seconds when a patient's vital signs exceed predefined thresholds. Additionally, the solution must store the raw data for historical analysis and compliance. You plan to use Azure Event Hubs for ingestion. Which combination of services should you use to meet the requirements? Consider: processing low latency alerts, storing raw data in cost-effective storage, and enabling historical analytics. You also need to ensure that the solution can scale to handle future growth.

Question 255easymultiple choice
Read the full Describe an analytics workload on explanation →

You are a business analyst at a manufacturing company. The company uses Azure SQL Database to store production data. You need to create a Power BI report that shows real-time machine efficiency. The report must refresh every 5 minutes to show current metrics. You have been granted read-only access to the database. The database is under heavy load from transactional applications, and you want to minimize additional impact. Which approach should you take to create the report?

Question 256mediummultiple choice
Read the full Describe an analytics workload on explanation →

You are a data engineer for a financial services company. The company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. They have a fact table named Transactions that contains 2 billion rows. The table is hash-distributed on the AccountID column. Users run reports that aggregate transaction amounts by date and account type. The reports are slow. Upon investigation, you find that the distribution is highly skewed because a few accounts have millions of transactions. You need to improve query performance without redesigning the entire schema. Which action should you take?

Question 257hardmultiple choice
Read the full NAT/PAT explanation →

You are a data architect for a logistics company. The company uses Azure Data Lake Storage Gen2 to store shipment tracking data. The data is ingested from IoT devices on trucks. Each record contains truck ID, timestamp, GPS coordinates, speed, and fuel level. The volume is 5 TB per day. The company wants to build a near-real-time dashboard to monitor truck locations and speeds. They also need to run daily batch analytics to compute fuel efficiency trends. You need to design a solution that minimizes latency for the dashboard and maximizes cost efficiency for batch processing. You plan to use Azure Event Hubs for ingestion. Which approach should you take?

Question 258hardmulti select
Read the full Describe an analytics workload on explanation →

Your company is designing a big data analytics solution on Azure. The solution must ingest streaming data from IoT devices, store the data in its raw format, and then use a distributed processing engine to transform the data before loading it into a serving layer for reporting. Which TWO Azure services should you include in the design?

Question 259mediummulti select
Read the full Describe an analytics workload on explanation →

A data engineering team is building a data pipeline to run daily batch loads from an on-premises SQL Server to Azure Synapse Analytics. The pipeline must include data transformation using a visual interface with no coding, and must support schema mapping and data validation. Which THREE Azure services should be used together?

Question 260hardmultiple choice
Read the full Describe an analytics workload on explanation →

Your company operates a retail analytics platform. Data from point-of-sale systems is ingested in real time into Azure Event Hubs. The data is then consumed by an Azure Stream Analytics job that aggregates sales by store and product every minute, writing results to Azure SQL Database. The business now requires a historical trend analysis capability that can query the last three years of sales data with sub-second response times, but the SQL Database is already experiencing performance issues due to high write volume. You need to redesign the serving layer to support both real-time dashboards (seconds latency) and historical analytics (sub-second queries on years of data) without impacting write performance. What should you do?

Question 261easymultiple choice
Study the full Python automation breakdown →

Your organization has a data lake on Azure Data Lake Storage Gen2 containing petabytes of raw clickstream data. Data scientists need to run exploratory analysis using Python and Spark, but they are not experienced with cluster management or infrastructure. The IT team wants to minimize administrative overhead while providing a collaborative notebook environment. Additionally, the solution must integrate with Microsoft Purview for data cataloging and lineage. Which Azure service should you recommend?

Question 262mediummultiple choice
Study the full Python automation breakdown →

A financial services company is building a real-time fraud detection system. Transactions are streamed from multiple sources into Azure Event Hubs. The system must run a trained machine learning model (scored in near real-time) to flag suspicious transactions. The model is a Python pickle file that needs to be deployed as a web service with low latency (under 100 ms per prediction). The data engineering team wants to use a serverless compute option to run the scoring logic, and the solution must integrate with Azure Stream Analytics for alerting. Which Azure service should you use to deploy the model?

Practice tests

Scored 10-question sessions with instant feedback and explanations.

DP-900 Practice Test 1 — 10 Questions→DP-900 Practice Test 2 — 10 Questions→DP-900 Practice Test 3 — 10 Questions→DP-900 Practice Test 4 — 10 Questions→DP-900 Practice Test 5 — 10 Questions→DP-900 Practice Exam 1 — 20 Questions→DP-900 Practice Exam 2 — 20 Questions→DP-900 Practice Exam 3 — 20 Questions→DP-900 Practice Exam 4 — 20 Questions→Free DP-900 Practice Test 1 — 30 Questions→Free DP-900 Practice Test 2 — 30 Questions→Free DP-900 Practice Test 3 — 30 Questions→DP-900 Practice Questions 1 — 50 Questions→DP-900 Practice Questions 2 — 50 Questions→DP-900 Exam Simulation 1 — 100 Questions→

Practice by domain

Each domain maps to a weighted exam section. Focus on the domain where you are weakest.

Describe core data conceptsDescribe an analytics workload on AzureIdentify considerations for relational data on AzureDescribe considerations for working with non-relational data on Azure

Practice by scenario

Filter questions by type — troubleshooting, exhibit, drag-and-drop, PBQ, ACLs, OSPF, and more.

Browse scenarios→

Continue studying

All Describe an analytics workload on Azure setsAll Describe an analytics workload on Azure questionsDP-900 Practice Hub