CCNA Describe an analytics workload on Azure Questions — Page 3 of 4

151

MCQhard

A financial services company needs to run ad-hoc SQL queries on petabytes of data stored in Azure Data Lake Storage without provisioning a dedicated data warehouse. Which Azure service should they use?

A.Azure Synapse Analytics serverless SQL pool

B.Azure Analysis Services

C.Azure SQL Database

D.Azure Data Lake Storage

AnswerA

Serverless querying of data lake files.

Why this answer

Azure Synapse Analytics serverless SQL pool is the correct choice because it allows you to run ad-hoc SQL queries directly against data in Azure Data Lake Storage without provisioning any dedicated compute resources. It uses a pay-per-query model, automatically scaling compute to handle petabytes of data, making it ideal for intermittent, exploratory workloads.

Exam trap

The trap here is that candidates often confuse Azure Data Lake Storage (a storage service) with a query engine, or assume that a provisioned data warehouse like Azure SQL Database is required for any SQL workload, missing the serverless, on-demand nature of Synapse serverless SQL pool.

How to eliminate wrong answers

Option B is wrong because Azure Analysis Services is an OLAP engine for semantic models and pre-aggregated data, not designed for direct ad-hoc SQL queries on raw petabyte-scale data in Data Lake Storage. Option C is wrong because Azure SQL Database is a provisioned, transactional relational database with fixed storage limits, unsuitable for petabyte-scale data lake queries without prior data loading. Option D is wrong because Azure Data Lake Storage is a storage service, not a query engine; it provides the data layer but cannot execute SQL queries itself.

Practice this question →

152

MCQmedium

A data engineering team needs to build a real-time dashboard showing sales totals by region. Sales transactions are streamed from point-of-sale systems into Azure Event Hubs. The team wants to aggregate the data in near real-time (e.g., every minute) and store the results in Azure SQL Database for visualization in Power BI. Which Azure service should they use for the aggregation step?

A.Azure Stream Analytics

B.Azure Data Factory

C.Azure Synapse Pipelines

D.Azure Logic Apps

AnswerA

Correct. Stream Analytics processes streaming data with SQL-like queries and can aggregate and output to SQL Database in real time.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time stream processing, allowing you to define a query that aggregates sales data from Event Hubs over a one-minute tumbling window and output the results directly to Azure SQL Database. This meets the requirement for near real-time aggregation without needing to write custom code or manage infrastructure.

Exam trap

The trap here is that candidates often confuse Azure Data Factory or Synapse Pipelines (which are batch-oriented) with real-time processing, overlooking that Stream Analytics is the only service among the options purpose-built for continuous, low-latency stream aggregation.

How to eliminate wrong answers

Option B (Azure Data Factory) is wrong because it is a cloud-based ETL and data integration service for batch-oriented data movement and orchestration, not designed for real-time stream processing or sub-minute aggregations. Option C (Azure Synapse Pipelines) is wrong because it is essentially the same as Azure Data Factory within Synapse Analytics, focused on batch data integration and orchestration, lacking native real-time stream processing capabilities. Option D (Azure Logic Apps) is wrong because it is a workflow automation service for integrating applications and services using connectors, not built for high-throughput, low-latency stream aggregation or windowed computations on event streams.

Practice this question →

153

MCQmedium

A manufacturing company collects real-time temperature data from thousands of IoT sensors. They need to build an analytics solution that processes the streaming data, computes the average temperature per device every minute, and outputs the results to a Power BI dashboard for near real-time visualization. Which Azure service should they use for the real-time stream processing?

A.Azure Stream Analytics

B.Azure Data Factory

C.Azure Databricks

D.Azure SQL Database

AnswerA

Correct. Azure Stream Analytics provides a serverless, real-time stream processing engine that can compute aggregations using SQL-like queries and output to Power BI.

Why this answer

Azure Stream Analytics is the correct choice because it is a fully managed, real-time stream processing engine designed specifically for scenarios like this: ingesting high-velocity data from IoT sensors, performing time-windowed aggregations (e.g., average temperature per device every minute), and outputting results directly to Power BI for near real-time dashboards. It natively supports SQL-like query language for defining windowed computations and has built-in connectors for both IoT Hub/Event Hubs (input) and Power BI (output), making it the most efficient and purpose-built service for this streaming analytics workload.

Exam trap

The trap here is that candidates often confuse Azure Data Factory (a batch ETL tool) with a real-time processing service, or assume that Azure Databricks is always the best choice for streaming because of its Spark foundation, overlooking the simpler, fully managed, and cost-effective alternative of Azure Stream Analytics for straightforward windowed aggregations.

How to eliminate wrong answers

Option B (Azure Data Factory) is wrong because it is a cloud-based ETL and data orchestration service designed for batch data movement and transformation, not for real-time stream processing; it cannot compute sliding-window averages on streaming data in sub-minute latency. Option C (Azure Databricks) is wrong because while it can process streaming data via Structured Streaming, it is a general-purpose analytics platform that requires more complex setup, cluster management, and coding (Scala/Python/SQL) compared to the simpler, declarative SQL-based approach of Stream Analytics; it is overkill for a straightforward windowed aggregation and not the simplest or most cost-effective choice for this specific requirement. Option D (Azure SQL Database) is wrong because it is a relational database for storing and querying structured data, not a stream processing engine; it cannot ingest real-time streaming data or perform continuous time-windowed aggregations without additional services like Stream Analytics or a custom application layer.

Practice this question →

154

Multi-Selectmedium

Which THREE components are required to implement a modern data warehouse architecture (medallion architecture) on Azure?

Select 3 answers

A.Azure Data Lake Storage Gen2

B.Azure Analysis Services

C.Azure Synapse Analytics

D.Azure Databricks

E.Azure DevOps

AnswersA, C, D

Serves as the storage foundation for raw and transformed data.

Why this answer

Azure Data Lake Storage Gen2 is required because it provides the scalable, hierarchical storage foundation for the medallion architecture's bronze, silver, and gold layers. It combines blob storage with a hierarchical namespace, enabling efficient data ingestion and transformation at petabyte scale.

Exam trap

The trap here is that candidates often confuse optional analytics services (like Azure Analysis Services or Azure DevOps) with the mandatory storage and compute components required for the medallion architecture.

Practice this question →

155

Multi-Selectmedium

Which TWO Azure services can be used to perform large-scale data transformation and processing in a serverless manner?

Select 2 answers

A.Azure Analysis Services

B.Azure Synapse Serverless SQL pool

C.Azure Data Factory

D.Azure Databricks

E.Azure SQL Database

AnswersB, C

Serverless SQL pool is serverless for querying data.

Why this answer

Azure Synapse Serverless SQL pool (Option B) is correct because it allows you to run T-SQL queries over data stored in Azure Data Lake or Blob Storage without provisioning any dedicated compute resources, paying only for the data processed. Azure Data Factory (Option C) is correct because it provides a serverless orchestration and data integration service that can execute data transformation activities (like Mapping Data Flows) at scale without managing underlying infrastructure.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'fully managed' or 'PaaS', leading them to select Azure Databricks or Azure SQL Database, which still require explicit compute provisioning or cluster management, unlike the truly serverless models of Synapse Serverless SQL pool and Data Factory.

Practice this question →

156

MCQeasy

Refer to the exhibit. You have a Power BI measure defined as shown. What does this measure return?

A.The count of online sales transactions.

B.The sum of Amount for each product sold online.

C.The total sales amount for all channels.

D.The total sales amount for online channel only.

AnswerD

CALCULATE with filter on Channel='Online'.

Why this answer

Option A is correct. The CALCULATE function evaluates the SUM of Amount for rows where Channel is "Online". Option B is wrong because it sums all sales.

Option C is wrong because it sums Amount per product, not total. Option D is wrong because it counts rows.

Practice this question →

157

MCQhard

A company stores terabytes of historical sales data as Parquet files in Azure Data Lake Storage Gen2. Business analysts need to run ad-hoc SQL queries that involve complex joins and aggregations over this data. They want to avoid provisioning a dedicated cluster or moving data into a separate database. The queries must be executed using standard T-SQL syntax. Which Azure service should they use?

A.Azure Synapse Analytics dedicated SQL pool

B.Azure Synapse Analytics serverless SQL pool

C.Azure Databricks

D.Azure HDInsight

AnswerB

The serverless SQL pool can query data in the data lake directly using T-SQL, scales on demand, and charges per query, fitting the requirement of no provisioning and no data movement.

Why this answer

Azure Synapse Analytics serverless SQL pool (B) is the correct choice because it allows you to query Parquet files directly in Azure Data Lake Storage Gen2 using standard T-SQL syntax without provisioning any dedicated cluster or moving data. It automatically scales compute resources to handle complex joins and aggregations on terabytes of data, charging only for the data processed. This matches the requirement for ad-hoc, serverless querying with familiar T-SQL.

Exam trap

The trap here is that candidates often confuse 'serverless SQL pool' with 'dedicated SQL pool' (Option A), assuming both require provisioning, or they mistakenly think Databricks (Option C) supports standard T-SQL, when it actually uses Spark SQL or Python.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics dedicated SQL pool requires provisioning a dedicated cluster with fixed compute resources, incurring ongoing costs even when idle, and it typically involves moving data into the pool's managed tables, which violates the requirement to avoid provisioning a dedicated cluster or moving data. Option C is wrong because Azure Databricks uses Spark SQL or Python notebooks, not standard T-SQL syntax, and requires a running cluster (provisioned compute) to execute queries, which contradicts the serverless and T-SQL requirements. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark cluster that requires provisioning and managing a cluster, uses HiveQL or Spark SQL (not standard T-SQL), and is designed for batch processing rather than ad-hoc interactive SQL queries.

Practice this question →

158

MCQmedium

A retail company needs to run complex SQL queries on petabytes of historical sales data stored in Parquet files in Azure Data Lake Storage Gen2. They want a solution that provides fast query performance without managing infrastructure, and they prefer a pay-per-query pricing model. Which Azure service should they use?

A.Azure Synapse Analytics dedicated SQL pool

B.Azure SQL Database

C.Azure Synapse Serverless SQL pool

D.Azure HDInsight with Hive

AnswerC

Serverless SQL pool is ideal for querying data lakes with a pay-per-query model and no infrastructure management.

Why this answer

Azure Synapse Serverless SQL pool is correct because it allows querying petabytes of Parquet files in Azure Data Lake Storage Gen2 using T-SQL without provisioning any infrastructure, and it charges per terabyte of data processed (pay-per-query). This matches the requirements for fast query performance on historical sales data with a serverless, consumption-based pricing model.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics dedicated SQL pool (provisioned, always-on) with the serverless SQL pool (pay-per-query), or assume that Azure SQL Database can handle big data analytics on Parquet files, when it is designed for OLTP workloads and lacks native support for querying external data lakes without additional services like PolyBase.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics dedicated SQL pool requires provisioning and managing dedicated compute resources (e.g., DWUs), incurring ongoing costs regardless of query usage, and does not offer a pay-per-query model. Option B is wrong because Azure SQL Database is a relational database service designed for transactional workloads, not for querying petabytes of Parquet files in Data Lake Storage Gen2, and it lacks native support for serverless querying of external data formats. Option D is wrong because Azure HDInsight with Hive requires managing a Hadoop cluster (provisioning VMs, scaling, patching), does not offer a pay-per-query pricing model, and incurs costs for running the cluster even when idle.

Practice this question →

159

MCQeasy

A company runs a real-time dashboard in Power BI that displays sales data from Azure Synapse Analytics. The dashboard must show data with less than 5 seconds of latency. Which Azure service should be used to ingest streaming sales events into Azure Synapse Analytics?

A.Azure SQL Database

B.Azure Stream Analytics

C.Azure Data Factory

D.Azure Databricks

AnswerB

Azure Stream Analytics is a real-time analytics service that can process streaming data with sub-second latency and output directly to Azure Synapse Analytics.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time stream processing, capable of ingesting high-velocity streaming sales events and outputting them to Azure Synapse Analytics with sub-second latency. This meets the requirement of less than 5 seconds of latency for the Power BI dashboard.

Exam trap

The trap here is that candidates often confuse Azure Data Factory (a batch ETL tool) with a real-time streaming service, or they assume Azure SQL Database can handle streaming ingestion, but neither supports the required sub-5-second latency for continuous data flow into Synapse Analytics.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a relational database for OLTP workloads, not a stream ingestion service, and it cannot natively process streaming data with low latency into Synapse. Option C is wrong because Azure Data Factory is a cloud-based ETL and data orchestration service designed for batch data movement and transformation, not real-time streaming with sub-5-second latency. Option D is wrong because Azure Databricks is an analytics platform for big data processing and machine learning, but it is not optimized for low-latency stream ingestion into Synapse; it typically requires additional streaming tools like Structured Streaming and adds overhead.

Practice this question →

160

MCQeasy

A company plans to implement a near-real-time analytics solution for streaming IoT sensor data. Which Azure service should they use to ingest and process the data streams?

A.Azure Data Factory

B.Azure Synapse Analytics

C.Azure Data Lake Storage Gen2

D.Azure Stream Analytics

AnswerD

Designed for real-time stream processing and analytics.

Why this answer

Azure Stream Analytics is a real-time event processing engine designed to ingest, process, and analyze high-velocity streaming data from sources like IoT sensors. It supports SQL-based queries to transform and route data streams to outputs such as Power BI or Azure Synapse, making it ideal for near-real-time analytics.

Exam trap

The trap here is that candidates often confuse batch-oriented services like Azure Data Factory or storage services like Data Lake Storage Gen2 with real-time stream processing, overlooking that Stream Analytics is the dedicated service for near-real-time data stream ingestion and analysis.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a cloud-based ETL and data integration service for batch data movement and orchestration, not designed for real-time stream ingestion. Option B is wrong because Azure Synapse Analytics is a unified analytics platform for large-scale data warehousing and big data analytics, but it relies on separate streaming services like Stream Analytics for real-time ingestion. Option C is wrong because Azure Data Lake Storage Gen2 is a scalable data lake storage solution for storing structured and unstructured data, not a stream processing engine.

Practice this question →

161

MCQhard

A company is designing an enterprise analytics solution. They store raw data in its original format in a scalable repository, apply schema and transformations at read time, and also maintain a curated layer that enforces ACID transactions for data reliability. This architecture combines the flexibility of a data lake with the reliability of a data warehouse. Which term best describes this modern data architecture?

A.Data lakehouse

B.Data mart

C.Operational database

D.Data pipeline

AnswerA

Correct. The data lakehouse architecture combines a data lake's flexibility (schema-on-read) with a data warehouse's ACID transactions and performance, often implemented with Delta Lake.

Why this answer

The data lakehouse architecture combines the flexibility of a data lake (storing raw data in its original format in a scalable repository) with the reliability of a data warehouse (enforcing ACID transactions in a curated layer). This allows schema-on-read transformations while maintaining data integrity, making it the correct term for the described design.

Exam trap

The trap here is that candidates may confuse a data lakehouse with a data lake or data warehouse, missing the key combination of raw storage, schema-on-read, and ACID transactions that defines this modern architecture.

How to eliminate wrong answers

Option B is wrong because a data mart is a subset of a data warehouse focused on a specific business domain, not an architecture that combines a data lake with ACID transactions. Option C is wrong because an operational database is designed for real-time transaction processing (OLTP) and does not typically store raw data in a scalable repository or apply schema-on-read transformations. Option D is wrong because a data pipeline is a process for moving and transforming data between systems, not an architecture that combines storage and ACID reliability.

Practice this question →

162

Multi-Selecthard

Which TWO options are valid ways to secure data at rest in Azure Synapse Analytics?

Select 2 answers

A.Customer-managed keys (CMK) with TDE

B.Transparent Data Encryption (TDE)

C.Always Encrypted

D.Column-level security

E.Dynamic Data Masking

AnswersA, B

Provides additional control over encryption keys.

Why this answer

Customer-managed keys (CMK) with TDE is a valid method to secure data at rest in Azure Synapse Analytics because it allows you to bring your own key (BYOK) to encrypt the database encryption key (DEK), which is stored in Azure Key Vault. This provides an additional layer of control and separation of duties, ensuring that only authorized users can access the encryption keys and thus the underlying data.

Exam trap

The trap here is that candidates often confuse Dynamic Data Masking or Column-level security with data-at-rest encryption, but these are access control or obfuscation features, not encryption mechanisms that protect data stored on disk.

Practice this question →

163

MCQeasy

Your organization has a data lake on Azure Data Lake Storage Gen2 containing petabytes of raw clickstream data. Data scientists need to run exploratory analysis using Python and Spark, but they are not experienced with cluster management or infrastructure. The IT team wants to minimize administrative overhead while providing a collaborative notebook environment. Additionally, the solution must integrate with Microsoft Purview for data cataloging and lineage. Which Azure service should you recommend?

A.Azure Databricks

B.Azure Data Science Virtual Machine

C.Azure Synapse Analytics (Synapse Studio)

D.Azure HDInsight (Spark cluster)

AnswerA

Databricks provides a collaborative notebook environment, automated cluster management, and integrates with Microsoft Purview.

Why this answer

Azure Databricks is the correct choice because it provides a fully managed, collaborative notebook environment optimized for Apache Spark, allowing data scientists to run Python and Spark-based exploratory analysis without managing clusters. It integrates natively with Azure Data Lake Storage Gen2 for accessing petabytes of clickstream data and supports Microsoft Purview for automated data cataloging and lineage tracking, minimizing administrative overhead.

Exam trap

The trap here is that candidates may choose Azure Synapse Analytics because it also supports Spark and notebooks, but they overlook that Databricks is purpose-built for collaborative data science with minimal infrastructure management, while Synapse is optimized for data warehousing and ETL workloads.

How to eliminate wrong answers

Option B (Azure Data Science Virtual Machine) is wrong because it is a pre-configured VM that requires manual cluster management and scaling, lacking the serverless Spark capabilities and collaborative notebook environment needed for petabyte-scale analysis. Option C (Azure Synapse Analytics) is wrong because while it offers Spark pools and notebooks, it is primarily designed for enterprise data warehousing and ETL, and its collaborative notebook experience is less mature than Databricks, with higher administrative overhead for cluster management. Option D (Azure HDInsight) is wrong because it requires manual cluster provisioning, configuration, and scaling, and does not provide a built-in collaborative notebook environment, increasing administrative burden for data scientists unfamiliar with infrastructure.

Practice this question →

164

MCQmedium

A retail company needs to analyze sales transactions as they occur to detect fraud patterns and immediately block suspicious orders. They also need to run daily batch reports on historical sales data. Which combination of Azure services should they use to meet both real-time and batch processing requirements?

A.Azure Stream Analytics for real-time processing and Azure Synapse Analytics for batch analytics

B.Azure Data Factory for both real-time and batch processing

C.Azure Logic Apps for real-time processing and Azure Synapse Analytics for batch analytics

D.Azure Stream Analytics for both real-time and batch processing

AnswerA

Stream Analytics handles real-time insights, Synapse Analytics handles large-scale batch queries.

Why this answer

Azure Stream Analytics is purpose-built for real-time data streaming and can process sales transactions as they occur to detect fraud patterns and block suspicious orders immediately. Azure Synapse Analytics provides a unified analytics platform that can run large-scale batch queries on historical sales data for daily reports, making this combination ideal for both real-time and batch processing needs.

Exam trap

The trap here is that candidates often confuse Azure Data Factory's orchestration capabilities with real-time processing, or assume that a single service like Stream Analytics can handle both streaming and batch analytics, when in fact each service is specialized for a distinct workload type.

How to eliminate wrong answers

Option B is wrong because Azure Data Factory is an orchestration and ETL service for data movement and transformation, not a real-time stream processing engine; it cannot process transactions as they occur with sub-second latency. Option C is wrong because Azure Logic Apps is designed for workflow automation and integration, not for high-throughput, low-latency real-time stream analytics required for fraud detection. Option D is wrong because Azure Stream Analytics is optimized for real-time stream processing and does not natively support batch analytics on historical data; it lacks the SQL-based analytical engine and large-scale query capabilities of a dedicated batch analytics service like Synapse.

Practice this question →

165

MCQhard

A retail company ingests daily sales data from multiple stores as CSV files stored in Azure Blob Storage. The data must be cleaned and transformed using Spark, then loaded into Azure Synapse Analytics for large-scale reporting. The pipeline must run on a schedule, handle failures with retries, and minimize manual intervention. Which combination of Azure services should they use to orchestrate and execute this pipeline?

A.Azure Data Factory, Azure Databricks, and Azure Synapse Analytics.

B.Azure Stream Analytics, Azure Data Lake Storage, and Power BI.

C.Azure Functions, Azure SQL Database, and Azure Analysis Services.

D.Azure Logic Apps, Azure HDInsight, and Azure Cosmos DB.

AnswerA

Correct. ADF orchestrates the pipeline, Databricks performs Spark-based transformations, and Synapse Analytics serves as the data warehouse for reporting.

Why this answer

Option A is correct because Azure Data Factory provides the orchestration and scheduling layer, Azure Databricks executes the Spark-based cleaning and transformation, and Azure Synapse Analytics serves as the target data warehouse for large-scale reporting. This combination supports retry policies for failure handling and minimizes manual intervention through automated pipeline execution.

Exam trap

The trap here is that candidates may confuse Azure Databricks with HDInsight or overlook the need for a dedicated orchestration service like Data Factory, assuming that a compute service alone can handle scheduling and retries.

How to eliminate wrong answers

Option B is wrong because Azure Stream Analytics is designed for real-time stream processing, not batch CSV ingestion and Spark transformations, and Power BI is a visualization tool, not a data transformation or orchestration service. Option C is wrong because Azure Functions is a serverless compute service unsuitable for complex Spark transformations, Azure SQL Database lacks the large-scale analytics capabilities of Synapse, and Azure Analysis Services is for semantic modeling, not data ingestion or transformation. Option D is wrong because Azure Logic Apps is a workflow automation tool for simple integrations, not robust pipeline orchestration with retries, Azure HDInsight is a managed Hadoop/Spark service but lacks the integrated orchestration and scheduling of Data Factory, and Azure Cosmos DB is a NoSQL database not designed for large-scale reporting workloads like Synapse Analytics.

Practice this question →

166

Multi-Selectmedium

A data engineering team is building a data pipeline to run daily batch loads from an on-premises SQL Server to Azure Synapse Analytics. The pipeline must include data transformation using a visual interface with no coding, and must support schema mapping and data validation. Which THREE Azure services should be used together?

Select 3 answers

A.Azure Synapse Analytics

B.Azure Data Factory

C.Azure Databricks

D.Azure Blob Storage

E.Azure Analysis Services

AnswersA, B, D

Synapse is the target serving layer for analytics.

Why this answer

Azure Synapse Analytics is the correct destination for the pipeline because it is a cloud-based data warehouse that supports high-performance analytics on large-scale data, making it ideal for daily batch loads from SQL Server. It integrates natively with Azure Data Factory for orchestration and Azure Blob Storage for staging, enabling schema mapping and data validation through visual interfaces without coding.

Exam trap

The trap here is that candidates often assume Azure Databricks is required for transformations, but the question explicitly requires a visual interface with no coding, which Azure Data Factory's Mapping Data Flows provide, not Databricks' notebook-based approach.

Practice this question →

167

MCQeasy

A data analyst needs to create a real-time dashboard in Power BI that refreshes every second from an Azure Stream Analytics job. Which Power BI feature should they use?

A.Scheduled refresh

B.Streaming dataset

C.DirectQuery

D.Import mode

AnswerB

Streaming datasets handle real-time data.

Why this answer

Option B is correct: Power BI streaming datasets support real-time refresh from Stream Analytics. Option A is wrong because scheduled refresh is for periodic data. Option C is wrong because DirectQuery is for interactive queries, not real-time streaming.

Option D is wrong because import mode is for static data.

Practice this question →

168

MCQmedium

A data analyst needs to run ad-hoc SQL queries on terabytes of CSV files stored in Azure Data Lake Storage Gen2. The queries are infrequent and unpredictable. The analyst wants to pay only for the amount of data processed by each query, and does not want to manage any compute or storage infrastructure. Which Azure service should they use?

A.Azure Synapse Analytics dedicated SQL pool

B.Azure Data Factory

C.Azure Synapse Serverless SQL pool

D.Azure Analysis Services

AnswerC

Serverless SQL pool allows on-demand SQL querying of data in the data lake, paying only for the data processed per query, with zero infrastructure management.

Why this answer

Azure Synapse Serverless SQL pool (C) is the correct choice because it allows querying data in Azure Data Lake Storage Gen2 using T-SQL without provisioning any compute resources. It charges per terabyte of data processed, making it ideal for infrequent, unpredictable ad-hoc queries, and it eliminates infrastructure management.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL pool with Azure Synapse Analytics dedicated SQL pool, assuming both require provisioning compute, but the serverless option is specifically designed for on-demand, pay-per-query workloads with no infrastructure management.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics dedicated SQL pool requires provisioning and managing dedicated compute resources, incurring costs even when idle, which contradicts the pay-per-query and no-management requirements. Option B is wrong because Azure Data Factory is an orchestration and ETL service, not a SQL query engine; it cannot run ad-hoc SQL queries directly on CSV files in Data Lake Storage Gen2. Option D is wrong because Azure Analysis Services is an OLAP engine for semantic models and pre-aggregated data, not designed for direct querying of raw CSV files with T-SQL, and it requires managing a dedicated server instance.

Practice this question →

169

MCQeasy

A company uses Azure Synapse Analytics to run large-scale batch processing jobs every night. The jobs currently take 6 hours to complete, but the business requires completion within 4 hours. Which action should the company take to improve job performance?

A.Replace PolyBase with Azure Data Factory for data movement.

B.Migrate from serverless SQL pool to dedicated SQL pool.

C.Move the underlying data to Azure Data Lake Storage Gen2.

D.Increase the data warehouse units (DWUs) for the dedicated SQL pool.

AnswerD

Scaling up DWUs allocates more compute resources, reducing job duration.

Why this answer

Increasing the data warehouse units (DWUs) for the dedicated SQL pool scales the compute resources (CPU, memory, and I/O bandwidth) available to the Synapse SQL pool. This directly reduces the execution time of batch processing jobs by allowing more parallel processing, enabling the 6-hour job to complete within the required 4-hour window.

Exam trap

The trap here is that candidates confuse storage optimization (e.g., moving to ADLS Gen2) with compute scaling, or assume that changing data movement tools (PolyBase vs. Data Factory) will fix performance, when the core issue is insufficient compute capacity for the batch workload.

How to eliminate wrong answers

Option A is wrong because replacing PolyBase with Azure Data Factory does not inherently improve query performance; PolyBase is used for high-performance data loading and querying external data, while Data Factory is an orchestration tool—neither addresses the compute bottleneck causing the slow batch jobs. Option B is wrong because migrating from serverless SQL pool to dedicated SQL pool would change the architecture but does not guarantee faster performance without scaling; serverless SQL pool is designed for ad-hoc queries and small-scale processing, not for large-scale batch jobs that require dedicated, scalable compute resources. Option C is wrong because moving data to Azure Data Lake Storage Gen2 improves storage performance and scalability but does not directly accelerate query execution within Synapse Analytics; the bottleneck is compute capacity, not storage location.

Practice this question →

170

MCQmedium

A company wants to analyze customer feedback from surveys and social media. The data includes both structured (ratings) and unstructured (comments) text. They plan to use Azure Cognitive Services for sentiment analysis. Which service should they use for text analytics?

A.Azure Synapse Analytics

B.Azure Cosmos DB

C.Azure AI Language

D.Azure Machine Learning

AnswerC

Provides pre-built sentiment analysis for text.

Why this answer

Azure AI Language (formerly part of Azure Cognitive Services) provides pre-built text analytics capabilities, including sentiment analysis, key phrase extraction, and language detection. This service is specifically designed to process unstructured text data like survey comments and social media posts, making it the correct choice for analyzing customer feedback.

Exam trap

The trap here is that candidates may confuse Azure Synapse Analytics or Azure Machine Learning as general-purpose analytics tools, overlooking that Azure AI Language is the dedicated, pre-built service for text analytics tasks like sentiment analysis.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics is a big data analytics platform for data warehousing and data integration, not a service for performing sentiment analysis on text. Option B is wrong because Azure Cosmos DB is a NoSQL database service for storing and querying structured and semi-structured data, not a text analytics service. Option D is wrong because Azure Machine Learning is a platform for building, training, and deploying custom machine learning models, which is overkill and not the pre-built service designed for sentiment analysis.

Practice this question →

171

MCQhard

A company uses Azure Synapse Analytics dedicated SQL pool for large-scale data warehousing. They have a fact table with billions of rows and frequently run queries that filter by a date range and join with a product dimension table. Which table distribution and partitioning strategy will minimize data movement and improve query performance?

A.Round-robin distribution with no partitioning

B.Hash-distribute on ProductID with partitioning on Date

C.Replicate the fact table on all distributions and partition on ProductID

D.Hash-distribute on Date with partitioning on ProductID

AnswerB

Hash-distribution on ProductID co-locates rows with the same ProductID, enabling efficient joins with the product dimension. Partitioning on the Date column enables partition elimination for date range queries, reducing the amount of data scanned.

Why this answer

Hash-distributing the fact table on ProductID ensures that rows for the same product are co-located on the same distribution, minimizing data movement when joining with the product dimension table. Partitioning on Date allows partition elimination for date-range filters, reducing the amount of data scanned. This combination directly addresses the query pattern of date-range filtering and product joins.

Exam trap

The trap here is that candidates often confuse the roles of distribution and partitioning, thinking that partitioning on the join key (ProductID) will improve join performance, when in fact hash distribution on the join key is what co-locates data for joins, while partitioning on the filter column (Date) enables partition elimination.

How to eliminate wrong answers

Option A is wrong because round-robin distribution distributes rows evenly without any logical grouping, causing all joins to require data shuffling across distributions, which is highly inefficient for large fact tables. Option C is wrong because replicating a billion-row fact table on all distributions would consume excessive storage and memory, and partitioning on ProductID does not help with date-range filtering. Option D is wrong because hash-distributing on Date scatters rows for the same product across distributions, forcing data movement during the join with the product dimension table, and partitioning on ProductID does not enable partition elimination for date-range filters.

Practice this question →

172

MCQhard

A retail company uses Azure Data Lake Storage Gen2 to store raw clickstream data. They need to process this data using Azure Databricks to create hourly aggregated reports. The data pipeline must minimize costs while meeting a five-minute processing SLA. What is the most cost-effective compute option?

A.Use interactive clusters with autoscaling

B.Use job clusters with pool-based allocation

C.Use Azure Synapse Serverless SQL pools

D.Provision a dedicated SQL pool in Azure Synapse

AnswerB

Job clusters with pools reduce startup latency and cost, ideal for scheduled jobs.

Why this answer

Job clusters with pool-based allocation are the most cost-effective compute option for this scenario because job clusters are ephemeral—they start only when a job runs and terminate automatically after completion, eliminating idle costs. Pool-based allocation further reduces startup latency by maintaining a warm pool of pre-initialized VMs, enabling the pipeline to meet the five-minute SLA without paying for always-on compute.

Exam trap

The trap here is that candidates often confuse interactive clusters (always-on, for exploration) with job clusters (ephemeral, for automation), and assume that any 'pool' feature increases cost rather than reducing it, leading them to incorrectly select interactive clusters with autoscaling as the cheaper option.

How to eliminate wrong answers

Option A is wrong because interactive clusters are designed for ad-hoc exploration and remain running until manually terminated, incurring continuous costs even when idle, which contradicts the cost-minimization requirement. Option C is wrong because Azure Synapse Serverless SQL pools are a query engine for data lakes, not a compute option for running Databricks jobs; they cannot execute Databricks notebooks or Spark transformations. Option D is wrong because provisioning a dedicated SQL pool in Azure Synapse is a provisioned, always-on resource that incurs high fixed costs and is not designed for Databricks-based processing, making it unsuitable for a cost-sensitive, batch-oriented pipeline.

Practice this question →

173

MCQmedium

A data analyst needs to create a real-time dashboard in Power BI that displays sales data from an Azure SQL Database. The dashboard must update every 10 minutes without manual refresh. Which Power BI feature should they use?

A.DirectQuery mode

B.Scheduled refresh with Import mode

C.Streaming datasets

D.Paginated reports

AnswerA

DirectQuery queries the source directly, enabling near real-time updates.

Why this answer

DirectQuery mode is correct because it allows Power BI to query the Azure SQL Database directly for each visual interaction, ensuring the dashboard reflects the latest data without requiring a manual refresh. Since the requirement is for updates every 10 minutes, DirectQuery can be configured to auto-refresh at that interval via the 'Automatic page refresh' setting, which sends T-SQL queries to the database on a timer. This avoids the latency and storage overhead of importing data, making it ideal for near-real-time monitoring.

Exam trap

The trap here is that candidates confuse 'real-time dashboard' with 'streaming datasets' (Option C), but streaming datasets require a push-based architecture, not a pull from an existing database like Azure SQL Database, which DirectQuery handles natively.

How to eliminate wrong answers

Option B (Scheduled refresh with Import mode) is wrong because Import mode requires a scheduled refresh (minimum 30 minutes for shared capacity, 1 minute for Premium) and stores a copy of the data in Power BI, which introduces latency and does not support sub-10-minute updates without Premium. Option C (Streaming datasets) is wrong because streaming datasets are designed for real-time data ingestion from sources like Azure Stream Analytics or IoT Hub, not for querying an existing Azure SQL Database; they require pushing data via the Power BI REST API, not pulling from a database. Option D (Paginated reports) is wrong because paginated reports are intended for pixel-perfect, print-ready layouts (e.g., invoices) and do not support automatic, timer-based dashboard updates; they require manual refresh or subscription-based rendering.

Practice this question →

174

MCQhard

A retail company processes petabytes of sales transaction data stored in Azure Data Lake Storage Gen2. They need to run recurring complex queries that involve large joins and aggregations. The queries must consistently complete within a fixed time window overnight. The company wants predictable performance and costs. Which Azure service should they use?

A.Azure Synapse Analytics Serverless SQL pool

B.Azure Synapse Analytics Dedicated SQL pool

C.Azure SQL Database

D.Azure Analysis Services

AnswerB

Dedicated SQL pool provisions reserved compute resources, enabling consistent query performance and predictable costs for recurring, complex, large-scale analytics workloads like overnight batch processing.

Why this answer

Azure Synapse Analytics Dedicated SQL pool provides reserved, fixed compute resources that ensure predictable performance and cost for recurring complex queries involving large joins and aggregations. It is designed for petabyte-scale data warehousing workloads with consistent SLAs, making it ideal for overnight batch processing within a fixed time window.

Exam trap

The trap here is that candidates confuse serverless SQL pool's flexibility with dedicated SQL pool's predictability, overlooking that serverless is designed for ad-hoc exploration, not consistent, fixed-time batch processing.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics Serverless SQL pool is a pay-per-query service with variable performance that depends on data volume and concurrency, making it unsuitable for predictable, fixed-time workloads. Option C is wrong because Azure SQL Database is a transactional OLTP database not optimized for petabyte-scale analytics or complex, large-scale joins and aggregations. Option D is wrong because Azure Analysis Services is an OLAP engine for semantic models and in-memory analytics, not a direct query engine for raw petabyte-scale data in Data Lake Storage Gen2.

Practice this question →

175

MCQhard

A retail company uses Azure Data Lake Storage Gen2 as a data lake and Azure Databricks for ETL. They notice that a Spark job reading Parquet files from the data lake fails with an 'Access Denied' error when the job runs as a service principal. The service principal has Storage Blob Data Contributor role on the storage account. What is the most likely cause?

A.The service principal needs the Storage Blob Data Owner role instead.

B.The service principal lacks the necessary permissions on the Data Lake Storage Gen2 endpoint (dfs.core.windows.net).

C.The storage account has a firewall enabled that blocks the Databricks cluster IP.

D.The service principal does not have the Storage Blob Data Contributor role assigned at the container level.

AnswerB

The 'Access Denied' error occurs when using the DFS endpoint without proper RBAC assignment for that endpoint.

Why this answer

Option B is correct because Azure Data Lake Storage Gen2 uses a hierarchical namespace and a separate endpoint (dfs.core.windows.net) for data plane operations. Even though the service principal has the Storage Blob Data Contributor role (which grants read/write permissions via the blob endpoint), the job may be accessing the data through the DFS endpoint, which requires explicit permissions on that endpoint. The 'Access Denied' error typically occurs when the service principal is not granted the necessary RBAC role at the storage account level for the DFS endpoint, or when the role assignment is not properly propagated.

Exam trap

The trap here is that candidates often assume the Storage Blob Data Contributor role grants full access to all data plane operations, but they overlook that Azure Data Lake Storage Gen2 requires explicit permissions on the DFS endpoint for hierarchical namespace operations, which is a common point of failure in Spark-based ETL jobs.

How to eliminate wrong answers

Option A is wrong because the Storage Blob Data Owner role is not required; the Storage Blob Data Contributor role already provides read/write/delete permissions on data, and the issue is about endpoint-specific access, not role level. Option C is wrong because the question does not mention any firewall configuration, and a firewall blocking the Databricks cluster IP would cause a different error (e.g., network connectivity failure) rather than an 'Access Denied' error specifically tied to the service principal. Option D is wrong because the Storage Blob Data Contributor role is assigned at the storage account level, which by default applies to all containers; the error is not about container-level assignment but about the service principal lacking permissions on the DFS endpoint.

Practice this question →

176

Drag & Drophard

A company is building a modern data warehouse on Azure using a lakehouse approach. Arrange the following steps in the correct order to implement a typical pipeline that starts with raw data ingestion and ends with business reporting.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

1Step 1

2Step 2

3Step 3

4Step 4

Why this order

The lakehouse pipeline starts with raw data ingestion, followed by transformation and loading into a curated layer, then creating tables for querying, and finally reporting with Power BI.

Practice this question →

177

MCQmedium

A data engineering team needs to build a pipeline that ingests streaming data from IoT devices into Azure Data Lake Storage Gen2. The data arrives as JSON messages. They want to use a service that can capture the streaming data in near real-time and store it as files in the data lake without writing custom code for the ingestion. Which Azure service should they use?

A.Azure Data Factory

B.Azure Event Hubs with Capture

C.Azure Stream Analytics

D.Azure Synapse Pipelines

AnswerB

Event Hubs Capture automatically writes the incoming stream of events to Azure Data Lake Storage in near real-time without any custom code, meeting the requirement exactly.

Why this answer

Azure Event Hubs with Capture is the correct choice because it natively ingests streaming JSON data from IoT devices in near real-time and automatically writes the data to Azure Data Lake Storage Gen2 as files without requiring any custom code. The Capture feature automatically persists the event stream to the specified storage destination at defined time or size intervals, making it ideal for serverless, code-free ingestion.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics as the primary ingestion service for raw data capture, when in fact Stream Analytics is a processing engine that requires a query and output sink, whereas Event Hubs Capture provides direct, code-free persistence of raw streams.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a code-free ETL orchestration service for batch data movement and transformation, not designed for real-time streaming ingestion from IoT devices. Option C is wrong because Azure Stream Analytics is a real-time analytics and processing engine that requires a query to transform data before output, and it does not natively capture raw streaming data to files without custom code. Option D is wrong because Azure Synapse Pipelines is built on Azure Data Factory and inherits the same batch-oriented orchestration limitations, lacking native real-time streaming capture capabilities.

Practice this question →

178

MCQmedium

Your company runs a sales analytics dashboard on Power BI that refreshes every hour from Azure Synapse Analytics. During peak hours, the dashboard refresh fails with a 'timeout' error. Which action should you take FIRST to resolve the issue?

A.Scale up the Azure Synapse dedicated SQL pool to handle more concurrent queries.

B.Configure the dashboard to use DirectQuery instead of Import mode.

C.Implement incremental refresh in Power BI to refresh only changed data.

D.Export the data to CSV files and load into Power BI from Azure Blob Storage.

AnswerC

Incremental refresh reduces the data volume per refresh, lowering the load and timeout risk.

Why this answer

Option C is correct because implementing incremental refresh in Power BI reduces the amount of data loaded during each refresh cycle, which directly addresses timeout errors by limiting the refresh to only changed or new data rather than the entire dataset. This is the most efficient first step to reduce refresh duration without changing the underlying architecture or data source connection mode.

Exam trap

The trap here is that candidates often assume scaling the source (Option A) or changing the connection mode (Option B) is the immediate fix, but the DP-900 exam emphasizes that incremental refresh is the primary technique to optimize refresh performance for large datasets without altering the underlying infrastructure.

How to eliminate wrong answers

Option A is wrong because scaling up the Azure Synapse dedicated SQL pool increases compute resources for concurrent queries but does not address the root cause of a Power BI refresh timeout, which is typically due to the volume of data being transferred or the complexity of the refresh operation. Option B is wrong because switching to DirectQuery would eliminate the scheduled refresh entirely but would introduce query-time latency and potentially degrade dashboard performance during peak hours, as each visual would query the source directly. Option D is wrong because exporting data to CSV files and loading from Azure Blob Storage adds unnecessary complexity, introduces data staleness, and does not solve the timeout issue—it merely shifts the data movement bottleneck.

Practice this question →

179

MCQmedium

An organization has a data lake that contains both structured and unstructured data. They need to catalog the data assets and enable data discovery for users. Which Azure service should they use?

A.Azure Data Factory

B.Microsoft Purview

C.Azure Synapse Analytics

D.Azure Data Lake Storage

AnswerB

Data governance and cataloging service.

Why this answer

Microsoft Purview is a unified data governance service that helps you manage and govern your on-premises, multicloud, and software-as-a-service (SaaS) data. It provides automated data discovery, sensitive data classification, and end-to-end data lineage, making it the correct choice for cataloging both structured and unstructured data assets in a data lake and enabling data discovery for users.

Exam trap

The trap here is that candidates often confuse Azure Data Factory’s data movement and transformation capabilities with data cataloging, or they assume Azure Synapse Analytics includes a built-in catalog, when in fact Microsoft Purview is the dedicated service for data discovery and governance.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a data integration and orchestration service used to create, schedule, and manage ETL/ELT pipelines; it does not provide a persistent catalog or data discovery capabilities. Option C is wrong because Azure Synapse Analytics is an analytics service that combines big data and data warehousing, offering querying and processing engines, but it lacks the dedicated data cataloging and governance features needed for asset discovery across a data lake. Option D is wrong because Azure Data Lake Storage is a scalable and secure data lake storage solution for big data analytics; it is the underlying storage layer and does not include cataloging or discovery functionality.

Practice this question →

180

MCQhard

A manufacturing company has a streaming data pipeline that ingests sensor data from factory equipment into Azure Event Hubs. The data must be prepared for reporting by cleaning invalid records, removing duplicates, and aggregating readings into 5-minute windows. The transformed data needs to be stored in a columnar format in a data lake to support efficient querying by data analysts using SQL. Which Azure service should perform the data transformation and loading?

A.Azure Data Factory

B.Azure Databricks

C.Azure Stream Analytics

D.Azure Synapse Pipelines

AnswerC

Azure Stream Analytics is a serverless real-time analytics service that can ingest data from Event Hubs, perform time-windowed aggregations, clean data, and output to Azure Data Lake Storage in the desired columnar format. It is the most straightforward and cost-effective choice for this streaming ETL scenario.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time stream processing, directly consuming data from Azure Event Hubs, performing transformations like cleaning invalid records, removing duplicates, and aggregating over tumbling windows (e.g., 5-minute windows), and outputting the results in a columnar format (e.g., Parquet) to Azure Data Lake Storage. This aligns perfectly with the requirement for a low-latency, continuous transformation pipeline without needing additional orchestration or compute clusters.

Exam trap

The trap here is that candidates often confuse Azure Data Factory or Synapse Pipelines as suitable for streaming transformations because they see 'pipeline' or 'data movement' keywords, but these services are batch-oriented and cannot perform real-time windowed aggregations directly from Event Hubs.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is primarily an orchestration and data movement service for batch workloads, not a real-time stream processing engine; it cannot natively perform windowed aggregations on streaming data from Event Hubs. Option B is wrong because Azure Databricks is a powerful analytics platform that can handle streaming via Structured Streaming, but it requires provisioning and managing a Spark cluster, which is overkill for a simple transformation and loading task that Stream Analytics can handle more cost-effectively and with less operational overhead. Option D is wrong because Azure Synapse Pipelines (now part of Azure Synapse Analytics) is essentially Azure Data Factory's orchestration capabilities within Synapse, inheriting the same batch-oriented limitations and lacking native stream processing for Event Hubs.

Practice this question →

181

Multi-Selecteasy

Which TWO Azure services can be used to store semi-structured data like JSON or Parquet files for analytics? (Choose two.)

Select 2 answers

A.Azure Synapse Analytics

B.Azure Blob Storage

C.Azure SQL Database

D.Azure Data Lake Storage Gen2

E.Azure Cosmos DB

AnswersB, D

Object storage, can store any file type.

Why this answer

Azure Blob Storage is a highly scalable, cost-effective object storage service that can store any type of unstructured data, including semi-structured formats like JSON and Parquet. It is commonly used as a data lake for analytics workloads, where raw or processed data is stored and later queried by services like Azure Synapse Analytics or Azure Databricks.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics (a query service) with a storage service, or think Azure Cosmos DB is suitable for storing large Parquet files for analytics, when it is actually a transactional NoSQL database with a different cost and performance profile.

Practice this question →

182

MCQeasy

A logistics company uses IoT sensors on delivery trucks to transmit GPS location, speed, and engine diagnostics every 10 seconds. The data is ingested into Azure Event Hubs. The company needs to analyze the data in real time to identify speeding trucks and send alerts. The analysis requires joining the live sensor data with a reference table of truck details (e.g., driver name, route number) stored in Azure SQL Database. Which Azure service should they use for the real-time processing?

A.Azure Stream Analytics

B.Azure Synapse Analytics dedicated SQL pool

C.Azure Data Factory

D.Azure Databricks

AnswerA

Stream Analytics is designed for real-time data processing from Event Hubs and can join streams with reference data from SQL Database to enrich and detect patterns like speeding.

Why this answer

Azure Stream Analytics is the correct choice because it is a real-time event processing engine designed to handle streaming data from sources like Azure Event Hubs. It can perform temporal joins between the live IoT sensor stream and a static reference table (e.g., truck details from Azure SQL Database) to enrich the data and trigger alerts when speeding is detected, all with sub-second latency.

Exam trap

The trap here is that candidates often confuse batch-oriented services like Azure Synapse Analytics or Azure Data Factory with real-time processing, or they overcomplicate the solution by choosing Azure Databricks when a simpler, purpose-built service like Stream Analytics is sufficient for the join-and-alert pattern.

How to eliminate wrong answers

Option B is wrong because Azure Synapse Analytics dedicated SQL pool is a massively parallel processing (MPP) data warehouse optimized for large-scale batch analytics and complex queries on historical data, not for real-time stream processing with sub-second latency. Option C is wrong because Azure Data Factory is a cloud-based ETL and data orchestration service designed for scheduled, batch-oriented data movement and transformation, not for continuous, low-latency stream processing. Option D is wrong because Azure Databricks is a unified analytics platform that can process streaming data using Structured Streaming, but it is overkill for this simple join-and-alert scenario; it requires more complex setup, cluster management, and is not as straightforward as Stream Analytics for directly joining Event Hubs data with Azure SQL reference data.

Practice this question →

183

MCQmedium

A company needs to ingest data from an on-premises SQL Server database into Azure SQL Database every hour. During the ingestion, they need to filter out rows where Status = 'Inactive' and convert a date column to a different format. They want a cloud-based, code-free solution that can schedule and orchestrate this task. Which Azure service should they use?

A.Azure Logic Apps

B.Azure Data Factory with Mapping Data Flows

C.Azure Functions

D.Azure SQL Database Change Data Capture

AnswerB

Azure Data Factory provides mapping data flows, a visual designer for building data transformations at scale. It integrates with on-premises data via self-hosted integration runtime, supports scheduling, and requires no code, making it the ideal choice.

Why this answer

Azure Data Factory with Mapping Data Flows is the correct choice because it provides a cloud-based, code-free ETL service that can ingest data from on-premises SQL Server into Azure SQL Database, apply transformations like filtering rows (Status = 'Inactive') and converting date formats, and schedule the task using triggers. Mapping Data Flows run on Spark clusters and allow visual data transformation without writing code, making it ideal for this orchestrated, scheduled ingestion.

Exam trap

The trap here is that candidates often confuse Azure Logic Apps with Azure Data Factory because both can schedule and orchestrate tasks, but Logic Apps lacks the native data transformation capabilities (like filtering and date conversion) required for ETL workloads, making Data Factory with Mapping Data Flows the correct choice for code-free data transformation.

How to eliminate wrong answers

Option A is wrong because Azure Logic Apps is a workflow automation service that can connect to on-premises SQL Server via the on-premises data gateway, but it lacks native data transformation capabilities for filtering rows and converting date formats within the data flow; it is designed for lightweight integration and orchestration, not for complex ETL transformations. Option C is wrong because Azure Functions is a serverless compute service that requires writing custom code (e.g., C#, Python) to perform the ingestion and transformation, which contradicts the requirement for a code-free solution. Option D is wrong because Azure SQL Database Change Data Capture (CDC) is a feature that tracks changes in a database for incremental data capture, but it does not provide scheduling, orchestration, or transformation capabilities; it is a data capture mechanism, not an ETL or orchestration service.

Practice this question →

184

MCQmedium

Refer to the exhibit. You execute the above T-SQL statements in Azure Synapse Analytics. What is the purpose of this code?

A.To create an external table that can query Parquet files stored in Azure Data Lake Storage Gen2.

B.To create a view over the Parquet files.

C.To import data from Parquet files into a permanent table in Synapse.

D.To create a regular table in the Synapse database.

AnswerA

PolyBase external tables enable querying external data.

Why this answer

The T-SQL code creates an external data source pointing to Azure Data Lake Storage Gen2, an external file format for Parquet, and an external table that references the Parquet files. This allows querying the Parquet files directly without importing them into the database, which is the definition of an external table in Azure Synapse Analytics.

Exam trap

The trap here is that candidates confuse an external table (which reads files in place) with importing data into a permanent table or creating a view, because the syntax resembles regular table creation but includes external source and format clauses.

How to eliminate wrong answers

Option B is wrong because the code creates an external table, not a view; a view is a saved SELECT query that does not define a schema over external files. Option C is wrong because the code does not use CREATE TABLE AS SELECT (CTAS) or INSERT INTO to import data into a permanent table; it only creates an external table that reads the Parquet files on demand. Option D is wrong because the table is defined with an external data source and file format, making it an external table, not a regular (managed) table stored in the Synapse database.

Practice this question →

185

MCQmedium

A retail company needs to analyze streaming clickstream data from their website to detect shopping cart abandonment in real-time. They want to use Azure Stream Analytics to output results that can be visualized on a live dashboard. Which output sink allows the fastest data visualization for a real-time dashboard in Power BI?

A.Azure Blob Storage

B.Azure Event Hubs

C.Power BI dataset

D.Azure SQL Database

AnswerC

Direct output to Power BI dataset enables live dashboards with minimal latency from Stream Analytics.

Why this answer

Power BI dataset is the correct output sink because Azure Stream Analytics can directly stream data into a Power BI dataset via the Power BI output adapter, enabling real-time dashboard updates with sub-second latency. This integration uses the Power BI REST API to push streaming data events, which Power BI then visualizes immediately without requiring intermediate storage or batch processing.

Exam trap

The trap here is that candidates often confuse Azure Event Hubs as a visualization output because it is a streaming service, but Event Hubs is an ingestion endpoint, not a visualization sink; the correct sink for real-time Power BI dashboards is the Power BI dataset output directly from Stream Analytics.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage is a batch-oriented, file-based storage service that introduces latency due to write operations and lacks native real-time streaming visualization capabilities; data must be read and processed again before it can be displayed in Power BI. Option B is wrong because Azure Event Hubs is a message ingestion service, not a visualization sink; it can receive streaming data but requires a downstream consumer (like Stream Analytics or a custom application) to forward data to Power BI, adding an extra hop and latency. Option D is wrong because Azure SQL Database is a relational database optimized for transactional workloads and batch inserts; streaming data into SQL Database incurs write latency and row-level locking, and Power BI would need to poll or refresh the dataset, which is not real-time.

Practice this question →

186

MCQeasy

A company needs to run complex analytical queries that aggregate terabytes of sales data across multiple years. The queries are used for monthly business reports and are not latency-sensitive. The data is stored in Azure Data Lake Storage Gen2. The company wants a fully managed, petabyte-scale data warehouse solution that supports SQL queries and integrates with Power BI for reporting. Which Azure service should they use?

A.Azure Synapse Analytics

B.Azure Analysis Services

C.Azure Data Factory

D.Azure HDInsight

AnswerA

Azure Synapse Analytics provides a cloud-based data warehouse that can scale to petabytes. It uses dedicated SQL pools for high-performance analytical queries and has built-in integration with Power BI, Azure Data Lake Storage, and other Azure services.

Why this answer

Azure Synapse Analytics (formerly SQL Data Warehouse) is a fully managed, petabyte-scale analytics service that provides a dedicated SQL pool for running complex, high-performance T-SQL queries against massive datasets. It natively integrates with Azure Data Lake Storage Gen2 for reading data directly via PolyBase or external tables, and it offers built-in connectors to Power BI for reporting. This makes it the ideal choice for the described workload, which requires large-scale aggregation without low-latency demands.

Exam trap

The trap here is that candidates may confuse Azure Analysis Services (an OLAP modeling tool) with a data warehouse, or assume HDInsight is suitable for SQL-based reporting, but Synapse is the only fully managed, petabyte-scale SQL data warehouse with native Power BI integration.

How to eliminate wrong answers

Option B is wrong because Azure Analysis Services is an OLAP engine for semantic modeling and in-memory cubes, not a petabyte-scale data warehouse for raw SQL queries on terabytes of data. Option C is wrong because Azure Data Factory is a cloud-based ETL and data orchestration service, not a data warehouse or query engine. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark cluster for big data processing, but it is not a fully managed SQL-based data warehouse and does not provide the same native SQL query experience or direct Power BI integration as Synapse.

Practice this question →

187

Multi-Selecthard

Which THREE components are essential for building a modern data warehouse architecture on Azure?

Select 3 answers

A.Azure Analysis Services

B.Azure Cosmos DB

C.Azure Databricks

D.Azure Data Lake Storage

E.Azure Synapse Analytics

AnswersC, D, E

For data transformation and processing.

Why this answer

Azure Databricks is correct because it provides an Apache Spark-based analytics platform that enables data engineering, data science, and machine learning on large-scale data. In a modern data warehouse architecture, Azure Databricks is used for data transformation, preparation, and advanced analytics, often feeding into or complementing Azure Synapse Analytics for structured querying and reporting.

Exam trap

The trap here is that candidates may confuse Azure Analysis Services as a core component of the data warehouse architecture, when it is actually a downstream BI tool, or mistakenly think Azure Cosmos DB is suitable for analytical workloads, whereas it is optimized for transactional and real-time NoSQL scenarios.

Practice this question →

188

MCQmedium

A retail company analyzes customer purchase patterns. Every night, they run a batch job that aggregates millions of transactions from the past day into summary tables for reporting. Which type of data processing workload best describes this nightly job?

A.Batch processing

B.Real-time processing

C.Streaming processing

D.Transactional processing

AnswerA

Correct because the job runs at a scheduled time to process a large volume of data in batches.

Why this answer

This nightly job processes a large volume of transactions accumulated over the past day in a single, scheduled run, which is the defining characteristic of batch processing. In Azure, this workload would typically be implemented using Azure Synapse Analytics or Azure Data Factory to orchestrate the aggregation of millions of rows into summary tables for reporting, without requiring immediate output.

Exam trap

The trap here is that candidates confuse 'batch processing' with 'transactional processing' because both involve databases, but batch processing is designed for high-volume, scheduled analytics (OLAP), not for real-time, row-by-row operations (OLTP).

How to eliminate wrong answers

Option B is wrong because real-time processing requires data to be ingested and analyzed with sub-second latency, not on a nightly schedule. Option C is wrong because streaming processing continuously processes data as it arrives from sources like IoT devices or clickstreams, not from a static batch of past-day transactions. Option D is wrong because transactional processing (OLTP) handles individual, low-latency CRUD operations on current data, not large-scale aggregations of historical data.

Practice this question →

189

MCQeasy

A data scientist needs to perform exploratory data analysis on a large dataset stored in Azure Data Lake Storage Gen2 using Python notebooks. The solution must minimize infrastructure management. Which Azure service should the data scientist use?

A.Azure Machine Learning compute instances.

B.Power BI with dataflows.

C.Azure HDInsight with Jupyter notebooks.

D.Azure Databricks with collaborative notebooks.

AnswerD

Databricks offers serverless Spark clusters and easy notebook collaboration.

Why this answer

Azure Databricks provides a fully managed, collaborative notebook environment optimized for big data analytics and machine learning. It integrates natively with Azure Data Lake Storage Gen2, allowing the data scientist to perform exploratory data analysis (EDA) using Python notebooks without managing any underlying infrastructure. This minimizes operational overhead while providing autoscaling clusters and built-in Spark capabilities.

Exam trap

The trap here is that candidates often confuse Azure Machine Learning compute instances (which are for ML model development) with a general-purpose data analytics environment, or they assume HDInsight's Jupyter notebooks are equally managed, overlooking the significant infrastructure management overhead and lack of serverless autoscaling.

How to eliminate wrong answers

Option A is wrong because Azure Machine Learning compute instances are designed for training and deploying ML models, not for general-purpose EDA on large datasets; they require manual scaling and lack the native Spark engine needed for efficient processing of data in Data Lake Storage Gen2. Option B is wrong because Power BI with dataflows is a business intelligence and visualization tool, not an interactive coding environment for Python-based EDA; it abstracts away code and cannot run arbitrary Python notebooks. Option C is wrong because Azure HDInsight with Jupyter notebooks requires manual provisioning and management of Hadoop/Spark clusters, contradicting the requirement to minimize infrastructure management; it also lacks the collaborative, serverless notebook experience of Databricks.

Practice this question →

190

MCQmedium

A data engineer needs to transform large datasets stored in Azure Data Lake Storage Gen2 using Python and Apache Spark. They want a serverless compute option that automatically scales and requires no cluster management. Which Azure service should they use?

A.Azure Synapse Analytics dedicated SQL pool

B.Azure Databricks with interactive clusters

C.Azure Synapse Analytics serverless Spark pool

D.Azure Data Factory with MapReduce

AnswerC

Correct. Serverless Spark pools in Azure Synapse Analytics are fully managed, automatically scale, and charge only for active compute. Ideal for ad-hoc transformations without cluster management.

Why this answer

Azure Synapse Analytics serverless Spark pool is correct because it provides a serverless Apache Spark compute environment that automatically scales based on workload demand and requires no cluster management. This aligns perfectly with the requirement to transform large datasets in Azure Data Lake Storage Gen2 using Python and Spark without provisioning or managing infrastructure.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'interactive clusters' in Azure Databricks, assuming that Databricks offers a serverless option (which it does not for interactive clusters), or they mistakenly think a dedicated SQL pool can run Spark transformations.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics dedicated SQL pool is a provisioned, always-on MPP (Massively Parallel Processing) SQL engine that requires manual scaling and management, not a serverless Spark compute option. Option B is wrong because Azure Databricks with interactive clusters requires manual cluster creation, configuration, and management, and interactive clusters are not serverless—they run continuously until terminated. Option D is wrong because Azure Data Factory with MapReduce is an orchestration and ETL service that uses external MapReduce jobs (e.g., on HDInsight), not a native serverless Spark compute option, and it does not provide direct Python/Spark execution without cluster management.

Practice this question →

191

MCQeasy

Refer to the exhibit. A data engineer is reviewing an ARM template for a storage account. What does the property 'isHnsEnabled' set to true indicate?

A.Blob soft delete is enabled

B.The storage account supports Azure Data Lake Storage Gen2

C.Versioning is enabled for blobs

D.Geo-redundant storage is configured

AnswerB

HNS enables Data Lake Storage Gen2.

Why this answer

The 'isHnsEnabled' property, when set to true, enables the Hierarchical Namespace (HNS) on the storage account. HNS is the core feature that differentiates Azure Data Lake Storage Gen2 from a standard blob storage account, allowing for a true file system hierarchy with POSIX-like access control lists (ACLs). This enables the storage account to support Data Lake Storage Gen2 workloads.

Exam trap

The trap here is that candidates confuse 'isHnsEnabled' with other blob-level features like soft delete or versioning, because all three are often discussed in the context of data protection and management, but only HNS is specific to Data Lake Storage Gen2.

How to eliminate wrong answers

Option A is wrong because blob soft delete is controlled by the 'deleteRetentionPolicy' property on the blob service, not by 'isHnsEnabled'. Option C is wrong because blob versioning is enabled via the 'isVersioningEnabled' property on the blob service, not by 'isHnsEnabled'. Option D is wrong because geo-redundant storage (GRS) is a replication setting configured via the 'sku.name' property (e.g., 'Standard_GRS'), not by 'isHnsEnabled'.

Practice this question →

192

MCQhard

A data analyst needs to run ad-hoc SQL queries on large datasets stored as Parquet files in Azure Data Lake Storage Gen2. The queries are infrequent and the data volume varies. The analyst wants to pay only for the amount of data processed per query and does not want to manage any infrastructure. They also need to create views in T-SQL to simplify queries for Power BI reports. Which Azure service should they use?

A.Azure Synapse Serverless SQL pool

B.Azure Data Lake Analytics

C.Azure HDInsight

D.Azure Databricks

AnswerA

Correct. It provides serverless T-SQL querying on data lakes with pay-per-query pricing, supports views, and works with Power BI.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it allows running ad-hoc T-SQL queries directly on Parquet files in Azure Data Lake Storage Gen2 without provisioning any infrastructure. It uses a pay-per-query model, charging only for the amount of data processed, and supports creating T-SQL views that can be used directly by Power BI for simplified reporting.

Exam trap

The trap here is that candidates may confuse Azure Data Lake Analytics (which also processes data in ADLS Gen2) with a serverless SQL option, but it does not support T-SQL or views, making it unsuitable for the analyst's requirement to create T-SQL views for Power BI.

How to eliminate wrong answers

Option B (Azure Data Lake Analytics) is wrong because it uses U-SQL, not T-SQL, and requires managing a job submission model rather than providing a serverless SQL endpoint for ad-hoc queries. Option C (Azure HDInsight) is wrong because it requires provisioning and managing a Hadoop cluster (infrastructure), and does not offer a serverless pay-per-query model for SQL queries on Parquet files. Option D (Azure Databricks) is wrong because it is primarily a Spark-based analytics platform that requires cluster management and uses Spark SQL or Python, not native T-SQL, and does not support creating T-SQL views for Power BI without additional configuration.

Practice this question →

193

MCQhard

A company has a Power BI dashboard that refreshes daily from an Azure SQL Database. During refresh, the database experiences high CPU usage that impacts transactional applications. They need to minimize impact while keeping the dashboard up-to-date. What should they do?

A.Disable the Query Store in the database

B.Create a read-only user for Power BI

C.Set Power BI to use a lower-capacity license

D.Configure Azure SQL Database with a readable secondary replica

AnswerD

Readable secondaries allow read workloads without impacting the primary.

Why this answer

Configuring Azure SQL Database with a readable secondary replica offloads read-only workloads, such as Power BI refreshes, to the secondary replica. This eliminates CPU contention on the primary replica, protecting transactional applications from performance degradation while keeping the dashboard up-to-date with near-real-time data.

Exam trap

The trap here is that candidates often confuse user permissions (read-only user) with workload isolation, not realizing that read-only replicas are required to physically separate read and write workloads at the infrastructure level.

How to eliminate wrong answers

Option A is wrong because disabling the Query Store does not reduce CPU usage from Power BI refreshes; it only removes query performance insights and may even degrade plan stability. Option B is wrong because creating a read-only user does not change the physical execution of queries—Power BI will still run on the primary replica and consume CPU. Option C is wrong because a lower-capacity Power BI license (e.g., changing from Premium to Pro) affects dataset size limits and refresh frequency, not the database CPU load during refresh.

Practice this question →

194

MCQmedium

You are a data engineer for a financial services company. The company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. They have a fact table named Transactions that contains 2 billion rows. The table is hash-distributed on the AccountID column. Users run reports that aggregate transaction amounts by date and account type. The reports are slow. Upon investigation, you find that the distribution is highly skewed because a few accounts have millions of transactions. You need to improve query performance without redesigning the entire schema. Which action should you take?

A.Change the distribution key to a column with more unique values, such as TransactionID

B.Change the distribution to round-robin

C.Create a clustered columnstore index on the table

D.Replicate the Transactions table to all distributions

AnswerA

A column with high cardinality reduces skew and improves parallel processing.

Why this answer

The correct answer is A because changing the distribution key to TransactionID, which has far more unique values than AccountID, will eliminate the data skew that is causing performance degradation. In a hash-distributed table, a skewed distribution key leads to some distributions holding a disproportionate amount of data, causing parallel query execution to be bottlenecked by the largest distribution. By using a highly unique column like TransactionID, the data will be evenly distributed across all 60 distributions, enabling balanced parallelism and faster aggregation queries.

Exam trap

The trap here is that candidates often assume that a clustered columnstore index (Option C) is the universal fix for slow queries, but they fail to recognize that data skew in a hash-distributed table is a distribution-level problem that columnstore indexes cannot solve.

How to eliminate wrong answers

Option B is wrong because changing to round-robin distribution would distribute rows evenly but would eliminate data collocation benefits, causing all queries that filter or join on AccountID to require data movement across distributions, which would severely degrade performance. Option C is wrong because the table already has a clustered columnstore index (the default for dedicated SQL pool tables), and while columnstore indexes improve compression and scan performance, they do not address the root cause of data skew in a hash-distributed table. Option D is wrong because replicating a 2-billion-row fact table to all distributions would consume excessive storage and cause significant overhead during data loading and maintenance, and it is not a supported or practical action for large fact tables in Azure Synapse Analytics.

Practice this question →

195

Multi-Selecthard

Which THREE components are typically part of a modern data warehouse architecture on Azure? (Choose three.)

Select 3 answers

A.Azure Data Factory

B.Azure Synapse Analytics

C.Azure Stream Analytics

D.Azure Data Lake Storage Gen2

E.Azure Cosmos DB

AnswersA, B, D

Orchestrates data ingestion and transformation pipelines.

Why this answer

Azure Data Factory is correct because it serves as the cloud-based ETL (Extract, Transform, Load) service that orchestrates and automates data movement and transformation across various sources and destinations. In a modern data warehouse architecture, Data Factory is used to ingest raw data from on-premises or cloud sources, transform it using mapping data flows or external compute (e.g., Azure Databricks), and load it into the data warehouse or data lake for analytics. It provides a code-free visual interface or SDK-based control for scheduling and monitoring pipelines, making it essential for the ingestion and preparation layer.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics (a real-time processing service) with a batch data warehouse component, or mistakenly think Azure Cosmos DB can serve as an analytical data store due to its multi-model capabilities, but it lacks the columnar storage and MPP architecture required for modern data warehousing.

Practice this question →

196

MCQmedium

A marketing team wants to analyze social media sentiment in near real-time. They will use Azure Event Hubs to capture tweets and need to aggregate sentiment scores over 5-minute windows. The aggregated results must be stored in Azure Blob Storage for later analysis. Which Azure service should they use to perform the stream processing?

A.Azure Stream Analytics

B.Azure Data Factory

C.Azure Databricks

D.Azure Synapse Analytics

AnswerA

Azure Stream Analytics is specifically designed for real-time stream processing and can easily ingest from Event Hubs, apply windowed aggregations, and output to Blob Storage.

Why this answer

Azure Stream Analytics is the correct choice because it is a real-time stream processing engine designed to ingest data from sources like Azure Event Hubs, apply temporal aggregations (e.g., 5-minute tumbling windows), and output results directly to Azure Blob Storage. It provides built-in support for windowed functions and exactly-once delivery semantics, making it ideal for near-real-time sentiment analysis without requiring custom code.

Exam trap

The trap here is that candidates often confuse Azure Data Factory or Azure Synapse Analytics as stream processing tools, but Data Factory is batch-only and Synapse is primarily a data warehouse, not a real-time stream processor.

How to eliminate wrong answers

Option B (Azure Data Factory) is wrong because it is a cloud-based ETL and data integration service for batch-oriented data movement and orchestration, not designed for real-time stream processing or windowed aggregations. Option C (Azure Databricks) is wrong because while it can process streaming data using Structured Streaming, it requires a Spark cluster to be provisioned and managed, and is overkill for simple windowed aggregations; it is not the simplest or most cost-effective choice for this specific near-real-time scenario. Option D (Azure Synapse Analytics) is wrong because it is a unified analytics platform focused on large-scale data warehousing and big data analytics, not a real-time stream processing service; it can query streaming data via linked services but does not natively perform windowed aggregations on live streams.

Practice this question →

197

MCQhard

A financial services company needs to build a data pipeline that ingests daily transaction files from multiple sources. The pipeline must perform data quality checks, transform data using complex business logic, and load it into Azure Synapse Analytics. The transformations involve conditional branching (e.g., if a transaction amount exceeds a threshold, apply additional validation). The company wants to minimize coding effort and prefers a visual, configuration-based approach. Which Azure service should they use as the primary orchestration and transformation engine?

A.Azure Data Factory with Data Flows

B.Azure Databricks with notebooks

C.Azure Stream Analytics

D.Azure Logic Apps

AnswerA

Correct. ADF Data Flows allow visual, code-free transformations with conditional logic, and ADF handles orchestration.

Why this answer

Azure Data Factory with Data Flows is the correct choice because it provides a visual, configuration-based interface for both orchestration and transformation, including support for complex business logic like conditional branching (e.g., via conditional split transformations). It natively integrates with Azure Synapse Analytics for loading transformed data, minimizing coding effort compared to code-heavy alternatives.

Exam trap

The trap here is that candidates may confuse Azure Data Factory with Azure Databricks, assuming Databricks is required for complex transformations, but Data Flows provide the same Spark power with a visual interface, meeting the 'minimize coding' requirement.

How to eliminate wrong answers

Option B is wrong because Azure Databricks with notebooks requires writing code (Python, Scala, or SQL) for transformations, which contradicts the requirement to minimize coding effort and prefer a visual approach. Option C is wrong because Azure Stream Analytics is designed for real-time streaming data processing, not for batch ingestion of daily transaction files with complex conditional transformations. Option D is wrong because Azure Logic Apps is a workflow automation service for integrating applications and services, not a data transformation engine capable of handling complex business logic or loading into Azure Synapse Analytics.

Practice this question →

198

MCQmedium

A data engineer needs to design a solution for a healthcare organization that must store patient records for 7 years to comply with regulatory requirements. The data will be accessed infrequently after the first year. Which Azure storage tier should be used for data older than one year?

A.Transaction-optimized tier

B.Cool tier

C.Archive tier

D.Hot tier

AnswerB

Cool tier is a low-cost storage tier for data that is infrequently accessed and stored for at least 30 days, ideal for data older than one year.

Why this answer

The Cool tier is designed for data that is accessed infrequently but must be available immediately when needed, with a minimum storage duration of 30 days. Since patient records older than one year are accessed rarely but may still require low-latency retrieval for compliance audits, Cool tier balances cost and accessibility. Hot tier is for frequent access, Archive tier has a retrieval latency of hours (not suitable for occasional access), and Transaction-optimized is not a standard Azure Blob Storage tier.

Exam trap

Microsoft often tests the misconception that Archive tier is the cheapest and therefore always the best for old data, but the trap here is ignoring the retrieval latency and access requirements—Archive is only appropriate if data can tolerate hours of delay for rehydration.

How to eliminate wrong answers

Option A is wrong because Transaction-optimized tier is not a valid Azure Blob Storage access tier; Azure offers Hot, Cool, Cold, and Archive tiers, and 'Transaction-optimized' is a misleading term that does not exist in the service. Option C is wrong because Archive tier has a retrieval time of up to 15 hours (rehydration) and is intended for data that is rarely accessed and can tolerate significant delay, making it unsuitable for records that may need occasional access within a reasonable time. Option D is wrong because Hot tier is optimized for frequent access patterns and higher storage costs, which would be wasteful for data accessed only infrequently after the first year.

Practice this question →

199

MCQhard

A company ingests streaming data from thousands of devices into Azure Event Hubs. They need to transform and aggregate the data in real time before storing it in Azure Data Lake Storage Gen2. Which Azure service should they use between Event Hubs and ADLS Gen2?

A.Azure Functions

B.Azure Databricks

C.Azure Stream Analytics

D.Azure Data Factory

AnswerC

Stream Analytics is purpose-built for real-time stream processing with SQL-like queries.

Why this answer

Azure Stream Analytics is purpose-built for real-time data processing and analytics, allowing you to define SQL-like queries to transform and aggregate streaming data from Event Hubs before outputting it directly to Azure Data Lake Storage Gen2. It provides exactly-once delivery semantics and low-latency processing, making it the ideal service for this ingestion-to-storage pipeline.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics with Azure Functions or Azure Databricks for real-time processing, but Stream Analytics is the only service that provides a fully managed, low-latency, SQL-based streaming pipeline without requiring custom code or cluster management.

How to eliminate wrong answers

Option A is wrong because Azure Functions is a serverless compute service for event-driven code execution, but it lacks native streaming aggregation capabilities and would require custom code to handle stateful operations like windowed aggregates, leading to increased complexity and potential data loss. Option B is wrong because Azure Databricks is a big data analytics platform that can process streaming data via Structured Streaming, but it is overkill for a simple transform-and-aggregate pipeline and introduces higher latency and operational overhead compared to a dedicated streaming service. Option D is wrong because Azure Data Factory is an ETL and orchestration service designed for batch data movement and transformation, not real-time streaming; it cannot process data as it arrives in Event Hubs with sub-second latency.

Practice this question →

200

MCQmedium

A company uses Azure Synapse Analytics dedicated SQL pool as its data warehouse. New data is loaded into the warehouse every few minutes. The company wants to visualize the data with near real-time updates in a dashboard that can be refreshed automatically. Which tool and connection mode should they use?

A.Power BI with DirectQuery mode

B.Azure Data Studio with visualizations

C.SQL Server Reporting Services (SSRS) with cached reports

D.Excel Power Pivot with imported data

AnswerA

Correct. DirectQuery sends queries directly to the database, reflecting updates as soon as data is available, enabling near real-time dashboards.

Why this answer

Power BI with DirectQuery mode is correct because it allows the dashboard to query the Azure Synapse dedicated SQL pool directly for each visual refresh, enabling near real-time updates without importing data. This mode avoids the latency of data import and supports automatic page refresh, which aligns with the requirement for data loaded every few minutes.

Exam trap

The trap here is that candidates often confuse DirectQuery with Import mode, assuming Import mode is always faster for dashboards, but Import mode cannot achieve near real-time updates without manual or scheduled refreshes, which fails the 'every few minutes' requirement.

How to eliminate wrong answers

Option B is wrong because Azure Data Studio is a database management and query tool, not a visualization or dashboard tool; it lacks native automatic refresh capabilities for near real-time dashboards. Option C is wrong because SQL Server Reporting Services (SSRS) with cached reports stores report snapshots, which introduces data staleness and cannot provide near real-time updates. Option D is wrong because Excel Power Pivot with imported data requires manual or scheduled data refresh, which cannot achieve sub-minute near real-time updates and is not designed for automatic dashboard refresh.

Practice this question →

201

MCQeasy

You are a business analyst at a manufacturing company. The company uses Azure SQL Database to store production data. You need to create a Power BI report that shows real-time machine efficiency. The report must refresh every 5 minutes to show current metrics. You have been granted read-only access to the database. The database is under heavy load from transactional applications, and you want to minimize additional impact. Which approach should you take to create the report?

A.Disable the Power BI Query Cache to reduce database load

B.Use Power BI Import mode with incremental refresh policy to load only new data every 5 minutes

C.Configure Power BI to use DirectQuery mode to ensure real-time data

D.Export the data to a CSV file and import it into Power BI daily

AnswerB

Import mode with incremental refresh reduces load by only querying changed data.

Why this answer

B is correct because Power BI Import mode with an incremental refresh policy allows you to load only new or changed data every 5 minutes, minimizing the load on the heavily used Azure SQL Database. This approach avoids repeated full table scans, reduces query impact, and still provides near-real-time metrics for the report.

Exam trap

The trap here is that candidates often confuse DirectQuery with real-time capability, not realizing that DirectQuery sends live queries to the source for every interaction, which would exacerbate database load rather than minimize it.

How to eliminate wrong answers

Option A is wrong because disabling the Power BI Query Cache would increase database load, not reduce it, as every report interaction would require a fresh query against the database. Option C is wrong because DirectQuery mode sends a query to the database for every visual interaction, which would add significant load to the already heavily stressed database and is not suitable for minimizing impact. Option D is wrong because exporting to a CSV file daily does not support the required 5-minute refresh interval and introduces manual steps, making it impractical for real-time machine efficiency monitoring.

Practice this question →

202

MCQmedium

A telecommunications company needs to analyze call detail records (CDRs) to detect fraud patterns and minimize revenue leakage. The data arrives as a continuous stream from network switches and must be queried within seconds of ingestion to flag suspicious activity. The analysts also need to run interactive ad-hoc queries over the last 90 days of CDR data using a Kusto query language. Which Azure service should they use as the primary data store and analytics engine?

A.Azure Data Explorer

B.Azure Synapse Analytics

C.Azure Stream Analytics

D.Azure Analysis Services

AnswerA

Correct. Azure Data Explorer is built for real-time analytics on streaming data, with sub-second query performance and native Kusto Query Language support for interactive exploration.

Why this answer

Azure Data Explorer is optimized for high-velocity telemetry data like CDRs, supporting ingestion of continuous streams with sub-second query latency. Its native Kusto Query Language (KQL) enables both real-time fraud detection and interactive ad-hoc queries over large time windows (e.g., 90 days) without pre-aggregation or indexing overhead.

Exam trap

The trap here is that candidates confuse Azure Stream Analytics (real-time processing) with Azure Data Explorer (real-time analytics), failing to recognize that Stream Analytics lacks a native query language for interactive ad-hoc exploration over historical data.

How to eliminate wrong answers

Option B is wrong because Azure Synapse Analytics is a distributed SQL-based data warehouse designed for large-scale batch analytics and data integration, not for real-time streaming ingestion with sub-second query latency on raw telemetry. Option C is wrong because Azure Stream Analytics is a real-time stream processing engine that outputs to sinks (e.g., storage, databases) but does not provide a native Kusto query interface for interactive ad-hoc queries over historical data. Option D is wrong because Azure Analysis Services is an OLAP engine for pre-aggregated tabular models, not designed for direct ingestion of high-velocity streaming data or Kusto query language support.

Practice this question →

203

MCQmedium

Your company has a Power BI report that uses DirectQuery to Azure SQL Database. Users report that the report is slow when multiple users access it simultaneously. The database is underprovisioned. Which action should you take to improve performance without changing the report design?

A.Enable query caching in Power BI Premium.

B.Change the report to use Import mode instead of DirectQuery.

C.Scale up the Azure SQL Database to a higher service tier.

D.Add indexes to the tables used in the report.

AnswerC

More DTUs/vCores handle more concurrent queries.

Why this answer

Option C is correct because the root cause is an underprovisioned Azure SQL Database, which cannot handle the concurrent query load from multiple Power BI users using DirectQuery. Scaling up to a higher service tier (e.g., from S2 to S3 or a DTU-based tier) increases the database's DTUs (Database Transaction Units), directly improving throughput and reducing query latency without altering the report design.

Exam trap

The trap here is that candidates often confuse performance tuning at the database level (scaling up) with caching or data import strategies, overlooking the explicit constraint that the report design must remain unchanged and that the database is the bottleneck.

How to eliminate wrong answers

Option A is wrong because query caching in Power BI Premium caches results at the Power BI service layer, but with DirectQuery, the report still sends live queries to the database; caching does not address the database's inability to handle concurrent queries efficiently. Option B is wrong because changing to Import mode would require redesigning the report (e.g., setting up refresh schedules, handling data latency), which violates the constraint of not changing the report design. Option D is wrong because adding indexes can improve query performance for specific queries, but it does not resolve the fundamental issue of an underprovisioned database lacking sufficient compute and I/O resources to handle concurrent user load.

Practice this question →

204

MCQmedium

A company uses Azure Data Lake Storage Gen2 to store IoT sensor data. The data is partitioned by date and sensor ID. A data scientist needs to efficiently query only the last 7 days of data for a specific sensor. Which strategy minimizes the amount of data scanned?

A.Use a directory structure that enables partition elimination

B.Create a view that filters on date and sensor ID

C.Read all Parquet files and filter using a WHERE clause

D.Create an index on the date and sensor ID columns

AnswerA

By organizing data in a hierarchical directory structure (e.g., /sensorID=xxx/date=yyyy-mm-dd/), query engines can prune partitions and scan only relevant files.

Why this answer

Option A is correct because Azure Data Lake Storage Gen2 supports hierarchical directory structures that enable partition elimination at the storage layer. By organizing data under a path like `/sensorID=123/date=2025-03-20/`, a query engine (e.g., Azure Synapse Serverless SQL or Spark) can skip entire directories that do not match the filter, drastically reducing the amount of data scanned.

Exam trap

The trap here is that candidates confuse database indexing (Option D) with data lake partitioning, or assume that a WHERE clause alone (Option C) is sufficient to minimize data scanned, not realizing that partition elimination requires a physical directory structure.

How to eliminate wrong answers

Option B is wrong because a view is merely a saved query definition; it does not physically reorganize data or skip partitions, so it still scans all underlying files. Option C is wrong because reading all Parquet files and then applying a WHERE clause forces a full scan of every file, even though Parquet supports predicate pushdown—without partition elimination, the engine must still open and read metadata from all files. Option D is wrong because Azure Data Lake Storage Gen2 does not support traditional database indexes; indexing is a relational database concept and cannot be applied to files in a data lake.

Practice this question →

205

MCQmedium

A data engineer needs to process streaming data from IoT devices and store the results in Azure Data Lake Storage for long-term analytics. The data must be processed in near real-time to detect anomalies and trigger alerts. Which Azure service should the engineer use for stream processing?

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Analysis Services

D.Azure Data Lake Analytics

AnswerB

Azure Stream Analytics is the correct service for real-time stream processing of IoT data, with built-in support for windowing, aggregations, and anomaly detection.

Why this answer

Azure Stream Analytics is a serverless, real-time stream processing engine designed to handle high-velocity data from sources like IoT devices. It can ingest data from Azure Event Hubs or IoT Hub, apply SQL-based queries to detect anomalies in near real-time, and output results directly to Azure Data Lake Storage for long-term analytics. This makes it the correct choice for the described near-real-time anomaly detection and alerting requirement.

Exam trap

The trap here is that candidates often confuse Azure Data Factory's ability to copy data from streaming sources (like Event Hubs) with actual stream processing, failing to recognize that Data Factory lacks the real-time query and windowing capabilities required for anomaly detection.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is an orchestration and data integration service for batch and scheduled data movement, not a real-time stream processing engine; it cannot process streaming data with sub-second latency. Option C is wrong because Azure Analysis Services is an analytical engine for creating semantic models and performing business intelligence queries on pre-processed data, not for ingesting or processing raw streaming data. Option D is wrong because Azure Data Lake Analytics is a batch analytics service that uses U-SQL to process large datasets in Data Lake Storage, but it does not support real-time stream processing or event-driven anomaly detection.

Practice this question →

206

MCQmedium

An organization runs a mission-critical analytics workload on Azure Synapse Analytics. They need to ensure high availability and automatic failover in case of a regional outage. Which configuration should they implement?

A.Enable read-scale replicas

B.Configure automatic backups with point-in-time restore

C.Use active geo-replication for the dedicated SQL pool

D.Enable always-on availability groups

AnswerC

Active geo-replication provides automatic failover to a secondary region.

Why this answer

Option C is correct because active geo-replication for a dedicated SQL pool in Azure Synapse Analytics creates a readable secondary database in a paired Azure region, enabling automatic failover during a regional outage. This ensures high availability and business continuity for mission-critical analytics workloads by maintaining a synchronized replica that can be promoted to primary with minimal downtime.

Exam trap

The trap here is that candidates often confuse read-scale replicas (Option A) or always-on availability groups (Option D) with geo-replication, but Azure Synapse Analytics dedicated SQL pools do not support those features and rely specifically on active geo-replication for regional failover.

How to eliminate wrong answers

Option A is wrong because read-scale replicas are designed to offload read-only workloads from the primary database, not to provide automatic failover or high availability during a regional outage. Option B is wrong because automatic backups with point-in-time restore allow recovery from data corruption or accidental deletion, but they do not provide automatic failover or continuous availability during a regional outage; restoring from backup incurs significant downtime. Option D is wrong because always-on availability groups are a SQL Server feature for on-premises or IaaS deployments, not supported for Azure Synapse Analytics dedicated SQL pools, which use a different architecture based on Azure SQL Database's geo-replication.

Practice this question →

207

MCQeasy

Your company needs to build an analytics solution that can handle both batch and streaming data from IoT devices. The solution must allow complex event processing and real-time dashboards. Which Azure service should you use as the primary data ingestion and processing layer?

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Databricks

D.Azure Synapse Analytics

AnswerB

Stream Analytics is built for real-time analytics on streaming data.

Why this answer

Azure Stream Analytics is the correct choice because it is designed specifically for real-time data ingestion and processing, supporting both batch and streaming data from IoT devices. It enables complex event processing (CEP) through SQL-like queries and can output directly to real-time dashboards in Power BI, meeting the requirement for both batch and streaming analytics.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics with Azure Data Factory or Azure Databricks, mistakenly thinking that batch and streaming can be handled by a single general-purpose ETL tool, but Stream Analytics is the only service natively optimized for real-time complex event processing and direct dashboard integration.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is primarily an ETL and orchestration service for batch data movement and transformation, not designed for real-time streaming or complex event processing. Option C is wrong because Azure Databricks is a big data analytics platform that can handle both batch and streaming, but it is not the primary service for real-time dashboards and complex event processing; it requires additional configuration and is more suited for advanced analytics and machine learning. Option D is wrong because Azure Synapse Analytics is a unified analytics service that integrates data warehousing and big data analytics, but its primary strength is in large-scale batch processing and data warehousing, not real-time streaming ingestion and complex event processing.

Practice this question →

208

MCQeasy

A small business wants to start using Azure for analytics. They have a few CSV files stored on-premises that they want to analyze. They have no budget for complex infrastructure and prefer a fully managed, serverless solution. They need to create interactive visualizations and share them with their team. The data does not change frequently, so they are okay with daily refreshes. Which of the following options should they choose? A) Upload the CSV files to Azure Data Lake Storage Gen2, use Azure Databricks to create a data processing pipeline, and then use Power BI to visualize the results. B) Upload the CSV files to Azure Blob Storage, use Azure Data Factory to load the data into Azure SQL Database, and then use Power BI to connect and visualize. C) Upload the CSV files to OneDrive for Business, use Power BI Desktop to import the data, and publish to Power BI Service with scheduled refresh. D) Upload the CSV files to Azure Data Lake Storage Gen2, use Azure Synapse Serverless SQL pool to query the data, and then use Power BI to connect. Which option is the simplest and most cost-effective?

A.Option B

B.Option D

C.Option C

D.Option A

AnswerC

Simplest, uses Power BI Desktop with OneDrive, no extra Azure services.

Why this answer

Option C is correct because it uses OneDrive for Business as a simple storage location, Power BI Desktop for importing CSV data, and Power BI Service for publishing and sharing interactive visualizations with scheduled daily refresh. This is fully managed, serverless, and requires no complex infrastructure, aligning perfectly with the small business's budget and simplicity requirements.

Exam trap

The trap here is that candidates often overcomplicate the solution by choosing Azure-specific storage and compute services (like Data Lake, Databricks, or Synapse) when a simpler, fully managed tool like Power BI with OneDrive is sufficient and more cost-effective for small-scale, static data analytics.

How to eliminate wrong answers

Option A is wrong because it involves Azure Databricks, which is a complex, cluster-based data processing platform that introduces significant cost and management overhead, far beyond the needs of a small business with simple CSV files and daily refreshes. Option B is wrong because it uses Azure Data Factory and Azure SQL Database, which are over-engineered for static CSV data; Data Factory adds pipeline complexity and SQL Database incurs ongoing costs, contradicting the 'no budget for complex infrastructure' requirement. Option D is wrong because Azure Synapse Serverless SQL pool, while serverless, is designed for large-scale analytics and requires creating external tables and managing permissions, adding unnecessary complexity for a few CSV files that could be handled more directly with Power BI.

Practice this question →

209

MCQeasy

A retail company runs a nightly process that reads all sales transactions from the previous day, aggregates them by product category and store location, and writes the summary results into a data warehouse for reporting. Which type of data processing workload best describes this nightly process?

A.Online Transaction Processing (OLTP)

B.Batch processing

C.Stream processing

D.Data warehousing

AnswerB

Batch processing is the correct classification because the job processes a large volume of accumulated data at a scheduled time (nightly).

Why this answer

The nightly process reads all sales transactions from the previous day, aggregates them, and writes summary results into a data warehouse. This is a classic batch processing workload because data is collected over a period (the entire previous day), processed in a single offline job, and the output is stored for later reporting. Batch processing is ideal for high-volume, non-real-time transformations like nightly ETL (Extract, Transform, Load) jobs.

Exam trap

The trap here is that candidates confuse the destination (data warehousing) with the processing workload, or mistake a scheduled nightly aggregation for stream processing because they see 'data' and 'processing' without recognizing the batch window.

How to eliminate wrong answers

Option A is wrong because OLTP is designed for real-time, low-latency transaction processing (e.g., inserting individual sales records as they occur), not for aggregating large volumes of historical data in a nightly job. Option C is wrong because stream processing handles continuous, real-time data flows with sub-second latency (e.g., Apache Kafka or Azure Stream Analytics), whereas this process runs once per day on a fixed batch of data. Option D is wrong because data warehousing is the destination or storage system for the aggregated results, not the processing workload itself; the nightly process is a batch ETL job that feeds the data warehouse.

Practice this question →

210

MCQhard

A data engineering team is designing a modern data warehouse on Azure. They have raw data landing in Azure Data Lake Storage Gen2 (ADLS Gen2) as Parquet files. They need to perform transformations using Apache Spark, and then load the transformed data into Azure Synapse Analytics for high-performance analytical queries. The team wants to use a single orchestration service to schedule, monitor, and manage the entire pipeline. Which Azure service should they choose for orchestration?

A.Azure Data Factory

B.Azure Databricks

C.Azure Logic Apps

D.Azure Data Lake Analytics

AnswerA

Data Factory orchestrates data movement and transformation, supporting both Spark and Synapse workloads.

Why this answer

Azure Data Factory (ADF) is the correct choice because it is a cloud-based ETL and orchestration service designed to schedule, monitor, and manage data pipelines at scale. It natively supports triggers (e.g., time-based, event-based) and can orchestrate Apache Spark transformations via Azure Databricks or HDInsight, then load the transformed data into Azure Synapse Analytics using built-in copy activities or pipelines. ADF provides a single pane of glass for end-to-end pipeline management, including dependency handling and error monitoring.

Exam trap

The trap here is that candidates may confuse Azure Databricks (a compute/transform service) with an orchestration tool, but the question explicitly asks for a service to 'schedule, monitor, and manage the entire pipeline,' which is the core function of Azure Data Factory, not Databricks.

How to eliminate wrong answers

Option B (Azure Databricks) is wrong because it is an Apache Spark-based analytics platform for data transformation and machine learning, not a dedicated orchestration service; it lacks native scheduling and monitoring capabilities for multi-step pipelines across heterogeneous services. Option C (Azure Logic Apps) is wrong because it is designed for workflow automation and integration across SaaS applications using connectors, not for orchestrating big data ETL pipelines with Spark and Synapse Analytics; it does not support native Spark execution or large-scale data movement. Option D (Azure Data Lake Analytics) is wrong because it is a deprecated service for distributed analytics using U-SQL, not an orchestration tool; it cannot schedule or manage pipelines that involve ADLS Gen2, Spark, and Synapse Analytics.

Practice this question →

211

MCQhard

You run the above Kusto query in Azure Data Explorer. What does the query return?

A.The total number of tornado events across all states.

B.The top 5 states with the most tornado events.

C.The top 5 years with the most tornadoes.

D.The number of events by type.

AnswerB

Correct: count by state, order desc, top 5.

Why this answer

The Kusto query uses the `summarize` operator to count events per state, then `top 5 by count_` to return the five states with the highest event counts. The `project` operator ensures only the state and count columns are shown, making option B correct.

Exam trap

The trap here is that candidates may confuse the grouping column (State) with other columns like EventType or Year, leading them to select options that describe aggregations by those columns instead of by state.

How to eliminate wrong answers

Option A is wrong because the query groups by state and returns only the top 5 states, not a total across all states. Option C is wrong because the query groups by `State`, not by `Year`, so it cannot return top years. Option D is wrong because the query does not group by event type; it groups by state and counts all events regardless of type.

Practice this question →

212

MCQhard

Refer to the exhibit. A data engineer runs the PowerShell script shown. What is the purpose of this script?

A.List all blobs in the container

B.Copy blobs to another container

C.List blobs modified in the last 7 days

D.Delete old blobs from the container

AnswerC

Filters blobs by LastModified within 7 days.

Why this answer

The script uses `Get-AzStorageBlob` with the `-Prefix` parameter to filter blobs by name, then applies a `Where-Object` filter to select only blobs whose `LastModified` property is greater than or equal to 7 days ago. This effectively lists blobs modified in the last 7 days. The script does not perform any copy or delete operations, and it does not list all blobs without filtering.

Exam trap

The trap here is that candidates see `Get-AzStorageBlob` and assume it lists all blobs (option A), overlooking the `Where-Object` filter that restricts results to only recently modified blobs.

How to eliminate wrong answers

Option A is wrong because the script includes a `Where-Object` filter on `LastModified`, so it does not list all blobs — it only returns blobs modified within the last 7 days. Option B is wrong because the script contains no `Start-AzStorageBlobCopy` or any copy cmdlet; it only retrieves blob properties and filters them. Option D is wrong because the script does not call `Remove-AzStorageBlob` or any deletion cmdlet; it only reads and filters blob metadata without modifying storage.

Practice this question →

213

MCQmedium

You are a data engineer for a large e-commerce company. The company uses Azure Data Lake Storage Gen2 to store customer transaction data. They also use Azure Databricks for data transformation and Azure Synapse Serverless SQL pool for ad-hoc queries. Recently, the data lake has grown to 10 TB, and query performance in Synapse Serverless has degraded significantly. Users complain that queries that used to take seconds now take minutes. You need to improve query performance without moving data to a dedicated SQL pool. The data is stored in Parquet format, partitioned by date. You notice that the queries often filter on CustomerID and Date. Current queries scan all partitions even when only a few days are needed. What is the most effective solution to improve performance?

A.Create materialized views in the Serverless SQL database on the partitioned data

B.Convert all Parquet files to CSV and use row-level security to limit data access

C.Repartition the Parquet files by both date and CustomerID, and optimize file sizes to 1 GB each

D.Increase the service level of the Synapse workspace to improve query concurrency

AnswerC

Partition pruning will eliminate irrelevant partitions, reducing data scanned.

Why this answer

Option C is correct because repartitioning the Parquet files by both date and CustomerID enables partition pruning in Azure Synapse Serverless SQL pool. When queries filter on CustomerID and Date, the engine can skip irrelevant partitions entirely, drastically reducing the amount of data scanned. Optimizing file sizes to around 1 GB ensures efficient parallelism and avoids the overhead of many small files, which degrades performance in a serverless environment.

Exam trap

The trap here is that candidates may think materialized views (Option A) or scaling up (Option D) can fix performance issues caused by poor data partitioning, but they overlook that serverless SQL pools rely heavily on data layout and partition pruning for efficient query execution.

How to eliminate wrong answers

Option A is wrong because materialized views in Serverless SQL pool are pre-computed aggregations that can speed up certain queries, but they do not address the root cause of scanning all partitions; the underlying data layout remains unchanged, so queries that filter on CustomerID and Date would still scan unnecessary partitions unless the view itself is partitioned, which is not supported. Option B is wrong because converting Parquet to CSV would increase storage size and query cost (CSV is not columnar), and row-level security only controls access, not performance; it would actually worsen query performance due to lack of compression and predicate pushdown. Option D is wrong because increasing the service level (e.g., changing the Synapse workspace tier) improves concurrency and resource allocation but does not change the data layout or partition pruning; queries would still scan all partitions, so the performance gain is marginal and does not solve the fundamental issue.

Practice this question →

214

MCQhard

A retail chain needs to blend two data sources for a near real-time dashboard: daily batch files from store systems (CSV files on Azure Blob Storage updated once per day) and live web clickstream data from Azure Event Hubs. The dashboard must refresh every 5 minutes with combined data. Which combination of Azure services should be used to ingest and process both data types most efficiently?

A.A) Azure Data Factory + Azure Analysis Services

B.B) Azure Stream Analytics + Power BI

C.C) Azure Synapse Pipelines + Azure Stream Analytics

D.D) Azure Databricks + Azure Data Lake Storage

AnswerC

Synapse Pipelines can orchestrate batch ingestion from Blob Storage, and Stream Analytics can process streaming data from Event Hubs, both feeding into Synapse SQL for combined queries.

Why this answer

Option C is correct because Azure Synapse Pipelines can orchestrate the daily batch CSV files from Azure Blob Storage, while Azure Stream Analytics processes the live web clickstream data from Azure Event Hubs in near real-time. Together, they enable a combined data pipeline that refreshes every 5 minutes, meeting the dashboard's latency requirement efficiently.

Exam trap

The trap here is that candidates often assume Power BI alone can handle both batch and streaming ingestion, but it lacks native batch file ingestion from Blob Storage and requires a separate processing service like Stream Analytics for real-time data.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a batch-oriented ETL service that cannot handle live streaming data from Event Hubs, and Azure Analysis Services is a semantic modeling layer that does not ingest or process raw streaming data. Option B is wrong because while Azure Stream Analytics can process the clickstream data, Power BI alone cannot ingest and blend the daily batch CSV files from Blob Storage; it requires a separate ingestion service for batch data. Option D is wrong because Azure Databricks is a big data analytics platform that is overkill for this simple batch-plus-streaming scenario and lacks native integration for near real-time dashboard refresh without additional services, and Azure Data Lake Storage is just a storage layer, not a processing service.

Practice this question →

215

MCQhard

Refer to the exhibit. An administrator runs an Azure CLI command to show the status of a Synapse SQL pool. The output shown is returned. What does this output indicate about the SQL pool?

A.The SQL pool was restored from a backup.

B.The SQL pool is online and available for queries.

C.The SQL pool is paused.

D.The SQL pool is currently being restored.

AnswerB

Both status and provisioning state indicate success.

Why this answer

The Azure CLI command `az synapse sql pool show` returns the provisioning state of the SQL pool. The output shows `"provisioningState": "Online"`, which directly indicates that the pool is provisioned, active, and ready to accept queries. This is the standard status for a running dedicated SQL pool in Azure Synapse Analytics.

Exam trap

The trap here is that candidates may confuse the `provisioningState` field with other status indicators like `status` or `state` in different Azure resources, or assume that `Online` could mean a restore completed successfully, but the output explicitly shows the current state, not the history of operations.

How to eliminate wrong answers

Option A is wrong because the output does not contain any fields such as `restorePointInTime`, `sourceDatabaseId`, or `recovery` that would indicate a restore operation; the `Online` state simply means the pool is running. Option C is wrong because a paused SQL pool would show `"provisioningState": "Paused"` or `"status": "Paused"` in the output, not `Online`. Option D is wrong because a pool being restored would show a provisioning state of `"Restoring"` or `"InProgress"`, not `Online`.

Practice this question →

216

MCQeasy

A company uses Azure Synapse Analytics to run large-scale analytics on sales data. They need to ensure that the workload can automatically scale based on demand without manual intervention. What feature should they configure?

A.Auto-pause and auto-resume

B.Read Scale-out replicas

C.Elastic Database Pools

D.Manual scale-up during peak hours

AnswerA

Enables automatic scaling and cost optimization by pausing when idle and resuming on demand.

Why this answer

Auto-pause and auto-resume is the correct feature because Azure Synapse Analytics (dedicated SQL pool) supports automatic scaling through the ability to pause the compute resources when idle and resume them on demand, which effectively scales the workload based on demand without manual intervention. This feature reduces costs by stopping compute billing during inactivity and automatically resumes when a query or activity is detected, meeting the requirement for automatic scaling.

Exam trap

The trap here is that candidates confuse auto-pause/auto-resume with manual scaling or other database scaling features (like read replicas or elastic pools) that are specific to Azure SQL Database, not Azure Synapse Analytics.

How to eliminate wrong answers

Option B is wrong because Read Scale-out replicas are a feature of Azure SQL Database (and Azure SQL Managed Instance) that offloads read-only workloads to a replica, not a mechanism for automatic scaling of compute resources in Azure Synapse Analytics. Option C is wrong because Elastic Database Pools are a shared resource model for managing multiple Azure SQL Databases with variable usage patterns, not a feature of Azure Synapse Analytics for scaling a single analytics workload. Option D is wrong because manual scale-up during peak hours requires human intervention, which directly contradicts the requirement for automatic scaling without manual intervention.

Practice this question →

217

MCQmedium

A retail company collects sales data from multiple stores. Data is ingested into Azure Data Lake Storage Gen2 as CSV files. The data team needs to run ad-hoc SQL queries on this data without moving it, and they want to pay only for the amount of data processed. They also need to integrate with Power BI for visualization. Which Azure service should they use?

A.Azure Synapse Analytics dedicated SQL pool

B.Azure SQL Database

C.Azure Data Lake Analytics

D.Azure Synapse Serverless SQL pool

AnswerD

Serverless SQL pool enables querying files in Data Lake with standard T-SQL, paying only for data processed, and integrates with Power BI.

Why this answer

Azure Synapse Serverless SQL pool (option D) is correct because it allows querying data directly from Azure Data Lake Storage Gen2 using T-SQL without moving the data, and it uses a pay-per-query model where you are billed only for the amount of data processed. It also integrates seamlessly with Power BI for visualization, making it ideal for ad-hoc SQL queries on CSV files.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL pool with Azure Synapse Analytics dedicated SQL pool, mistakenly thinking both require provisioning and pay for compute, or they overlook that Azure Data Lake Analytics is deprecated and not the correct service for ad-hoc SQL queries on data lakes.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics dedicated SQL pool requires provisioning and paying for dedicated compute resources (even when idle), and it is designed for large-scale data warehousing with persistent storage, not for ad-hoc pay-per-query scenarios on existing data lakes. Option B is wrong because Azure SQL Database is a fully managed relational database that requires data to be imported and stored within it, and it does not support querying data directly from Azure Data Lake Storage Gen2 without moving it. Option C is wrong because Azure Data Lake Analytics uses U-SQL (a combination of SQL and C#) and is a separate analytics service that processes data in a data lake, but it is not a SQL-based query service for ad-hoc queries and does not offer the same pay-per-query model as Serverless SQL pool; it has been deprecated in favor of Azure Synapse Serverless SQL pool.

Practice this question →

218

MCQhard

A retail company ingests clickstream data from its e-commerce website into Azure Event Hubs. They need to detect customer journey patterns in real time within seconds and also prepare aggregated data for daily trend reports stored in Azure Data Lake Storage Gen2. The real-time processing must handle high throughput and support complex temporal queries like sessionization. The daily aggregation should be cost-effective and use serverless compute. Which combination of Azure services should they use?

A.Azure Stream Analytics for real-time processing and Azure Data Factory for daily batch aggregation

B.Azure Functions for real-time processing and Azure Databricks for daily batch aggregation

C.Azure Stream Analytics for real-time processing and Azure Batch for daily batch aggregation

D.Azure Data Lake Analytics for real-time processing and Azure Data Factory for daily batch aggregation

AnswerA

Correct. Stream Analytics handles real-time complex event processing. Data Factory can orchestrate serverless batch transformations and load data into Data Lake Storage.

Why this answer

Azure Stream Analytics is ideal for real-time processing of high-throughput clickstream data from Event Hubs, supporting complex temporal queries like sessionization with low latency (seconds). Azure Data Factory provides cost-effective, serverless orchestration for daily batch aggregation, efficiently moving and transforming data to Azure Data Lake Storage Gen2 without managing infrastructure.

Exam trap

The trap here is confusing Azure Functions (serverless compute) with Azure Stream Analytics (dedicated stream processing) for real-time analytics, and assuming Azure Batch (parallel job execution) is equivalent to Azure Data Factory (orchestrated data integration) for batch aggregation, leading candidates to overlook the specific requirements for high-throughput temporal queries and serverless cost-effectiveness.

How to eliminate wrong answers

Option B is wrong because Azure Functions is not designed for high-throughput, stateful real-time stream processing with complex temporal queries like sessionization; it's better suited for event-driven, short-lived tasks. Option C is wrong because Azure Batch is a job scheduler for parallel compute workloads, not a serverless data integration service for daily batch aggregation; it requires managing compute pools and lacks the built-in data movement and transformation capabilities of Data Factory. Option D is wrong because Azure Data Lake Analytics is deprecated and not suitable for real-time processing; it was designed for batch U-SQL jobs, not low-latency stream processing, and lacks native Event Hubs input support.

Practice this question →

219

Drag & Dropmedium

Drag and drop the steps to perform a point-in-time restore of an Azure SQL Database in the correct order.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

Why this order

Point-in-time restore uses automated backups to recover a database to a specific time within the retention period.

Practice this question →

220

MCQmedium

A data engineering team needs to analyze petabytes of historical sales data stored in Azure Data Lake Storage Gen2. They require the ability to run complex SQL queries that join multiple tables and need high performance. The solution must separate compute from storage to allow independent scaling of resources. Which Azure service should they use?

A.Azure Synapse Analytics dedicated SQL pool

B.Azure SQL Database

C.Azure Cosmos DB

D.Azure Table Storage

AnswerA

Correct. It is built for petabyte-scale data warehousing with separate compute and storage, ideal for complex analytical queries.

Why this answer

Azure Synapse Analytics dedicated SQL pool is designed for petabyte-scale data warehousing, providing massively parallel processing (MPP) to run complex SQL queries across multiple tables with high performance. It separates compute from storage, allowing independent scaling of compute resources without moving data, which aligns with the requirement for decoupled scaling.

Exam trap

The trap here is that candidates often confuse Azure SQL Database's familiar SQL interface with the ability to handle petabyte-scale analytics, overlooking the fundamental architectural difference between OLTP and MPP data warehouse systems.

How to eliminate wrong answers

Option B is wrong because Azure SQL Database is a relational database service for OLTP workloads, not designed for petabyte-scale analytics or independent compute-storage separation. Option C is wrong because Azure Cosmos DB is a NoSQL database optimized for low-latency, globally distributed applications, not for complex SQL joins on petabytes of historical data. Option D is wrong because Azure Table Storage is a key-value NoSQL store for semi-structured data, lacking SQL query capabilities and MPP architecture for large-scale analytics.

Practice this question →

221

MCQmedium

A manufacturing company collects temperature and vibration data from thousands of sensors. The data is streamed to Azure Event Hubs. The company wants to store all this raw data in Azure Data Lake Storage Gen2 for future batch analytics. They need a solution that automatically writes the streaming data to the data lake in near real-time, without requiring any custom code for the write operation. Which Azure feature should they use?

A.Azure Stream Analytics job output to Azure Data Lake Storage Gen2

B.Azure Event Hubs Capture

C.Azure Data Factory Copy Activity

D.Azure Synapse Pipelines

AnswerB

Event Hubs Capture automatically captures streaming data into Azure Blob Storage or Azure Data Lake Storage Gen2 without any custom code. It writes data in Avro format and is ideal for long-term storage and batch analytics.

Why this answer

Azure Event Hubs Capture is the correct choice because it automatically writes streaming data from Event Hubs to Azure Data Lake Storage Gen2 in near real-time without requiring any custom code. It integrates directly with Event Hubs to buffer and write data in Avro format, meeting the requirement for a no-code, automated solution.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics as the only way to output Event Hubs data to storage, overlooking Event Hubs Capture which provides a simpler, code-free alternative for raw data persistence.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics requires a job definition and query logic to output to Data Lake Storage Gen2, which involves custom code (SQL-like queries) and is not a fully automatic write operation without configuration. Option C is wrong because Azure Data Factory Copy Activity is a batch-oriented data movement tool that requires scheduling or triggers to copy data, not a near real-time streaming solution, and it does not natively integrate with Event Hubs for continuous streaming. Option D is wrong because Azure Synapse Pipelines are designed for orchestration and ETL in a Synapse workspace, not for automatic, code-free streaming writes from Event Hubs to Data Lake Storage Gen2.

Practice this question →

222

MCQhard

A data warehouse team uses Azure Synapse Analytics dedicated SQL pool to serve both business executives running weekly reports and data scientists running complex ad-hoc queries on large fact tables. The ad-hoc queries often consume excessive resources and degrade performance for the weekly reports. The team needs to ensure that the weekly reports always get guaranteed resources regardless of other concurrent queries. Which Synapse feature should they use?

A.Workload classification

B.Result set caching

C.Materialized views

D.Columnstore indexes

AnswerA

Correct. Workload classification assigns queries to workload groups with resource limits, ensuring critical queries get guaranteed resources.

Why this answer

Workload classification in Azure Synapse Analytics dedicated SQL pool allows the team to assign incoming queries to specific workload groups with predefined resource allocations. By classifying the weekly report queries into a group with guaranteed minimum resources (e.g., using `CREATE WORKLOAD CLASSIFIER` with `IMPORTANCE` and `REQUEST_MIN_RESOURCE_PERCENT`), the team ensures those queries always receive the necessary resources, even when ad-hoc data scientist queries are running concurrently. This directly addresses the need for predictable performance for critical reports.

Exam trap

The trap here is that candidates often confuse performance optimization features (like caching, materialized views, or indexes) with resource governance features, mistakenly believing that making queries faster inherently guarantees resource availability, whereas workload classification is the only option that provides explicit resource isolation and guarantees.

How to eliminate wrong answers

Option B (Result set caching) is wrong because it only caches query results for repeated executions, which does not guarantee resources for the weekly reports; it can improve performance for identical queries but does not prevent resource contention. Option C (Materialized views) is wrong because they pre-compute and store aggregated data to speed up queries, but they do not provide resource guarantees or isolation; they can be used alongside workload management but are not a solution for resource contention. Option D (Columnstore indexes) is wrong because they improve compression and query performance for large fact tables by using columnar storage, but they do not allocate or guarantee resources for specific workloads; they are a storage optimization, not a resource management feature.

Practice this question →

223

MCQhard

A company is migrating their on-premises data warehouse, which is built on a Netezza appliance, to Azure. The data warehouse contains over 10 terabytes of data and supports complex BI queries with multiple joins and aggregations. The company requires a cloud-based solution that provides massively parallel processing (MPP) to handle large-scale queries efficiently. They also need to integrate with existing ETL tools like Azure Data Factory and provide native connectivity to Power BI. Which Azure service should they choose?

A.Azure SQL Database

B.Azure Databricks

C.Azure Synapse Analytics dedicated SQL pool

D.Azure HDInsight

AnswerC

Dedicated SQL pool provides MPP capabilities for fast query performance on large datasets. It is designed for cloud data warehousing and seamlessly integrates with Azure Data Factory and Power BI.

Why this answer

Azure Synapse Analytics dedicated SQL pool is the correct choice because it provides massively parallel processing (MPP) architecture designed for petabyte-scale data warehousing, exactly matching the 10+ TB requirement. It natively integrates with Azure Data Factory for ETL and offers built-in Power BI connectivity via the T-SQL endpoint, supporting complex BI queries with multiple joins and aggregations.

Exam trap

The trap here is that candidates often confuse Azure Databricks (a Spark-based analytics platform) with a data warehouse, overlooking that Synapse dedicated SQL pool is the only option that provides native MPP, T-SQL support, and direct Power BI connectivity for large-scale BI workloads.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a single-node, general-purpose relational database that lacks MPP architecture and cannot efficiently handle 10+ TB of complex BI queries with massive parallelism. Option B is wrong because Azure Databricks is an Apache Spark-based analytics platform optimized for data engineering and machine learning, not a dedicated data warehouse; it does not provide native T-SQL-based MPP for large-scale BI workloads and requires additional configuration for Power BI connectivity. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark cluster service focused on big data processing and batch analytics, not a dedicated MPP data warehouse; it lacks native T-SQL support and optimized query execution for complex BI joins and aggregations, and its Power BI integration is indirect through Spark connectors.

Practice this question →

224

MCQhard

A healthcare analytics company receives continuous streams of patient monitoring data from IoT devices. The data must be processed in near real-time to detect critical events (e.g., abnormal heart rate). Processed data is then stored in a columnar format for historical analysis and reporting by data analysts using SQL. Which combination of Azure services should they use for ingestion, processing, and storage?

A.Azure Event Hubs, Azure Stream Analytics, Azure Synapse Analytics

B.Azure IoT Hub, Azure Data Factory, Azure SQL Data Warehouse

C.Azure Event Hubs, Azure Stream Analytics, Azure Cosmos DB

D.Azure Blob Storage, Azure Databricks, Azure Table Storage

AnswerA

Event Hubs ingests data in real-time. Stream Analytics processes the stream to detect events and transform data. Synapse Analytics provides a columnar data warehouse for historical analysis. This combination fits the requirements exactly.

Why this answer

Azure Event Hubs is designed for high-throughput, low-latency ingestion of streaming data from millions of IoT devices. Azure Stream Analytics provides a SQL-based, near real-time processing engine to detect critical events like abnormal heart rates. Azure Synapse Analytics (formerly SQL Data Warehouse) offers a columnar storage format (e.g., columnstore indexes) optimized for historical analysis and SQL-based reporting by data analysts.

Exam trap

The trap here is that candidates often confuse Azure IoT Hub with Event Hubs for high-volume event ingestion, or assume Cosmos DB is suitable for columnar analytics storage, but IoT Hub is for device management and Cosmos DB is row-oriented NoSQL, not optimized for SQL-based historical reporting.

How to eliminate wrong answers

Option B is wrong because Azure IoT Hub is primarily for bidirectional device management and communication, not optimized for high-throughput event ingestion; Azure Data Factory is a batch-oriented ETL service, not a real-time stream processor; and Azure SQL Data Warehouse (now part of Synapse) is correct for storage but the ingestion and processing components are mismatched. Option C is wrong because Azure Cosmos DB is a NoSQL document database with row-oriented storage, not a columnar format suitable for historical SQL analytics and reporting. Option D is wrong because Azure Blob Storage is object storage without native streaming ingestion; Azure Databricks is a big data analytics platform but not a dedicated near real-time stream processor like Stream Analytics; and Azure Table Storage is a NoSQL key-value store, not columnar or SQL-queryable for analysts.

Practice this question →

225

MCQhard

A financial services company runs critical end-of-day reports in an Azure Synapse Analytics dedicated SQL pool. These reports require guaranteed resource allocation and must complete within a fixed time window. However, ad-hoc analytical queries from data scientists often consume resources, causing contention and delaying the critical reports. Which feature should the company implement to ensure the critical reports always receive sufficient resources?

A.A. Create a workload group for the critical reports with a high importance setting and assign a minimum percentage of resources.

B.B. Enable result set caching on all queries to reduce execution time.

C.C. Implement materialized views for the aggregations used in the critical reports.

D.D. Use hash distribution for the fact tables to improve query parallelism.

AnswerA

Correct. Workload groups with importance and resource allocation ensure that critical queries get priority and guaranteed resources, preventing ad-hoc queries from starving them.

Why this answer

Option A is correct because workload groups in Azure Synapse Analytics dedicated SQL pool allow you to assign a minimum percentage of resources (e.g., CPU and memory) to a specific workload, ensuring guaranteed resource allocation. By setting high importance for the critical reports, the system prioritizes them over ad-hoc queries, preventing resource contention and ensuring they complete within the fixed time window.

Exam trap

The trap here is that candidates often confuse performance optimization features (caching, materialized views, distribution) with resource governance, which is the only mechanism to guarantee resource allocation and priority in a shared environment.

How to eliminate wrong answers

Option B is wrong because result set caching only reduces execution time for repeated queries by storing results, but it does not guarantee resource allocation or prevent contention from concurrent ad-hoc queries. Option C is wrong because materialized views improve query performance by pre-computing aggregations, but they do not provide resource isolation or priority for critical workloads. Option D is wrong because hash distribution improves parallelism for large fact tables, but it does not address resource contention or guarantee resource allocation for specific queries.

Practice this question →