CCNA Monitor Optimize Questions

38 questions · Monitor Optimize topic · All types, answers revealed

1
MCQ · hard

You are monitoring an Azure Data Lake Storage Gen2 account using Metrics and Audit logs. You notice that the 'Ingress' metric shows a sudden spike but the 'Egress' metric remains stable. There are no new storage events in the audit log. What is the most likely cause?

A. The storage account is configured with geo-redundant storage (GRS) and data is being replicated to the secondary region.
B. A Spark job is reading large amounts of data in parallel.
C. An Azure Data Factory pipeline is writing intermediate results to the storage account.
D. An Azure Function is triggered by blob creation events and writes logs to the same account.
Answer: C

Writes increase ingress, and if the pipeline is using staging or intermediate storage, it may not log each write as a separate storage event.

Why this answer

Option C is correct because an Azure Data Factory pipeline writing intermediate results to the storage account would cause a spike in 'Ingress' (data written into the account) without a corresponding increase in 'Egress' (data read from the account). The absence of new storage events in the audit log suggests the writes are not triggering blob-level events (e.g., BlobCreated events). That is consistent with Data Factory writing intermediate files through the Azure Blob Storage REST API or SDK while Event Grid notifications are not enabled for those operations.

Exam trap

The trap here is that candidates confuse 'Ingress' with 'Egress' or assume any write operation must generate a storage event, but Azure Storage events are opt-in and not all write operations (e.g., Data Factory intermediate writes) are configured to emit them.

How to eliminate wrong answers

Option A is wrong because geo-redundant storage (GRS) replication is asynchronous and occurs at the storage system level, not as user-visible 'Ingress' or 'Egress' metrics; replication traffic is internal and does not appear in the account's ingress/egress metrics. Option B is wrong because a Spark job reading data in parallel would increase 'Egress' (data read from storage), not 'Ingress', and the question states Egress remains stable. Option D is wrong because an Azure Function triggered by blob creation events would generate new storage events in the audit log (e.g., BlobCreated events), but the question explicitly states there are no new storage events.

2
MCQ · medium

A company runs a mission-critical Azure Data Factory pipeline that ingests data every hour from Azure Blob Storage into Azure Synapse Dedicated SQL Pool. Recently, the pipeline has been failing with timeout errors during the copy activity. The source blob files are around 500 MB each. Which configuration change would MOST effectively reduce the likelihood of timeout errors?

A. Decrease the 'Batch size' for the copy activity.
B. Change the sink to use PolyBase with staging enabled.
C. Increase the Data Integration Unit (DIU) to 8.
D. Enable 'Enable staging' and set 'Degree of copy parallelism' to a higher value.
Answer: D

Increases parallelism, reducing copy time and timeout likelihood.

Why this answer

Option D is correct because enabling staging allows the copy activity to use Azure Blob Storage as an intermediate staging area, which breaks the 500 MB files into manageable chunks and uses parallel staging writes to the Dedicated SQL Pool. This reduces the load on the single copy session and prevents timeout errors by leveraging the staging engine's retry and parallelization capabilities.

Exam trap

The trap here is that candidates often assume increasing DIUs or decreasing batch size will solve timeout issues, but they fail to recognize that staging is specifically designed to handle large file transfers by breaking them into parallel chunks and providing built-in retry logic.

How to eliminate wrong answers

Option A is wrong because decreasing 'Batch size' reduces the number of rows per batch, which can increase the number of round trips and actually worsen timeout issues for large files. Option B is wrong because switching the sink to PolyBase changes the loading method but does not raise the degree of copy parallelism, so a long-running copy can still hit the timeout. Option C is wrong because increasing Data Integration Units (DIU) to 8 only improves parallelism within the copy activity for file splits, but for a single 500 MB file, the copy activity still processes it as one unit unless staging is enabled to split it; DIU alone does not mitigate timeout errors caused by long-running single-file transfers.

3
MCQ · hard

A company uses Azure Data Lake Storage Gen2 with Azure Databricks. They notice that the job to write data into Delta Lake tables takes too long. The data is coming from a streaming source with a high velocity of small writes. Which approach should be taken to optimize write performance?

A. Configure the streaming to write in micro-batches with a higher trigger interval.
B. Increase the cluster size to 16 nodes.
C. Enable 'auto optimize' and 'optimized writes' on the Delta table.
D. Change the output format from Delta to Parquet.
Answer: A

Batching reduces the number of small file writes.

Why this answer

Option A is correct because increasing the trigger interval for micro-batches reduces the frequency of writes, allowing more data to accumulate per batch. This minimizes the overhead of small file commits and metadata operations in Delta Lake, which is the primary bottleneck for high-velocity streaming writes. By batching more records together, the job writes fewer, larger files, improving overall throughput.
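The effect of the trigger interval on commit frequency can be sketched with simple arithmetic. This is an illustrative model only: the function names and the 5,000 records-per-second rate are assumptions, and it treats every trigger as producing exactly one commit.

```python
def commits_per_hour(trigger_interval_seconds: float) -> int:
    """Number of micro-batch commits in one hour, assuming one commit per trigger."""
    if trigger_interval_seconds <= 0:
        raise ValueError("trigger interval must be positive")
    return int(3600 // trigger_interval_seconds)


def records_per_commit(records_per_second: float, trigger_interval_seconds: float) -> float:
    """Average number of records accumulated before each commit."""
    return records_per_second * trigger_interval_seconds


# Hypothetical input rate of 5,000 records per second.
for interval in (1, 10, 60):
    print(interval, commits_per_hour(interval), records_per_commit(5000, interval))
```

Under these assumptions, moving from a 1-second to a 60-second trigger cuts the number of commits per hour by a factor of 60 while each commit carries 60 times as many records.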

Exam trap

The trap here is that candidates often choose 'auto optimize' and 'optimized writes' (Option C) thinking they solve the small-file problem outright, but optimized writes (a shuffle before the write) and auto compaction (a merge after the write) add overhead to each commit and do not reduce the frequency of log commits during streaming.

How to eliminate wrong answers

Option B is wrong because simply increasing cluster size to 16 nodes does not address the root cause of small file overhead; it may even exacerbate the problem by creating more concurrent writers producing even smaller files. Option C is wrong because 'auto optimize' and 'optimized writes' change how files are laid out (optimized writes repartitions data before writing, and auto compaction merges small files afterwards), but they do not reduce the number of commits issued by a high-frequency stream; they add processing cost and latency to each write. Option D is wrong because changing the output format from Delta to Parquet removes ACID transactions, schema enforcement, and time travel capabilities, and does not solve the small file problem; Parquet still suffers from the same small file overhead without the benefits of Delta Lake.

4
MCQ · medium

You are designing a data pipeline that ingests JSON files from Azure Blob Storage into Azure Synapse Analytics using PolyBase. The files contain nested JSON arrays. What should you do to ensure that the data is loaded correctly?

A. Flatten the JSON arrays into a tabular format using Azure Data Factory or Databricks before loading.
B. Create an external table with the JSON file type and use a schema definition.
C. Use the OPENJSON function in T-SQL to parse the JSON during the load.
D. Use PolyBase with a JSON format file specifying the schema.
Answer: A

PolyBase requires tabular data, so flattening is necessary.

Why this answer

Option A is correct because PolyBase in Azure Synapse Analytics cannot directly handle nested JSON arrays; it requires a flat, tabular structure. Azure Data Factory or Databricks can flatten the nested arrays into rows and columns before loading, ensuring compatibility with PolyBase's external table format.
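What "flattening" means here can be shown with a small standard-library sketch. The document shape (`order_id`, `items`, `sku`, `qty`) is a hypothetical example, not a schema from the question; in practice the same step would run in Data Factory or Databricks.

```python
import json


def flatten_orders(document: str) -> list:
    """Turn one JSON record with a nested 'items' array into one flat row per item."""
    record = json.loads(document)
    rows = []
    for item in record.get("items", []):
        # Repeat the parent key on every row so the output is purely tabular.
        rows.append({"order_id": record["order_id"], "sku": item["sku"], "qty": item["qty"]})
    return rows


# Hypothetical input document with a nested array.
sample = '{"order_id": 7, "items": [{"sku": "A1", "qty": 2}, {"sku": "B4", "qty": 1}]}'
rows = flatten_orders(sample)
for row in rows:
    print(row)
```

Each nested array element becomes its own row, with the parent key repeated, which is the tabular shape an external table can describe.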

Exam trap

The trap here is that candidates assume PolyBase can parse JSON natively, but its external file formats in a dedicated SQL pool cover only delimited text, ORC, Parquet, and RCFile, so nested arrays must be pre-processed with tools like Data Factory or Databricks.

How to eliminate wrong answers

Option B is wrong because PolyBase in a dedicated SQL pool has no JSON external file format, so an external table cannot be declared over JSON files, nested or otherwise. Option C is wrong because OPENJSON is a T-SQL function used for parsing JSON within a query, but it cannot be used directly in a PolyBase load operation; it would require loading the entire file first, defeating PolyBase's purpose. Option D is wrong because PolyBase does not accept a JSON format file for schema specification; the schema comes from the external table's column definitions, and the file layout comes from an external file format.

5
Drag & Drop · medium

Drag and drop the steps to convert data from CSV to Parquet format using Azure Data Factory into the correct order.

Why this order

Define source (CSV) and sink (Parquet) datasets, then a copy activity with mapping, run, and monitor.
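As a rough sketch of the copy activity step, the pipeline fragment below is built as a Python dictionary and printed as JSON. The activity and dataset names (`CopyCsvToParquet`, `SourceCsvDataset`, `SinkParquetDataset`) are hypothetical, and the fragment omits mapping, linked services, and the dataset definitions themselves.

```python
import json

# Hypothetical names; only the overall shape of a Copy activity is illustrated.
copy_activity = {
    "name": "CopyCsvToParquet",
    "type": "Copy",
    "inputs": [{"referenceName": "SourceCsvDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SinkParquetDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},  # reads the CSV dataset
        "sink": {"type": "ParquetSink"},            # writes the Parquet dataset
    },
}

print(json.dumps(copy_activity, indent=2))
```

The ordering in the question follows from the references: both datasets must exist before the activity can name them, and the pipeline can only be run and monitored once the activity is defined.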

6
Multi-Select · medium

A company uses Azure Synapse Analytics dedicated SQL pool for a data warehouse. They notice that some queries are using more memory than expected, causing resource contention. Which TWO actions should they take to diagnose and optimize memory usage?

Select 2 answers
A. Enable result-set caching.
B. Increase the resource class for the users running the heavy queries.
C. Scale up the DWU setting.
D. Query the sys.dm_pdw_exec_requests DMV to identify queries with high memory grants.
E. Rebuild clustered columnstore indexes.
Answers: B, D

Larger resource classes provide more memory per query.

Why this answer

Option B is correct because increasing the resource class for users running heavy queries allocates more memory to those queries, reducing resource contention by ensuring they have sufficient memory to execute efficiently. Option D is correct because querying sys.dm_pdw_exec_requests DMV allows you to identify queries with high memory grants, which is the first step in diagnosing which queries are consuming excessive memory and need optimization.

Exam trap

The trap here is that candidates often confuse scaling up the DWU (Option C) as a diagnostic action, but it is a reactive scaling measure that does not help identify which queries are causing the memory issue, whereas querying the DMV and adjusting resource classes are targeted diagnostic and optimization steps.

7
Multi-Select · hard

Which THREE factors should you consider when choosing between rowstore and columnstore indexes in Azure Synapse Analytics?

Select 3 answers
A. The table contains many NULL values in indexed columns.
B. The table will be partitioned frequently.
C. The table size is expected to be over 1 TB.
D. The table has a high number of singleton lookups by a primary key.
E. The workload is heavy on aggregations and large scans.
Answers: C, D, E

Columnstore compression is more effective on large tables.

Why this answer

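Option C is correct because columnstore indexes in Azure Synapse Analytics are optimized for large-scale data warehousing workloads, where table sizes exceeding 1 TB benefit from high compression and columnar storage, significantly improving scan and aggregation performance. Rowstore indexes, in contrast, are less efficient for such large datasets due to higher I/O and storage overhead. Option D is correct because a high number of singleton lookups by primary key favors a rowstore index, which can seek directly to a single row. Option E is correct because a workload dominated by aggregations and large scans favors columnstore, which reads only the columns a query references.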

Exam trap

The trap here is that candidates may mistakenly think NULL handling or partitioning frequency are key differentiators, when in fact the core decision hinges on workload type—aggregations/scans (columnstore) versus singleton lookups (rowstore)—and table size thresholds like 1 TB where columnstore compression becomes critical.

8
MCQ · easy

A company uses Azure Data Lake Storage Gen2 to store sensor data. They notice that queries on the data are slow. Which feature should they enable to optimize query performance without moving data?

A. Implement Change Data Capture (CDC).
B. Enable Azure Search on the storage account.
C. Use PolyBase to query the data.
D. Enable hierarchical namespace on the storage account.
Answer: D

Hierarchical namespace organizes data in directories, improving query performance.

Why this answer

Enabling hierarchical namespace on Azure Data Lake Storage Gen2 organizes blobs into a directory hierarchy, which allows query engines like Azure Synapse Analytics and Apache Spark to perform directory-level pruning and partition elimination. This reduces the amount of data scanned during queries, directly improving performance without requiring data movement or restructuring.

Exam trap

The trap here is that candidates often confuse PolyBase (a query engine) with a storage optimization feature, or assume that enabling a search service or CDC will improve query performance on static data, when in fact the hierarchical namespace is the only option that directly optimizes storage layout for faster queries.

How to eliminate wrong answers

Option A is wrong because Change Data Capture (CDC) is a pattern for tracking row-level changes in relational databases (e.g., Azure SQL Database) and does not optimize query performance on static data in Data Lake Storage. Option B is wrong because Azure Search is a cognitive search service for indexing and full-text search over unstructured content, not a query acceleration feature for analytical workloads on Data Lake Storage. Option C is wrong because PolyBase is a data virtualization technology for querying external data sources (e.g., Hadoop, Azure Blob Storage) from SQL Server or Azure Synapse, but it does not enable a performance optimization on the storage account itself; it is a query engine, not a storage-level feature.

9
MCQ · medium

You are a data engineer for a financial services company. You manage an Azure Data Lake Storage Gen2 account that stores real-time stock trade data ingested from Azure Event Hubs via Azure Stream Analytics. The data is partitioned by date and symbol. Each day, a downstream Azure Databricks job runs an ETL process to aggregate trades into 5-minute bars and writes the results to a separate container. The Databricks job runs on a cluster with 10 worker nodes (Standard_DS3_v2) using Auto-Scaling enabled (2-10 workers). Recently, the job has been taking longer than expected, and you observe that the cluster is often at 10 workers but still the job duration increased by 30%. The storage account shows high transaction costs. You suspect the issue is related to how data is read. What should you do to optimize the job's performance and reduce costs?

A. Convert the data to Avro format to reduce file size.
B. Increase the maximum number of workers to 20 and use a larger instance type.
C. Modify the Stream Analytics job to output larger files (e.g., set the minimum file size to 100 MB) and use coalesce in Databricks to reduce the number of output partitions.
D. Move the data to Azure Blob Storage Premium tier to reduce latency.
Answer: C

Larger input files reduce metadata overhead, and coalescing reduces output files, improving performance and reducing costs.

Why this answer

Option C is correct because the performance issue stems from reading many small files (small file problem) in Azure Data Lake Storage Gen2, which increases transaction costs and slows down Spark jobs. By configuring Stream Analytics to output larger files (e.g., minimum 100 MB) and using coalesce in Databricks to reduce output partitions, you minimize the number of files read/written, reducing overhead and transaction costs. This directly addresses the root cause—high transaction costs and cluster saturation at 10 workers—without unnecessary scaling or tier changes.
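The scale of the small-file problem can be illustrated with a file-count estimate. This is a minimal sketch under stated assumptions: the 50 GB daily volume and 1 MB starting file size are hypothetical, and the model counts files only, not the exact billing of storage transactions.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def file_count(total_bytes: int, file_bytes: int) -> int:
    """Number of files needed to hold total_bytes at a given file size."""
    return math.ceil(total_bytes / file_bytes)


daily_volume = 50 * GB                            # hypothetical daily ingest
small_files = file_count(daily_volume, 1 * MB)    # many small files from the stream
large_files = file_count(daily_volume, 100 * MB)  # 100 MB minimum file size

print(small_files, large_files, small_files // large_files)
```

Every file costs the Spark job at least one open plus its share of listing and metadata work, so cutting the file count by two orders of magnitude reduces that fixed overhead by the same factor.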

Exam trap

The trap here is that candidates often assume performance issues require scaling up (more workers or larger instances) or changing storage tiers, when the real problem is inefficient data layout (small files) causing excessive I/O and transaction costs.

How to eliminate wrong answers

Option A is wrong because converting to Avro reduces file size but does not solve the small file problem; it may even increase the number of small files if the output is not coalesced, and Avro's compression benefits are marginal for already-compressed data. Option B is wrong because increasing workers to 20 and using larger instances would increase costs without fixing the underlying issue of many small files; the cluster is already at max workers (10) and still slow, indicating a bottleneck in file I/O, not compute capacity. Option D is wrong because moving to Azure Blob Storage Premium tier improves latency but does not reduce the number of transactions or small files; it would increase costs without addressing the root cause of high transaction costs from reading many small files.

10
MCQ · medium

You have an Azure Data Lake Storage Gen2 account that stores large volumes of parquet files. A reporting application frequently queries a specific subset of data filtered by a 'region' column. To minimize query latency and cost, which optimization should you implement?

A. Partition the data by region in the folder structure.
B. Create a clustered index on the region column.
C. Compress the parquet files using gzip.
D. Enable hierarchical namespace on the storage account.
Answer: A

Partition elimination reduces data scanned.

Why this answer

Partitioning the data by region in the folder structure (e.g., /region=NorthAmerica/...) lets query engines such as Azure Synapse or PolyBase perform partition pruning over Azure Data Lake Storage Gen2. The engine skips irrelevant files entirely, reducing I/O and query latency while lowering cost by minimizing data processed.
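Partition pruning over a folder layout amounts to selecting paths by their partition segment before any file is opened. The sketch below assumes a hypothetical `sales/region=<name>/` layout and invented file names; real engines apply the same idea inside their query planners.

```python
def prune_by_region(paths, region):
    """Keep only the files whose path contains the requested region=<value> folder."""
    wanted = f"region={region}"
    return [p for p in paths if wanted in p.split("/")]


# Hypothetical folder layout partitioned by region.
paths = [
    "sales/region=NorthAmerica/part-0000.parquet",
    "sales/region=NorthAmerica/part-0001.parquet",
    "sales/region=Europe/part-0000.parquet",
    "sales/region=Asia/part-0000.parquet",
]

selected = prune_by_region(paths, "Europe")
print(selected)
```

Only one of the four files is selected, so the other three are never read; compression alone would still require opening all four.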

Exam trap

The trap here is that candidates confuse compression (Option C) with partitioning, thinking reducing file size alone minimizes I/O, but without partition pruning the engine still scans all files, negating the benefit.

How to eliminate wrong answers

Option B is wrong because clustered indexes are a SQL Server/PaaS feature and are not supported on Parquet files in Azure Data Lake Storage Gen2; they apply only to relational tables in a database. Option C is wrong because compressing Parquet files with gzip does not reduce the amount of data scanned for a filtered query—Parquet already uses column-level compression (e.g., Snappy, ZSTD), and gzip adds CPU overhead without improving partition pruning. Option D is wrong because enabling hierarchical namespace is a prerequisite for folder-based partitioning, not an optimization itself; it must already be enabled to create the partitioned folder structure.

11
MCQ · medium

You are a data engineer for a financial services company. You have an Azure Data Lake Storage Gen2 account containing historical trade data organized by date in the format 'yyyy/MM/dd'. Each day's data is stored as a collection of Parquet files. The data is used by a team of analysts who run ad-hoc queries using Azure Synapse Serverless SQL. Recently, the analysts have reported that queries scanning multiple months of data are slow. The storage account uses LRS with a general-purpose v2 tier. You have enabled hierarchical namespace. The data is not partitioned in any other way. You need to improve query performance without moving data or changing the storage tier. What should you do?

A. Create external tables with partition definition using the directory structure and ensure queries filter on the date column.
B. Increase the query timeout setting in Azure Synapse Studio.
C. Redistribute the data using hash distribution on the date column.
D. Increase the data warehouse units (DWU) for the serverless SQL endpoint.
Answer: A

Partition elimination reduces data scanned, improving performance.

Why this answer

Option A is correct because Azure Synapse Serverless SQL can leverage the directory structure of Azure Data Lake Storage Gen2 as virtual partitions. By creating external tables with a partition definition that maps to the 'yyyy/MM/dd' folder hierarchy and ensuring queries filter on the date column, the serverless SQL engine performs partition elimination. This reduces the amount of data scanned, directly addressing the slow performance when querying multiple months of data without moving data or changing the storage tier.
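To make the mapping from a date filter to the 'yyyy/MM/dd' folders concrete, the sketch below lists the folder prefixes a date range touches. The date range is a hypothetical example, and the helper name `day_prefixes` is an assumption used only for illustration.

```python
from datetime import date, timedelta


def day_prefixes(start: date, end: date) -> list:
    """Folder prefixes in 'yyyy/MM/dd' form for every day from start to end inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y/%m/%d") for i in range(days + 1)]


# Hypothetical query range spanning a month boundary.
prefixes = day_prefixes(date(2024, 2, 27), date(2024, 3, 2))
print(prefixes)
```

A query that filters on the date only needs the files under these prefixes; without the filter, every day folder in the account is a candidate for scanning.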

Exam trap

The trap here is that candidates may confuse serverless SQL with dedicated SQL pool concepts, such as hash distribution or DWU scaling, and fail to recognize that partition elimination via external table definitions is the only viable optimization for serverless SQL when data remains in the lake.

How to eliminate wrong answers

Option B is wrong because increasing the query timeout setting in Azure Synapse Studio does not improve query performance; it only allows the query to run longer before failing, which does not address the root cause of slow data scans. Option C is wrong because hash distribution is a concept for dedicated SQL pools (provisioned) in Azure Synapse, not for serverless SQL endpoints; serverless SQL does not support redistributing data with hash distribution, and the data remains in the lake. Option D is wrong because serverless SQL endpoints do not use data warehouse units (DWU); DWU is a scaling metric for dedicated SQL pools, and serverless SQL scales automatically based on the amount of data processed, so increasing DWU is not applicable.

12
Multi-Select · medium

Which TWO actions should you take to reduce costs associated with an Azure Synapse Dedicated SQL Pool that is used for reporting during business hours only?

Select 2 answers
A. Pause the pool during non-business hours.
B. Enable advanced data compression on all tables.
C. Scale down the pool during business hours.
D. Change the distribution of large tables to ROUND_ROBIN.
E. Implement result set caching for frequently run queries.
Answers: A, E

Stops compute billing when not in use.

Why this answer

Option A is correct because pausing a Dedicated SQL Pool stops billing for compute resources (DWU) while retaining storage costs. Since the pool is only needed for reporting during business hours, pausing it during non-business hours directly eliminates compute charges for that period, which is the most significant cost driver.
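The size of the saving from pausing can be estimated with a back-of-the-envelope calculation. The schedule below (10 hours a day, 5 days a week) is a hypothetical example, and the model assumes compute is billed in proportion to the hours the pool is running.

```python
def weekly_compute_hours(hours_per_day: int, days_per_week: int) -> int:
    """Hours per week during which the pool is running and billed for compute."""
    return hours_per_day * days_per_week


always_on = weekly_compute_hours(24, 7)        # never paused
business_only = weekly_compute_hours(10, 5)    # hypothetical business-hours schedule
saving_fraction = 1 - business_only / always_on

print(always_on, business_only, round(saving_fraction, 3))
```

Under this assumed schedule the pool runs 50 of 168 hours, so roughly 70% of the weekly compute hours are avoided; storage charges continue regardless.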

Exam trap

Microsoft often tests the distinction between compute cost reduction (pausing/scaling) and storage/performance optimizations (compression, distribution, caching), leading candidates to confuse storage-saving actions with compute-saving actions.

13
MCQ · hard

A data engineer is monitoring Azure Data Lake Storage Gen2 costs and notices high transaction costs for a specific container. The container stores Parquet files used by Azure Databricks for read-heavy analytics. The files are accessed frequently by multiple jobs. What is the most cost-effective way to reduce transaction costs?

A. Move the data to Azure Blob Storage cool tier.
B. Increase the Parquet file size to maximize block size.
C. Convert the container to Azure Files.
D. Enable Azure CDN to cache the files.
Answer: D

Azure CDN caches data at edge locations, reducing the number of direct read transactions to the storage account.

Why this answer

Option D is correct because enabling Azure CDN caches the frequently accessed Parquet files at edge locations, reducing the number of direct read requests to Azure Data Lake Storage Gen2. This lowers transaction costs (both read and list operations) while maintaining low-latency access for read-heavy analytics workloads. The CDN serves cached content, so the storage account incurs fewer billable transactions.

Exam trap

The trap here is that candidates often assume increasing file size (Option B) reduces costs because fewer files mean fewer transactions, but they overlook that each read of a large file still incurs a single transaction per API call, and transaction costs are per operation, not per file size.

How to eliminate wrong answers

Option A is wrong because moving data to Azure Blob Storage cool tier reduces storage costs but does not reduce transaction costs; in fact, cool tier has higher per-transaction charges, which would increase costs for read-heavy workloads. Option B is wrong because increasing Parquet file size to maximize block size does not reduce transaction costs; Azure Data Lake Storage Gen2 uses hierarchical namespace and transactions are counted per API call (e.g., per file read), not per block size, so larger files reduce the number of files but each read still incurs a transaction. Option C is wrong because converting the container to Azure Files introduces SMB protocol overhead and is designed for file shares, not optimized for read-heavy analytics with Parquet files; it would increase latency and complexity without reducing transaction costs.

14
MCQ · easy

A data engineer monitors an Azure Stream Analytics job that processes real-time data. The job is falling behind, and the SU utilization is at 100%. Which action should be taken to improve performance?

A. Increase the number of Streaming Units (SU).
B. Reduce the number of Streaming Units.
C. Change the query compatibility level to 1.0.
D. Deploy a second Stream Analytics job and split the input.
Answer: A

More SU provides more processing power.

Why this answer

When SU utilization reaches 100%, the job is fully saturated and cannot process incoming data fast enough. Increasing the number of Streaming Units (SU) allocates more compute resources (CPU and memory) to the job, allowing it to handle higher throughput and reduce backlog. This is the direct and recommended action for resolving performance bottlenecks caused by insufficient SU capacity.

Exam trap

The trap here is that candidates may think reducing SU or splitting the job is a valid optimization, but the correct response is to increase SU when utilization is at 100%, as this directly addresses the resource bottleneck.

How to eliminate wrong answers

Option B is wrong because reducing the number of Streaming Units would further starve the job of resources, worsening the backlog and increasing latency. Option C is wrong because changing the query compatibility level to 1.0 does not affect resource allocation or throughput; it only alters query language features and behavior, which cannot resolve a 100% SU utilization issue. Option D is wrong because deploying a second Stream Analytics job and splitting the input does not address the root cause of resource saturation; it adds complexity and may cause ordering or partitioning issues without guaranteeing improved performance, and the original job would still be overloaded.

15
MCQ · easy

You are analyzing the exhibit from an Azure Monitor metric query for a storage account. What is the primary purpose of this query?

A. To calculate the average number of block blobs in the hot tier.
B. To identify the time period with the highest blob count.
C. To measure the total size of all block blobs in the account.
D. To retrieve the average count of block blobs per hour.
Answer: D

Metric BlobCount with aggregation Average and filter on BlobType equals BlockBlob.

Why this answer

The query uses the 'avg' aggregation on the 'BlobCount' metric, which calculates the average number of blobs over the specified time granularity (e.g., per hour). The result shows the average count of block blobs per hour, not the total count or the count in a specific tier. This aligns with option D, as the query is designed to retrieve the average count of block blobs per hour.
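The difference between the aggregations can be seen on a tiny data set. The sample values and hour labels below are invented for illustration; they stand in for the raw metric samples that fall inside each one-hour time grain.

```python
from statistics import mean

# Hypothetical raw BlobCount samples grouped by one-hour time grain.
samples = {
    "09:00": [100, 104, 108],
    "10:00": [110, 130, 120],
}

hourly_avg = {hour: mean(values) for hour, values in samples.items()}
hourly_max = {hour: max(values) for hour, values in samples.items()}

# Finding the busiest period is a separate step on top of the per-hour values.
peak_hour = max(hourly_avg, key=hourly_avg.get)

print(hourly_avg, hourly_max, peak_hour)
```

The average aggregation yields one representative value per hour; it neither reports the highest sample within the hour nor says anything about blob size.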

Exam trap

The trap here is that candidates often confuse 'avg' with 'sum' or 'max', leading them to incorrectly think the query calculates total blob count or identifies peak periods, rather than recognizing that 'avg' specifically computes the average value over the time granularity.

How to eliminate wrong answers

Option A is wrong because the query does not filter by blob tier (hot, cool, or archive); it retrieves the average count of all block blobs, not just those in the hot tier. Option B is wrong because the query uses the 'avg' aggregation, which returns an average value over the time period, not the maximum or peak blob count; to identify the time period with the highest blob count, you would need to use the 'max' aggregation. Option C is wrong because the query measures 'BlobCount', which is the number of blobs, not their size; to measure total size, you would use the 'BlobCapacity' metric.

16
MCQ · medium

You have an Azure Synapse Analytics dedicated SQL pool. You notice that some queries are taking longer than expected. After reviewing the query plans, you see that some queries are spilling to tempdb. What should you do to reduce tempdb spills?

A. Increase the resource class for the user executing the queries.
B. Redistribute the tables using hash distribution.
C. Rebuild all columnstore indexes.
D. Add partitioning to the tables.
Answer: A

Larger resource class allocates more memory, reducing tempdb spills.

Why this answer

Tempdb spills occur when a query requires more memory than is allocated to it, forcing intermediate results to be written to disk. Increasing the resource class for the user executing the queries allocates more memory to that user's queries, reducing the likelihood of spills. This directly addresses the memory constraint that causes spills in a dedicated SQL pool.

Exam trap

The trap here is that candidates often confuse performance tuning techniques like indexing or partitioning with memory management, assuming any optimization will fix spills, when only increasing memory allocation (via resource class) directly addresses the root cause.

How to eliminate wrong answers

Option B is wrong because redistributing tables using hash distribution improves data movement and join performance but does not directly increase per-query memory allocation to prevent tempdb spills. Option C is wrong because rebuilding columnstore indexes improves compression and scan performance but does not address the memory grant issue that causes spills. Option D is wrong because adding partitioning can improve partition elimination and manageability but does not increase the memory available to individual queries, so it will not reduce tempdb spills.

17
MCQ · hard

You are reviewing an Azure Policy assignment that uses the above JSON to define a role-based access control (RBAC) action. What is the primary purpose of this policy?

A. To assign RBAC roles to users for the storage account.
B. To enable delegation of access to a specific blob.
C. To allow users to set permissions on storage account containers.
D. To authorize generation of a shared access signature (SAS) token for the storage account.
Answer: D

The 'listAccountSas' action generates an account-level SAS token.

Why this answer

The policy JSON defines a role-based access control (RBAC) action that grants the 'Microsoft.Storage/storageAccounts/listAccountSas/action' permission. This specific action authorizes the generation of a shared access signature (SAS) token at the storage account level, not at the container or blob level. Therefore, the primary purpose is to allow the generation of an account SAS token, which provides delegated access to storage services.
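Since the exhibit is not reproduced here, the following is only a guess at the general shape of a custom role definition that carries this action. The role name, description, and scope placeholder are hypothetical, and the `allows` helper is a deliberately simplified check that ignores wildcards and data actions.

```python
import json

SAS_ACTION = "Microsoft.Storage/storageAccounts/listAccountSas/action"

# Hypothetical custom role; only the Actions entry comes from the question.
role_definition = {
    "Name": "Account SAS Generator (example)",
    "Description": "Illustrative role that permits generating an account SAS.",
    "Actions": [SAS_ACTION],
    "NotActions": [],
    "AssignableScopes": ["/subscriptions/<subscription-id>"],
}


def allows(role: dict, action: str) -> bool:
    """Simplified check: exact match in Actions and not excluded by NotActions."""
    return action in role["Actions"] and action not in role["NotActions"]


print(json.dumps(role_definition, indent=2))
print(allows(role_definition, SAS_ACTION))
```

The definition lists what is permitted but names no user or group; binding it to a principal is a separate role assignment, which is why Option A does not describe it.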

Exam trap

The trap here is that candidates confuse the account-level SAS generation action with container or blob-level delegation, or mistakenly think the policy itself assigns roles rather than defining a permission that can be used in a custom role.

How to eliminate wrong answers

Option A is wrong because the policy does not assign RBAC roles to users; it defines a permission action that can be included in a role definition, not the assignment of roles themselves. Option B is wrong because delegation of access to a specific blob requires a service SAS or user delegation SAS, which uses different actions (e.g., 'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/...'), not the account-level listAccountSas action. Option C is wrong because setting permissions on storage account containers is managed via container-level RBAC actions (e.g., 'Microsoft.Storage/storageAccounts/blobServices/containers/write') or container ACLs, not the account-level SAS generation action.

18
MCQmedium

A company uses Azure Synapse Analytics with dedicated SQL pools. They notice that query performance degrades significantly during peak hours. They have already scaled up the Data Warehouse Units (DWU) to the maximum. Which action should they take next to improve performance?

A.Enable result-set caching.
B.Rebuild all clustered columnstore indexes.
C.Increase the number of concurrency slots.
D.Move the data to Azure Data Lake Storage Gen2.
AnswerA

Result-set caching persists query results in the user database, reducing compute resource usage and improving performance for repeated queries.

Why this answer

When a dedicated SQL pool is already at maximum DWU, further scaling is not possible. Enabling result-set caching persists query results in the user database, allowing repeated queries to be served directly from the cache without re-scanning data or re-computing aggregations, as long as the underlying data has not changed. This reduces I/O and CPU pressure during peak hours, improving performance for recurring queries without requiring additional compute resources.

Exam trap

The trap here is that candidates often confuse result-set caching with materialized views or index maintenance, assuming that only index rebuilds or scaling can fix performance, but result-set caching is a lightweight, no-cost configuration change that directly addresses repeated query patterns during peak load.

How to eliminate wrong answers

Option B is wrong because rebuilding clustered columnstore indexes is a maintenance task that can improve compression and scan performance, but it does not address the root cause of peak-hour degradation when the pool is already at maximum DWU; it also consumes significant resources during rebuild. Option C is wrong because concurrency slots are a resource governance mechanism that limits the number of concurrent queries, not a performance-tuning feature; increasing concurrency slots would actually reduce the resources available per query, potentially worsening performance. Option D is wrong because moving data to Azure Data Lake Storage Gen2 changes the storage layer but does not directly improve query performance in a dedicated SQL pool; the pool still reads data through its compute nodes, and the bottleneck is compute, not storage location.
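
The effect of caching on repeated queries can be illustrated with a toy model. This is only a sketch of the general idea, not how Synapse implements the feature; the class, the stand-in scan function, and the sample query are all hypothetical.

```python
from typing import Callable


class ResultSetCache:
    """Toy cache keyed on exact query text (illustrative only)."""

    def __init__(self, execute: Callable[[str], list]):
        self._execute = execute
        self._cache: dict[str, list] = {}
        self.hits = 0
        self.misses = 0

    def run(self, query: str) -> list:
        # Serve a repeated query from the cache instead of re-executing it.
        if query in self._cache:
            self.hits += 1
            return self._cache[query]
        self.misses += 1
        rows = self._execute(query)
        self._cache[query] = rows
        return rows


def expensive_scan(query: str) -> list:
    # Stand-in for a full scan and aggregation on the compute nodes.
    return [(20241001, 12345, 99.0)]


cache = ResultSetCache(expensive_scan)
report_query = "SELECT DateKey, ProductKey, SUM(Amount) FROM dbo.FactSales GROUP BY DateKey, ProductKey"
for _ in range(3):
    cache.run(report_query)
print(cache.hits, cache.misses)  # 2 1
```

Only the first execution pays for the scan; the two repeats are served from the cache, which is why the benefit is largest for dashboards and reports that reissue identical queries during peak hours.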

19
MCQhard

Your team uses Azure Databricks with Delta Lake for ETL. You notice that the Delta table's version history is growing rapidly, and query performance is degrading. You want to retain the ability to time travel for the last 30 days. Which Delta Lake command should you run?

A.DESCRIBE HISTORY delta_table;
B.VACUUM delta_table RETAIN 30 HOURS;
C.OPTIMIZE delta_table;
D.FSCK REPAIR TABLE delta_table;
AnswerB

VACUUM removes unreferenced data files older than the retention period, which is specified in hours (the default is 7 days). As written, option B retains only 30 hours; to keep 30 days of time travel, the retention must be 720 hours (VACUUM delta_table RETAIN 720 HOURS).

Why this answer

The VACUUM command in Delta Lake removes data files that are no longer referenced by the table and are older than the specified retention threshold, which addresses the stale files accumulating behind the growing version history. The threshold is expressed in hours and defaults to 7 days (168 hours), so preserving 30 days of time travel requires `VACUUM delta_table RETAIN 720 HOURS`. The literal `RETAIN 30 HOURS` in option B would keep only 30 hours of history and, being shorter than the default, is rejected by Delta Lake's retention safety check unless that check is disabled. VACUUM physically deletes the unused files, reducing storage and the number of files left under the table.

Exam trap

The trap here is that candidates assume VACUUM's retention parameter is in days rather than hours, or they mistakenly choose OPTIMIZE thinking it cleans up history, when in fact it only compacts files without removing old versions.

How to eliminate wrong answers

Option A is wrong because DESCRIBE HISTORY only displays the transaction log (version history) metadata; it does not remove any files or improve performance. Option C is wrong because OPTIMIZE compacts small files into larger ones to improve read performance but does not delete old versions or reduce version history growth. Option D is wrong because FSCK REPAIR TABLE is used to recover table metadata after file system changes (e.g., manual file deletions) and does not address version history cleanup or performance degradation.
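
Because VACUUM takes its retention in hours, a requirement stated in days has to be converted first. A minimal sketch of that arithmetic follows; the helper name is hypothetical and the function only builds the statement text, it does not run anything against a table.

```python
def vacuum_statement(table: str, retain_days: int) -> str:
    """Build a VACUUM statement for a retention period given in days."""
    if retain_days < 0:
        raise ValueError("retain_days must be non-negative")
    hours = retain_days * 24  # VACUUM expects hours, not days
    return f"VACUUM {table} RETAIN {hours} HOURS;"


print(vacuum_statement("delta_table", 30))  # VACUUM delta_table RETAIN 720 HOURS;
print(vacuum_statement("delta_table", 7))   # VACUUM delta_table RETAIN 168 HOURS;
```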

20
MCQeasy

A company runs a streaming pipeline using Azure Stream Analytics to ingest IoT data and output to Azure SQL Database. They notice that the output latency increases over time and eventually the job fails with a timeout error. What is the most likely cause?

A.The Stream Analytics job has a high late arrival tolerance.
B.The event hub is not partitioned correctly.
C.The event hub consumer group is misconfigured.
D.The Azure SQL Database target table lacks proper indexes.
AnswerD

Missing indexes slow down write operations, causing backpressure and eventual timeout.

Why this answer

The most likely cause is that the Azure SQL Database target table lacks proper indexes. Without indexes, each batch of output from Stream Analytics triggers full table scans for inserts or updates, causing cumulative latency. Over time, the backlog exceeds the job's timeout threshold (default 5 minutes for output), leading to failure.

Exam trap

The trap here is that candidates often attribute output latency to input-side issues like partitioning or consumer groups, but the symptom of increasing latency over time points to a downstream bottleneck, specifically missing indexes on the SQL target table.

How to eliminate wrong answers

Option A is wrong because high late arrival tolerance delays watermark advancement but does not cause progressive output latency or timeouts; it affects event ordering, not throughput. Option B is wrong because incorrect event hub partitioning affects input ingestion parallelism, not output latency to SQL Database; the job would show high input backlog, not output timeout. Option C is wrong because a misconfigured consumer group (e.g., multiple readers) causes checkpoint conflicts or duplicate reads, not a gradual increase in output latency; the job would fail with partition-related errors, not timeout.

21
MCQmedium

Your company runs a critical data pipeline using Azure Data Factory (ADF) that ingests data from multiple sources into an Azure Synapse Analytics dedicated SQL pool. Recently, you have observed that the pipeline frequently fails with the error: 'Operation for target table failed: 'Cannot insert duplicate key row in object 'dbo.FactSales' with unique index 'PK_FactSales'. The duplicate key value is (20241001, 12345).'' The pipeline uses a Copy activity with a stored procedure sink that merges data into the fact table. The fact table has a clustered columnstore index and a unique constraint on (DateKey, ProductKey). You need to modify the pipeline to handle duplicates without losing data and without impacting performance significantly. What should you do?

A.Configure the Copy activity sink to use 'upsert' behavior with the unique key columns.
B.Change the distribution of the fact table to round-robin and remove the unique constraint.
C.Use a staging table and then execute a T-SQL MERGE statement to update or insert.
D.Add a pre-copy script to delete existing rows that match the incoming data before the copy.
AnswerA

ADF's upsert uses the source to update matching rows and insert new ones, avoiding duplicate key violations.

Why this answer

Option A is correct because Azure Data Factory's Copy activity offers a native upsert write behavior on SQL sinks, used in place of the stored procedure sink, which updates existing rows instead of failing on a duplicate key. By specifying the unique key columns (DateKey, ProductKey) in the upsert configuration, the pipeline can merge incoming data into the fact table without requiring manual staging or pre-cleanup, minimizing performance impact by leveraging the existing clustered columnstore index and unique constraint.

Exam trap

The trap here is that candidates often overcomplicate the solution by choosing a manual staging table approach (Option C) or a destructive pre-copy script (Option D), not realizing that ADF's native upsert feature is designed specifically to handle duplicate key violations in a performant and atomic manner.

How to eliminate wrong answers

Option B is wrong because changing the distribution to round-robin and removing the unique constraint would eliminate the duplicate detection mechanism, potentially allowing data integrity issues and degrading query performance due to data movement during joins. Option C is wrong because using a staging table and a T-SQL MERGE statement introduces additional latency and complexity, and while it can handle duplicates, it is less efficient than the native upsert feature in ADF, which is optimized for such scenarios. Option D is wrong because adding a pre-copy script to delete existing rows before the copy is a workaround that can cause data loss (deleting legitimate rows) and does not handle concurrent inserts or updates gracefully, leading to potential race conditions and performance overhead.
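
The upsert semantics described above (update on key match, insert otherwise) can be sketched in plain Python. This models the behavior only and is not the Data Factory implementation; the function name, the in-memory table, and the Amount column are hypothetical, while the key columns and the duplicate key value come from the question.

```python
def upsert(table: list, incoming: list, key_columns: tuple) -> list:
    """Update rows whose key already exists; append rows with new keys."""
    index = {tuple(row[k] for k in key_columns): i for i, row in enumerate(table)}
    for row in incoming:
        key = tuple(row[k] for k in key_columns)
        if key in index:
            table[index[key]] = row   # matching key: update in place
        else:
            index[key] = len(table)   # new key: insert
            table.append(row)
    return table


fact_sales = [{"DateKey": 20241001, "ProductKey": 12345, "Amount": 10.0}]
batch = [
    {"DateKey": 20241001, "ProductKey": 12345, "Amount": 12.5},  # duplicate key
    {"DateKey": 20241001, "ProductKey": 67890, "Amount": 4.0},   # new key
]
upsert(fact_sales, batch, ("DateKey", "ProductKey"))
print(len(fact_sales), fact_sales[0]["Amount"])  # 2 12.5
```

The row with key (20241001, 12345) is updated rather than rejected, so no data is lost and no duplicate key error is raised.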

22
MCQhard

You are a data engineer for a retail company. The company uses Azure Data Lake Storage Gen2 to store raw transaction data partitioned by date. Each day, a folder is created with the format 'YYYY/MM/DD' containing thousands of small JSON files (each ~10 KB). An Azure Databricks job runs daily to read the previous day's folder, transform the data, and write to a Delta table for reporting. Over time, the job's execution time has increased from 15 minutes to over 2 hours. The job uses a cluster with 4 nodes (each 16 GB memory). Monitoring shows that the job spends most of its time in the 'listing files' stage. Which optimization should you implement to reduce the job duration?

A.Increase the number of nodes in the cluster to 16.
B.Change the output format from JSON to Delta and enable Delta caching.
C.Pre-process the raw data to coalesce small JSON files into larger parquet files (e.g., 256 MB each).
D.Use Azure Data Factory instead of Databricks to copy the raw data.
AnswerC

Reduces the number of files, drastically cutting listing time.

Why this answer

The job spends most of its time in the 'listing files' stage because reading thousands of small JSON files (each ~10 KB) from Azure Data Lake Storage Gen2 incurs high metadata operation overhead. Coalescing these small files into larger Parquet files (e.g., 256 MB each) reduces the number of files that Spark must list and process, dramatically cutting down the listing stage time and improving overall throughput.

Exam trap

The trap here is that candidates often assume scaling the cluster (Option A) will solve any performance issue, but they fail to recognize that metadata operations like file listing are not parallelized across nodes and are limited by the storage account's API limits, not compute resources.

How to eliminate wrong answers

Option A is wrong because increasing the number of nodes to 16 does not address the root cause of high metadata overhead from listing thousands of small files; it would only add more parallelism to a bottleneck that is I/O and metadata-bound, not CPU-bound. Option B is wrong because changing the output format to Delta and enabling Delta caching optimizes the write/read side of the Delta table, but the bottleneck is in the input stage (listing and reading raw JSON files), not in the output stage. Option D is wrong because using Azure Data Factory to copy the raw data does not solve the file listing problem; it would still need to list the same small files and would not transform the data, and it introduces an unnecessary extra service without addressing the core issue of small file overhead.
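
A quick back-of-the-envelope calculation shows how much compaction shrinks the listing workload. The figure of 50,000 files per day is an assumption for illustration (the question only says "thousands"); the 10 KB and 256 MB sizes come from the question and option C.

```python
import math


def files_after_compaction(file_count: int, avg_file_kb: float, target_mb: float) -> int:
    """Estimate how many target-sized files hold the same volume of data."""
    total_kb = file_count * avg_file_kb
    return math.ceil(total_kb / (target_mb * 1024))


# Assumed daily volume: 50,000 files of about 10 KB each.
print(files_after_compaction(50_000, 10, 256))  # 2
```

Under that assumption, tens of thousands of objects to list collapse to a handful. The estimate ignores the extra compression Parquet would typically add.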

23
Multi-Selecthard

A data engineer is optimizing an Azure Data Lake Storage Gen2 account used for big data analytics. The account contains billions of small files (under 1 MB). The analytics jobs are slow and cost more than expected. Which THREE actions should the engineer take to improve performance and reduce costs?

Select 3 answers
A.Convert data to columnar file formats such as Parquet.
B.Move data to the cool tier to reduce storage costs.
C.Enable soft delete to protect against accidental deletion.
D.Use blob index tags to partition data logically.
E.Consolidate small files into larger files (e.g., 100 MB or more).
AnswersA, D, E

Columnar formats compress data and allow predicate pushdown, reducing I/O.

Why this answer

Option A is correct because converting data to columnar formats like Parquet reduces the amount of data read during analytics queries, as only the necessary columns are scanned. This significantly improves query performance and lowers I/O costs, especially for big data workloads on Azure Data Lake Storage Gen2.

Exam trap

The trap here is that candidates may think moving data to a cooler tier or enabling soft delete directly improves performance, when in reality these actions address cost or protection, not the root cause of slow analytics jobs caused by small file overhead.

24
MCQmedium

A data engineer is designing a monitoring solution for Azure Data Factory pipelines. They need to be alerted when a pipeline run fails or when the duration exceeds a threshold. The solution must minimize cost and operational overhead. Which approach should they use?

A.Configure Azure Event Grid to send pipeline run events to Azure Functions for alerting.
B.Use Azure Monitor metrics and activity logs to create alert rules for pipeline failures and duration.
C.Send all pipeline run logs to Log Analytics and create alert rules based on custom log searches.
D.Create an Azure Logic App that runs every minute to check pipeline run status via REST API.
AnswerB

Azure Monitor provides built-in metrics and alerts for Azure Data Factory with minimal cost.

Why this answer

Option B is correct because Azure Monitor provides native, cost-effective alerting for Azure Data Factory pipelines using metrics (e.g., pipeline run duration) and activity logs (e.g., pipeline run failures). This approach requires no additional compute or log ingestion costs, as alerts are configured directly on the resource's monitoring data, minimizing both cost and operational overhead.

Exam trap

The trap here is that candidates over-engineer the solution by choosing event-driven or log-based approaches (A, C, D) when the simplest, most cost-effective native monitoring (Azure Monitor alerts) is available, often forgetting that Data Factory emits metrics and activity logs by default without additional setup.

How to eliminate wrong answers

Option A is wrong because Azure Event Grid with Azure Functions introduces unnecessary complexity and cost (function execution time) for a scenario that can be handled natively by Azure Monitor alerts without custom code. Option C is wrong because sending all pipeline run logs to Log Analytics incurs ingestion and retention costs, and custom log search alerts are more expensive and operationally heavier than using built-in metrics and activity log alerts. Option D is wrong because running a Logic App every minute to poll the REST API creates recurring execution costs and latency, and is an inefficient polling pattern compared to the event-driven, push-based alerting provided by Azure Monitor.

25
MCQhard

A data engineering team uses Azure Stream Analytics to process real-time IoT data. They notice that the job's watermark delay is increasing over time, and the output is falling behind. The input is from Event Hubs with 10 partitions. The job uses a 5-minute hopping window with a 1-minute hop. What is the most likely cause?

A.The hopping window size is too large.
B.The late arrival tolerance is set too high.
C.The job is under-provisioned in terms of Streaming Units (SUs).
D.The Event Hubs partition count does not match the Stream Analytics job's parallelism.
AnswerC

Low SUs cause backpressure, increasing watermark delay.

Why this answer

The increasing watermark delay and lagging output indicate that the Stream Analytics job cannot keep up with the input throughput. With a 5-minute hopping window (1-minute hop) processing 10 Event Hubs partitions, the job requires sufficient Streaming Units (SUs) to handle the compute load. Under-provisioned SUs cause backpressure, leading to rising watermark delay as the job struggles to process events within the window boundaries.

Exam trap

The trap here is that candidates often confuse watermark delay with configuration issues like window size or late arrival tolerance, but the progressive nature of the delay points directly to resource starvation (SU under-provisioning) rather than a static configuration problem.

How to eliminate wrong answers

Option A is wrong because the hopping window size (5 minutes with 1-minute hop) is a standard temporal window configuration and does not inherently cause watermark delay; larger windows actually reduce computational frequency. Option B is wrong because setting the late arrival tolerance too high would allow more late events to be included, potentially increasing watermark delay, but the question states the delay is increasing over time, which is a symptom of insufficient processing capacity, not a configuration that would cause progressive delay. Option D is wrong because Stream Analytics automatically handles partition alignment with Event Hubs partitions when the job's parallelism is set to 1 (default) or when using the same partition count; mismatched partition counts do not cause increasing watermark delay but may cause uneven data distribution or idle partitions.
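
To see why a hopping window adds compute load, it helps to count how many windows each event participates in. The sketch below assumes windows aligned to multiples of the hop, each covering the interval (end - size, end]; the function name and the timestamps are hypothetical.

```python
def hopping_window_ends(event_ts: int, size_s: int = 300, hop_s: int = 60) -> list:
    """Return the end timestamps (in seconds) of every window containing the event.

    Assumes windows end on multiples of hop_s and cover (end - size_s, end].
    """
    first_end = -(-event_ts // hop_s) * hop_s  # round up to the next hop boundary
    return list(range(first_end, event_ts + size_s, hop_s))


ends = hopping_window_ends(130)
print(ends)       # [180, 240, 300, 360, 420]
print(len(ends))  # 5
```

With a 5-minute window and a 1-minute hop, every event is aggregated five times, so the job does roughly five times the aggregation work of a tumbling window of the same size, which is part of why sufficient SUs matter here.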

26
Matchingmedium

Match each Azure monitoring service to its function.

Concepts
Matches

Collect and analyze telemetry from Azure resources

Query and analyze log data

Numerical data from Azure resources

Interactive analytics on large telemetry datasets

Why these pairings

These services are used for monitoring and diagnostics.

27
Multi-Selecthard

You are monitoring an Azure Data Lake Storage Gen2 account that stores streaming data from IoT devices. You notice that query performance on the data in Parquet format is degrading over time. You need to improve query performance for both current and future data. Which TWO actions should you take?

Select 2 answers
A.Move frequently accessed data to Azure SQL Database.
B.Partition the data by a column commonly used in filter conditions.
C.Convert the Parquet files to Delta Lake format and enable file compaction.
D.Enable soft delete on the storage account to optimize read performance.
E.Migrate the data to Azure NetApp Files for lower latency.
AnswersB, C

Partitioning reduces the amount of data scanned per query.

Why this answer

Partitioning the data by a column commonly used in filter conditions (e.g., date, device ID) enables partition pruning in query engines like Azure Synapse or Spark, allowing them to skip irrelevant partitions and scan only the necessary files. This directly addresses the performance degradation by reducing the amount of data read during queries, and it benefits both current and future data when applied consistently.

Exam trap

The trap here is that candidates often confuse data protection features (like soft delete) or storage migration options (like Azure SQL or NetApp Files) with performance optimization techniques, failing to recognize that partitioning and file format optimization are the standard solutions for improving query performance on large-scale Parquet data in a data lake.
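
The way partitioning lets an engine skip files can be sketched with Hive-style directory names. This is a simplified model of partition pruning, not the actual Spark or Synapse planner; the paths, the column name, and the function are hypothetical.

```python
def prune_partitions(paths: list, column: str, wanted: set) -> list:
    """Keep only paths whose 'column=value' directory segment matches the filter."""
    prefix = f"{column}="
    kept = []
    for path in paths:
        for segment in path.split("/"):
            if segment.startswith(prefix) and segment[len(prefix):] in wanted:
                kept.append(path)
                break
    return kept


files = [
    "telemetry/date=2024-10-01/part-000.parquet",
    "telemetry/date=2024-10-02/part-000.parquet",
    "telemetry/date=2024-10-03/part-000.parquet",
]
print(prune_partitions(files, "date", {"2024-10-02"}))
# ['telemetry/date=2024-10-02/part-000.parquet']
```

A query filtering on the partition column never opens the other two files, and the same layout keeps paying off as new daily partitions arrive.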

28
MCQmedium

You have an Azure Data Factory (ADF) pipeline that runs hourly to ingest data from an on-premises SQL Server into Azure Data Lake Storage Gen2. The pipeline includes a Copy activity that transfers all rows from a source table 'Sales' (approximately 10 million rows) to a Parquet file in the data lake. Recently, you notice that the pipeline runtime has increased from 15 minutes to over an hour. The source database CPU utilization is normal, and the network bandwidth is not saturated. You check ADF monitoring and see high 'Data integration unit' consumption and frequent 'BlobWrite' throttling errors. The storage account is in the same region as the ADF. You need to reduce the pipeline runtime. What should you do?

A.Change the storage account to Premium tier to increase throughput limits.
B.Modify the pipeline to use incremental loads instead of full loads each time.
C.Replace the Copy activity with an Azure Databricks notebook to process the data.
D.Use PolyBase in the Copy activity to load data directly into Azure Synapse Analytics.
AnswerB

Reduces data volume per run, decreasing storage throttling and runtime.

Why this answer

The pipeline runtime has increased due to frequent BlobWrite throttling errors, indicating that the storage account is hitting its write request limits. By modifying the pipeline to use incremental loads instead of full loads each hour, you reduce the volume of data written per execution, which lowers the number of write operations and avoids throttling. This directly addresses the root cause without requiring a storage tier upgrade or a complete architectural change.

Exam trap

The trap here is that candidates often assume throttling errors require a storage tier upgrade (Option A) or a compute change (Option C), when the real solution is to reduce the volume of data written per execution by implementing incremental loading.

How to eliminate wrong answers

Option A is wrong because upgrading to Premium tier increases throughput for block blobs but does not eliminate the fundamental issue of writing 10 million rows every hour; throttling can still occur if the write request rate exceeds the account limits, and the cost increase may not be justified. Option C is wrong because replacing the Copy activity with an Azure Databricks notebook adds complexity and overhead without addressing the storage throttling; the bottleneck is at the sink (BlobWrite), not the compute, and Databricks would still write to the same storage account. Option D is wrong because PolyBase is used for loading data into Azure Synapse Analytics, not for writing to Azure Data Lake Storage Gen2; it does not apply to the current sink and would not resolve the BlobWrite throttling errors.
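
An incremental load is commonly driven by a watermark: each run copies only the rows changed since the previous run and then advances the watermark. A minimal sketch follows, assuming the source table has a last-modified column; the column name, sample rows, and function are hypothetical.

```python
from datetime import datetime


def incremental_rows(rows: list, last_watermark: datetime):
    """Return the rows modified after the watermark, plus the new watermark."""
    changed = [r for r in rows if r["modified_at"] > last_watermark]
    new_watermark = max((r["modified_at"] for r in changed), default=last_watermark)
    return changed, new_watermark


sales = [
    {"id": 1, "modified_at": datetime(2024, 10, 1, 8, 0)},
    {"id": 2, "modified_at": datetime(2024, 10, 1, 9, 30)},
    {"id": 3, "modified_at": datetime(2024, 10, 1, 10, 15)},
]
changed, watermark = incremental_rows(sales, datetime(2024, 10, 1, 9, 0))
print([r["id"] for r in changed], watermark)  # [2, 3] 2024-10-01 10:15:00
```

Instead of rewriting all 10 million rows every hour, each run writes only the rows changed since the stored watermark, which cuts the write volume that triggers the throttling.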

29
Multi-Selecteasy

Which TWO actions help optimize data storage costs in Azure Data Lake Storage Gen2?

Select 2 answers
A.Enable soft delete for blobs.
B.Enable geo-redundant storage (GRS) for the storage account.
C.Configure lifecycle management policies to move data to cool or archive tiers.
D.Enable encryption at rest using customer-managed keys.
E.Use locally redundant storage (LRS) for temporary data.
AnswersC, E

Lifecycle policies reduce cost by tiering infrequently accessed data.

Why this answer

Option C is correct because Azure Data Lake Storage Gen2 supports lifecycle management policies that automatically transition data to cooler tiers (cool or archive) based on age or usage patterns. Moving infrequently accessed data to lower-cost tiers directly reduces storage costs without manual intervention.

Exam trap

The trap here is that candidates often confuse cost-optimization features (like tiering) with data protection or security features (like soft delete, GRS, or encryption), which serve different purposes and may actually increase costs.
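
For reference, a lifecycle management policy is a JSON document of rules. The sketch below builds one such rule as a Python dictionary; the rule name, the prefix, and the day thresholds are illustrative assumptions, not values from the question.

```python
import json


def tiering_rule(name: str, prefix: str, cool_after_days: int, archive_after_days: int) -> dict:
    """Build one lifecycle rule that tiers block blobs by age since last modification."""
    return {
        "enabled": True,
        "name": name,
        "type": "Lifecycle",
        "definition": {
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
            "actions": {
                "baseBlob": {
                    "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                    "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
                }
            },
        },
    }


# Hypothetical rule: move data under raw/ to cool after 30 days, archive after 180.
policy = {"rules": [tiering_rule("tier-raw-data", "raw/", 30, 180)]}
print(json.dumps(policy, indent=2))
```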

30
MCQhard

You are designing a data processing solution using Azure Databricks with Delta Lake. The data is partitioned by date and ingested daily. You notice that the Delta table has many small files, causing slow read performance. Which strategy should you recommend to optimize the table for faster queries?

A.Run OPTIMIZE on the table to compact small files.
B.Run ZORDER BY on the date column.
C.Run VACUUM to delete old files.
D.Increase the number of partitions by adding a new partition column.
AnswerA

OPTIMIZE merges small files into larger ones.

Why this answer

Option A is correct because running OPTIMIZE on a Delta Lake table compacts many small files into larger ones, reducing the number of files that need to be read during queries. This directly addresses the slow read performance caused by the small file problem, which is common in daily partitioned ingestion. OPTIMIZE uses bin-packing to merge files up to a target size (1 GB by default on Databricks, and configurable), improving scan efficiency without changing the data.

Exam trap

The trap here is that candidates may confuse ZORDER BY (which improves data skipping but not file count) with OPTIMIZE (which reduces file count), or mistakenly think VACUUM or adding partitions solves the small file problem, when in fact they either don't address it or make it worse.

How to eliminate wrong answers

Option B is wrong because ZORDER BY is not a standalone command but a clause of OPTIMIZE that colocates related values within files to improve data skipping; the table is already partitioned by date, and Z-ordering targets non-partition columns, so it is not the remedy for a small-file problem. Option C is wrong because VACUUM removes old, unreferenced files for storage cleanup and compliance, but it does not compact small files or improve read performance. Option D is wrong because increasing the number of partitions (e.g., by adding a new partition column) would create even more small files, worsening the small file problem and degrading read performance further.
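
The bin-packing idea behind compaction can be sketched in a few lines. This is a simple greedy illustration, not Delta Lake's actual algorithm; the file sizes and the 256 MB target are arbitrary values chosen for the example.

```python
def bin_pack(file_sizes_mb: list, target_mb: float) -> list:
    """Greedily group files into bins whose total size stays within target_mb."""
    bins = []
    current, current_size = [], 0.0
    for size in sorted(file_sizes_mb, reverse=True):
        # Start a new bin when adding this file would exceed the target.
        if current and current_size + size > target_mb:
            bins.append(current)
            current, current_size = [], 0.0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins


small_files = [8] * 100  # 100 files of 8 MB each
compacted = bin_pack(small_files, target_mb=256)
print(len(small_files), "->", len(compacted))  # 100 -> 4
```

Each bin corresponds to one rewritten file, so a scan that previously opened 100 files would open 4 after compaction.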

31
Multi-Selectmedium

Which TWO actions should you take when monitoring Azure Data Lake Storage Gen2 to detect security threats?

Select 2 answers
A.Use Azure Security Center and Azure Defender for Storage.
B.Enable diagnostic settings for the storage account and send logs to Azure Sentinel.
C.Enable soft delete for blobs to recover from accidental deletions.
D.Configure firewall and virtual network service endpoints.
E.Set up alerting on the 'Transactions' metric.
AnswersA, B

Azure Defender provides threat detection and alerts for storage accounts.

Why this answer

Azure Security Center (now Microsoft Defender for Cloud) with Azure Defender for Storage provides built-in threat detection for Azure Data Lake Storage Gen2, including anomaly detection, malware scanning, and alerts for suspicious activities like unauthorized access or data exfiltration. This is a primary action for detecting security threats because it continuously monitors storage telemetry and applies machine learning to identify potential security incidents.

Exam trap

The trap here is that candidates often confuse data protection features (like soft delete) or network controls (like firewalls) with active threat detection, overlooking that only dedicated security monitoring tools (Azure Security Center/Defender and Sentinel) can identify and alert on security threats in real time.

32
Drag & Dropmedium

Drag and drop the steps to set up Azure Data Lake Storage Gen2 hierarchical namespace for a data lake into the correct order.

Steps
Order

Why this order

The storage account must have hierarchical namespace enabled. Then create a container, directories, set permissions, and upload data.

33
MCQmedium

A company uses Azure Synapse Analytics dedicated SQL pool. They notice that queries against a large fact table are running slower over time. The table is hash-distributed on a date key and has a clustered columnstore index. Which action should you take to improve query performance?

A.Add a non-clustered index on frequently filtered columns.
B.Change the distribution column to a column with higher cardinality.
C.Change the distribution to round-robin.
D.Rebuild the clustered columnstore index.
AnswerD

Rebuilding the columnstore index improves compression, removes deleted rows, and reorganizes rowgroups, enhancing scan performance.

Why this answer

Over time, columnstore indexes can become fragmented due to insert, update, and delete operations, leading to compressed row groups that are not optimally sized or have deleted records. Rebuilding the clustered columnstore index reorganizes the data into fully compressed row groups, removes deleted rows, and restores the high compression and segment elimination that columnstore indexes rely on for fast query performance.

Exam trap

The trap here is that candidates may assume performance degradation is always due to data skew or distribution choice, overlooking the common real-world issue of columnstore index fragmentation from ongoing DML operations.

How to eliminate wrong answers

Option A is wrong because adding a non-clustered index on frequently filtered columns would introduce additional index maintenance overhead and is unlikely to outperform the existing columnstore index for large fact tables; columnstore indexes already excel at scanning and filtering large datasets. Option B is wrong because changing the distribution column to one with higher cardinality does not address the root cause of performance degradation over time, which is index fragmentation, not data skew or distribution inefficiency. Option C is wrong because changing the distribution to round-robin would eliminate data locality for joins and aggregations, likely worsening query performance, and does not resolve the fragmentation issue.

34
Matchingmedium

Match each data transformation concept to its definition.

Concepts
Matches

Handling flexible columns that change over time

Timestamp to track incremental data processing

Optimization to read only relevant partitions

Merge insert and update operations into a single action

Why these pairings

These concepts are important for data transformation in Azure.

35
Multi-Selecthard

Which THREE factors should you consider when designing a monitoring strategy for Azure Synapse Analytics dedicated SQL pool performance?

Select 3 answers
A.Use dynamic management views (DMVs) to identify long-running queries.
B.Implement workload classification for resource allocation.
C.Ensure data is evenly distributed across distributions.
D.Configure automatic index rebuild for columnstore indexes.
E.Set up alerts for DWU usage to enable dynamic scaling.
AnswersA, B, E

DMVs like sys.dm_pdw_exec_requests help monitor query performance.

Why this answer

Option A is correct because dynamic management views (DMVs) in Azure Synapse Analytics dedicated SQL pool, such as sys.dm_pdw_exec_requests and sys.dm_pdw_request_steps, provide real-time insight into query execution, allowing you to identify long-running queries, monitor resource consumption, and detect performance bottlenecks. This is a foundational monitoring practice for tuning workload performance.

Exam trap

The trap here is that candidates confuse design or maintenance actions (like data distribution or index rebuilds) with monitoring activities, leading them to select options that are valid optimization steps but not part of a monitoring strategy.

36
MCQmedium

You are monitoring an Azure Cosmos DB account using Azure Monitor. The 'Normalized RU Consumption' metric for a container is consistently above 90%. You need to ensure that the container can handle the load without throttling. What should you do?

A.Change the partition key to a different property.
B.Increase the provisioned throughput (RU/s) for the container.
C.Switch the account to serverless mode.
D.Modify the indexing policy to exclude unused paths.
AnswerB

Increasing RU/s provides more capacity, lowering the normalized RU consumption percentage for the same workload.

Why this answer

The 'Normalized RU Consumption' metric indicates the percentage of provisioned throughput (RU/s) being used. Consistently above 90% means the container is operating near its capacity limit, risking throttling (HTTP 429 errors) during traffic spikes. Increasing the provisioned throughput (RU/s) directly raises the capacity, allowing the container to handle the load without throttling.

Exam trap

The trap here is that candidates confuse optimizing RU consumption (e.g., indexing or partition key changes) with the need to increase capacity when the metric already shows the system is at its limit, leading them to choose options that reduce per-request cost rather than addressing the throughput ceiling.

How to eliminate wrong answers

Option A is wrong because changing the partition key does not increase the total throughput; it redistributes existing throughput across partitions, which may improve distribution but does not solve a capacity shortage. Option C is wrong because serverless mode removes provisioned throughput in favor of per-request billing, has a lower per-container throughput ceiling than provisioned mode, and is intended for intermittent or low-traffic workloads, not for consistently high load. Option D is wrong because modifying the indexing policy to exclude unused paths reduces RU consumption per request (mainly for writes), but with normalized RU already above 90%, the reduction is unlikely to bring consumption below the threshold and does not address the root cause of insufficient provisioned capacity.

37
MCQeasy

You are tuning an Azure Stream Analytics job that reads from an Event Hub and writes to an Azure Synapse Analytics table. The job's SU% utilization is consistently at 90%. Which action would most likely reduce the SU% utilization?

A.Decrease the Event Hub throughput units.
B.Partition the output table in Azure Synapse Analytics.
C.Use a reference data join to filter events.
D.Increase the number of streaming units (SU) allocated to the job.
AnswerD

More SUs provide additional compute resources, lowering the utilization percentage for the same workload.

Why this answer

Increasing the number of streaming units (SU) allocated to the job directly adds more compute and memory, which reduces the SU% utilization for the same workload. Since the job is consistently at 90% utilization, adding SUs gives it headroom and lowers the risk of backlogged input events and growing watermark delay. This is the standard scaling approach for Azure Stream Analytics when SU% is high.

Exam trap

The trap here is that candidates often confuse scaling the input source (Event Hub throughput units) or optimizing the output sink (partitioning) with directly addressing the compute bottleneck, but only increasing SUs reduces the compute utilization percentage.

How to eliminate wrong answers

Option A is wrong because decreasing Event Hub throughput units only lowers the ingestion capacity of the Event Hub; it adds no resources to the Stream Analytics job and risks throttling producers and delaying input rather than relieving the job. Option B is wrong because partitioning the output table in Azure Synapse Analytics improves write throughput but does not affect the compute load on the Stream Analytics job itself, so SU% utilization remains unchanged. Option C is wrong because using a reference data join to filter events adds processing overhead (the reference data is held in memory and looked up per event), which would likely increase, not decrease, SU% utilization.

38
Multi-Selecthard

Which THREE metrics from Azure Monitor should be used to diagnose performance bottlenecks in an Azure Data Factory pipeline?

Select 3 answers
A.Pipeline Succeeded Rerun Count
B.Blob Capacity
C.Activity Duration
D.SQL Pool DWU Used
E.Data Integration Unit (DIU) Consumption
AnswersA, C, E

High rerun count indicates failures and potential bottlenecks.

Why this answer

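Pipeline Succeeded Rerun Count (A) is correct because a high number of reruns indicates that the pipeline is repeatedly failing and being retried, which points to an underlying problem such as resource contention or throttling. This metric helps identify pipelines that are not completing successfully on the first attempt. Activity Duration (C) is correct because it shows which activities take the longest and therefore where the pipeline spends its time. Data Integration Unit (DIU) Consumption (E) is correct because it shows how much copy compute the pipeline uses, which helps reveal whether copy activities are constrained by the resources allocated to them.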
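Activity Duration, one of the selected metrics, is typically used by ranking activities to see where a pipeline run spends its time. The sketch below shows that ranking on hand-written sample data. The record shape (activity, duration_s), the activity names, and the helper slowest_activities are hypothetical and do not correspond to a specific Azure Monitor or Data Factory API response.

```python
def slowest_activities(activity_runs, top_n=3):
    """Return (activity name, duration in seconds) for the top_n longest runs."""
    ranked = sorted(activity_runs, key=lambda r: r["duration_s"], reverse=True)
    return [(r["activity"], r["duration_s"]) for r in ranked[:top_n]]


# Hypothetical activity runs from a single pipeline execution.
runs = [
    {"activity": "LookupWatermark", "duration_s": 12},
    {"activity": "CopyToStaging", "duration_s": 840},
    {"activity": "TransformOrders", "duration_s": 300},
    {"activity": "UpdateWatermark", "duration_s": 8},
]
for name, seconds in slowest_activities(runs, top_n=2):
    print(f"{name}: {seconds}s")
```

If the slowest entry is a copy activity, DIU consumption for that activity is the natural next metric to inspect.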

Exam trap

The trap here is that candidates often confuse storage-level metrics (like Blob Capacity) or data warehouse metrics (like DWU Used) with pipeline-specific performance indicators, but the question explicitly asks for metrics that diagnose bottlenecks in the pipeline execution itself, not in downstream storage or compute services.
