Knowledge + Practice

Microsoft Azure Data Engineer Associate DP-203 (DP-203) — Questions 451–525

846 questions total · 12pages · All types, answers revealed

Take a mock exam Exam hub

Page 7 of 12

451

MCQhard

You have an Azure Data Factory pipeline that loads data from an on-premises SQL Server to an Azure Synapse Analytics dedicated SQL pool. The pipeline uses a staging Azure Blob Storage account. Recently, the pipeline has been failing with timeout errors. You need to ensure the pipeline completes successfully within the scheduled window. What should you do?

A.Increase the staging blob's copy timeout and enable parallel copy in the copy activity

B.Move the staging storage to Azure Data Lake Storage Gen2

C.Increase the dedicated SQL pool's data warehouse units (DWU)

D.Use PolyBase instead of staging for data loading

AnswerA

Increasing timeout allows more time for large data transfers, and parallel copy improves throughput.

Why this answer

The timeout errors occur because the default copy timeout for staging blob storage is insufficient for large data volumes or slow network conditions. Increasing the staging blob's copy timeout and enabling parallel copy in the copy activity directly addresses this by allowing more time for data transfer and leveraging multiple concurrent connections to improve throughput, ensuring the pipeline completes within the scheduled window.

Exam trap

The trap here is that candidates may confuse scaling the destination (Synapse DWUs) or changing storage tiers with fixing the root cause of timeout errors, which is often a configuration issue in the copy activity's staging settings rather than a capacity problem.

How to eliminate wrong answers

Option B is wrong because moving to Azure Data Lake Storage Gen2 does not inherently resolve timeout issues; it provides hierarchical namespace and POSIX permissions but does not change the copy timeout or parallelism settings. Option C is wrong because increasing DWUs scales the dedicated SQL pool's compute resources for query performance, not for data ingestion timeout issues during staging. Option D is wrong because PolyBase is an alternative loading method that bypasses staging but does not address timeout errors in the existing staging-based pipeline; it would require architectural changes and may introduce different performance characteristics.

Full explanation →

452

MCQmedium

You are designing a data processing solution for a financial services company. The solution must process sensitive customer data from multiple sources. You need to ensure that the data is encrypted at rest and in transit, and that access to the data is audited. Which combination of Azure services should you use?

A.Azure Data Lake Storage (encrypted at rest), Azure Synapse Analytics (TDE and SSL), and Microsoft Purview

B.Azure Blob Storage (encrypted at rest), Azure HDInsight, and Azure Log Analytics

C.Azure SQL Database (TDE and SSL), Azure Analysis Services, and Microsoft Purview

D.Azure Data Lake Storage (encrypted at rest), Microsoft Fabric, and Azure Monitor

AnswerA

Azure Storage provides encryption at rest; Synapse supports TDE for at-rest and SSL for in-transit; Purview provides data lineage and auditing.

Why this answer

Option A is correct because Azure Data Lake Storage provides encryption at rest using Azure Storage Service Encryption (SSE) with 256-bit AES, and Azure Synapse Analytics supports Transparent Data Encryption (TDE) for at-rest encryption and SSL/TLS for in-transit encryption. Microsoft Purview enables data governance and auditing by capturing lineage, classification, and access activity logs, meeting the compliance requirements for sensitive financial data.

Exam trap

The trap here is that candidates often confuse Azure Monitor or Log Analytics with Microsoft Purview for auditing, but Microsoft Purview is the dedicated service for data governance, classification, and access auditing, while Azure Monitor is for operational monitoring and does not provide data-level audit trails.

How to eliminate wrong answers

Option B is wrong because Azure HDInsight does not natively enforce encryption at rest for data stored in its managed disks or external storage without additional configuration, and Azure Log Analytics focuses on monitoring and diagnostics rather than auditing data access at the granularity required for sensitive customer data. Option C is wrong because Azure Analysis Services does not provide built-in auditing of data access at the storage level; it is an analytical engine that relies on underlying data sources for encryption, and Microsoft Purview is correctly included but the combination lacks a scalable storage layer for multiple sources. Option D is wrong because Microsoft Fabric is a unified analytics platform that does not inherently provide the same level of granular access auditing as Microsoft Purview, and Azure Monitor is designed for infrastructure monitoring, not data access auditing.

Full explanation →

453

Multi-Selectmedium

Which TWO Azure services can be used to implement a data lake architecture for storing structured, semi-structured, and unstructured data?

Select 2 answers

A.Azure Blob Storage with hierarchical namespace

B.Azure SQL Database

C.Azure Synapse Analytics

D.Azure Cosmos DB

E.Azure Data Lake Storage Gen2

AnswersA, E

Can function as a data lake.

Why this answer

Azure Data Lake Storage Gen2 (ADLS Gen2) is built on top of Azure Blob Storage and adds a hierarchical namespace, enabling a true data lake architecture. It supports storing structured, semi-structured, and unstructured data at scale, with POSIX-like access control lists and directory-level operations. Azure Blob Storage with hierarchical namespace enabled is essentially the same service as ADLS Gen2, providing the same data lake capabilities.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics as a storage service for data lakes, when in fact it is a query and analytics engine that relies on ADLS Gen2 or other storage for the underlying data.

Full explanation →

454

MCQmedium

You are designing a data processing solution using Azure Synapse Analytics serverless SQL pool. The solution must query data stored in Parquet files in Azure Data Lake Storage Gen2. The queries are ad-hoc and vary greatly. Which feature should you use to optimize query performance for frequently accessed data partitions?

A.Implement workload management to prioritize queries.

B.Use OPENROWSET with explicit file path filtering.

C.Enable result-set caching on the serverless SQL pool.

D.Create materialized views on the Parquet files.

AnswerB

OPENROWSET with path filtering prunes partitions and improves performance.

Why this answer

Option C is correct because the OPENROWSET function with file path filtering can prune partitions by specifying the path to specific folders. Option A is wrong because materialized views are not supported in serverless SQL pool. Option B is wrong because result-set caching is for dedicated SQL pool.

Option D is wrong because workload management is for dedicated SQL pool.

Full explanation →

455

MCQmedium

You are designing a data lake on Azure Data Lake Storage Gen2. The data includes customer PII that must be encrypted at rest using customer-managed keys. Which feature should you enable?

A.Enable double encryption for Azure Storage

B.Configure Azure Storage encryption with customer-managed keys in Azure Key Vault

C.Enable infrastructure encryption

D.Use Azure Customer Lockbox for access control

AnswerB

C is correct because customer-managed keys in Key Vault allow you to control encryption keys.

Why this answer

Option B is correct because Azure Storage encryption with customer-managed keys (CMK) in Azure Key Vault allows you to control the encryption keys used to encrypt data at rest in Azure Data Lake Storage Gen2. This meets the requirement for encrypting PII with customer-managed keys, as CMK provides an extra layer of security by letting you manage key rotation, revocation, and access policies.

Exam trap

The trap here is that candidates confuse 'double encryption' or 'infrastructure encryption' with customer-managed key control, but only CMK gives you direct ownership of the encryption keys for data at rest.

How to eliminate wrong answers

Option A is wrong because double encryption for Azure Storage encrypts data twice (once at the service level and once at the infrastructure level) but does not inherently use customer-managed keys; it can use Microsoft-managed keys. Option C is wrong because infrastructure encryption uses platform-managed keys to encrypt the storage infrastructure, not customer-managed keys for the data itself. Option D is wrong because Azure Customer Lockbox provides access control for Microsoft support engineers to access your data, not encryption at rest with customer-managed keys.

Full explanation →

456

MCQmedium

You are monitoring an Azure Synapse Analytics dedicated SQL pool and notice that some queries are taking longer than expected. You need to identify queries that are experiencing significant memory pressure. Which dynamic management view (DMV) should you query?

A.sys.dm_pdw_exec_requests

B.sys.dm_pdw_wait_stats

C.sys.dm_pdw_query_stats_xe

D.sys.dm_pdw_nodes_os_performance_counters

AnswerA

This DMV includes memory_grant and memory_used columns to assess memory pressure.

Why this answer

Option C is correct because sys.dm_pdw_exec_requests shows memory grants for queries. Option A is wrong because sys.dm_pdw_nodes_os_performance_counters shows OS-level counters. Option B is wrong because sys.dm_pdw_wait_stats shows wait statistics.

Option D is wrong because sys.dm_pdw_query_stats_xe shows extended events.

Full explanation →

457

MCQeasy

You need to design a data storage solution for a batch processing pipeline that processes petabytes of data daily. The data is stored in Parquet format and must be accessible by both Azure Databricks and Azure Synapse Analytics. Which storage solution should you recommend?

A.Azure Data Lake Storage Gen2

B.Azure Files

C.Azure SQL Database

D.Azure Blob Storage

AnswerA

Provides hierarchical namespace, high throughput, and is compatible with Databricks and Synapse.

Why this answer

Azure Data Lake Storage Gen2 (ADLS Gen2) is the correct choice because it combines a hierarchical namespace with Azure Blob Storage's scalable object storage, providing native POSIX-like access control and high throughput for petabyte-scale batch processing. Both Azure Databricks and Azure Synapse Analytics have optimized connectors for ADLS Gen2 that leverage the hierarchical namespace for efficient partition pruning and file listing, which is critical for Parquet-based analytics at this scale.

Exam trap

The trap here is that candidates often confuse Azure Blob Storage (flat namespace) with ADLS Gen2 (hierarchical namespace), assuming both are equivalent for big data analytics, but the hierarchical namespace is a critical differentiator for performance at petabyte scale in batch pipelines.

How to eliminate wrong answers

Option B (Azure Files) is wrong because it is designed for SMB-based file sharing with low-latency access for small-to-medium workloads, not for petabyte-scale batch analytics, and lacks the throughput and native integration with Spark and Synapse pipelines required for Parquet data. Option C (Azure SQL Database) is wrong because it is a relational OLTP store optimized for transactional queries and small row-based operations, not for storing and processing petabytes of Parquet files in a batch pipeline. Option D (Azure Blob Storage) is wrong because while it can store Parquet files, it lacks a hierarchical namespace, which forces Azure Databricks and Synapse to use slower flat namespace listing operations (e.g., ListBlobs) that degrade performance at petabyte scale compared to ADLS Gen2's directory-aware operations.

Full explanation →

458

MCQhard

Your Azure Synapse Analytics pipeline uses PolyBase to load data from Azure Blob Storage into a dedicated SQL pool. The load is slow and suffers from high latency. Which optimization should you apply first?

A.Split the source files into smaller chunks.

B.Use a round-robin distribution for the staging table.

C.Increase the DWU (Data Warehouse Units) of the SQL pool.

D.Create clustered columnstore indexes on the staging table.

AnswerB

Round-robin distribution minimizes data movement during PolyBase loads.

Why this answer

Option C is correct because using a round-robin distribution for staging tables avoids data movement during load. Option A is wrong because increasing DWU may help but is not the first optimization. Option B is wrong because file splitting can improve parallelism but may not address latency.

Option D is wrong because columnstore indexes are for read performance, not load speed.

Full explanation →

459

MCQhard

You are designing a data ingestion pipeline for Azure Data Lake Storage Gen2 using Azure Databricks. The source is an on-premises SQL Server database with incremental changes captured via change data capture (CDC). The requirement is to ensure exactly-once semantics for each row while minimizing latency. Which approach should you recommend?

A.Use Azure Data Factory with a tumbling window trigger to copy data every 5 minutes.

B.Use PolyBase to create external tables and run T-SQL MERGE statements.

C.Use Azure Databricks Auto Loader with COPY INTO command.

D.Use Spark Structured Streaming in Azure Databricks to read CDC changes and write to Delta Lake.

AnswerD

Structured Streaming with Delta Lake ensures exactly-once and low latency.

Why this answer

Option B is correct because Spark Structured Streaming with Delta Lake provides exactly-once semantics via transaction logs and checkpoints, and is designed for low-latency streaming. Option A is wrong because Azure Data Factory triggers are batch-oriented and not suitable for streaming low-latency. Option C is wrong because COPY INTO is for batch loads, not streaming.

Option D is wrong because PolyBase is for bulk load, not streaming.

Full explanation →

460

MCQeasy

Your company stores sensitive customer data in Azure Data Lake Storage Gen2. You need to implement a security solution that prevents unauthorized access from the public internet while allowing access from specific Azure services and on-premises networks. Which feature should you configure?

A.Use private endpoints for all storage accounts.

B.Enable Microsoft Entra ID authentication only.

C.Configure firewall rules and virtual network service endpoints.

D.Use shared access signatures (SAS) with IP restrictions.

AnswerC

Firewall rules and service endpoints restrict access from public internet while allowing trusted services and VNets.

Why this answer

Option C is correct because configuring firewall rules and virtual network service endpoints allows you to restrict access to your Azure Data Lake Storage Gen2 account to only traffic originating from specific virtual networks and trusted Azure services, while blocking all public internet traffic. This provides a network-level security boundary that meets the requirement of preventing unauthorized public access while permitting access from on-premises networks (via VPN/ExpressRoute) and specific Azure services.

Exam trap

The trap here is that candidates often confuse network-level access controls (firewall and service endpoints) with identity-based controls (Entra ID) or delegation mechanisms (SAS), mistakenly thinking authentication alone can block public internet traffic.

How to eliminate wrong answers

Option A is wrong because private endpoints use Azure Private Link to assign a private IP from a virtual network to the storage account, but they do not inherently block public internet access; you must also disable public network access separately, and they do not directly allow access from on-premises networks without additional VPN/ExpressRoute configuration. Option B is wrong because enabling Microsoft Entra ID authentication only controls identity-based access (who can authenticate), not network-level access (where traffic can originate from); it does not prevent public internet traffic from reaching the storage endpoint. Option D is wrong because shared access signatures (SAS) with IP restrictions provide time-limited, delegated access with source IP constraints, but they are designed for granular per-request authorization, not as a permanent network security boundary to block all public internet traffic while allowing Azure services and on-premises networks.

Full explanation →

461

Matchingmedium

Match each storage redundancy option to its description in Azure Storage.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Three synchronous copies within a single data center

Three copies across multiple availability zones in a region

Geo-redundant storage with read access in secondary region

Geo-zone-redundant storage with read access in secondary region

Why these pairings

These redundancy options are key for data durability and availability.

Full explanation →

462

MCQmedium

You are deploying an Azure Synapse workspace using an ARM template. The template includes a Managed integration runtime with 'AutoResolve' location and a TTL of 10 minutes for data flows. After deployment, you notice that the first data flow execution takes a long time to start. What is the most likely cause?

A.The core count of 8 is insufficient for the data flow.

B.The TTL setting is too low, causing the cluster to be recreated frequently.

C.The AutoResolve location cannot be used for Managed IR.

D.The integration runtime type should be 'Self-Hosted' for data flows.

AnswerB

Low TTL leads to frequent cluster teardown and startup delays.

Why this answer

Option B is correct because the TTL (time-to-live) of 10 minutes means the cluster will be spun down after 10 minutes of inactivity, so the first execution after a period of inactivity must wait for the cluster to spin up. Option A is wrong because the location is AutoResolve and works. Option C is wrong because the core count is sufficient.

Option D is wrong because the template deploys the IR correctly.

Full explanation →

463

MCQhard

Refer to the exhibit. You have an Azure Data Factory pipeline that performs an incremental load from an Azure SQL Database source to a target Azure SQL Database. The pipeline uses a watermark column approach. After running the pipeline, you notice that the target table is empty. What is the most likely cause of this issue?

A.The dependency condition should be 'Completed' instead of 'Succeeded'.

B.The WatermarkQuery activity failed, causing the CopyData activity to be skipped.

C.The watermark query returns the maximum LastModified value, but the copy query uses the same value to filter, resulting in zero rows.

D.The CopyData activity runs before the WatermarkQuery activity completes.

AnswerC

The copy query filters for rows where LastModified > NewWatermark, but NewWatermark is the maximum, so no rows satisfy the condition. The previous watermark should be stored and used.

Why this answer

Option D is correct because the query uses the output of WatermarkQuery (which returns the maximum LastModified) as the filter condition, but it should filter for rows where LastModified is greater than the previous watermark, not the current maximum. This results in no rows being selected because no rows have LastModified > maximum. Option A is wrong because the pipeline succeeded.

Option B is wrong because the watermark query executes before the copy activity. Option C is wrong because the dependency condition is correct.

Full explanation →

464

Multi-Selectmedium

Which TWO factors should you consider when choosing between Azure SQL Database and Azure SQL Managed Instance for migrating a legacy application? (Choose two.)

Select 2 answers

A.Support for active geo-replication

B.Authentication using Microsoft Entra ID

C.Need for SQL Server Agent jobs

D.Requirement for cross-database queries

E.Integration with Azure VNet

AnswersC, D

A is correct because SQL Database does not have SQL Agent; Managed Instance does.

Why this answer

Option C is correct because SQL Managed Instance includes full support for SQL Server Agent, which is essential for legacy applications that rely on scheduled jobs, alerts, or automation. Azure SQL Database does not support SQL Server Agent, making Managed Instance the appropriate choice when this feature is required.

Exam trap

The trap here is that candidates often assume VNet integration (Option E) is exclusive to Managed Instance, but Azure SQL Database also supports VNet integration via private endpoints, making it a non-differentiating factor for this migration decision.

Full explanation →

465

Multi-Selectmedium

Which TWO actions should you take to secure data in Azure Synapse Analytics dedicated SQL pool? (Choose two.)

Select 2 answers

A.Use PolyBase to load data from external sources.

B.Enable result-set caching for query performance.

C.Configure workload classification for resource governance.

D.Apply dynamic data masking (DDM) to obfuscate sensitive data.

E.Implement row-level security (RLS) to restrict data access.

AnswersD, E

DDM hides sensitive data from non-privileged users.

Why this answer

Options A and D are correct. Row-level security restricts data access, and dynamic data masking obfuscates sensitive data. Option B is wrong because PolyBase is for data movement, not security.

Option C is wrong because workload management is for performance, not security. Option E is wrong because result-set caching is for performance.

Full explanation →

466

MCQeasy

You need to store data that is rarely accessed but must be retained for 10 years for compliance. The data will be accessed occasionally for audits. Which Azure storage tier is the most cost-effective?

A.Cool storage tier

B.Hot storage tier

C.Archive storage tier

D.Premium storage tier

AnswerA

Cool tier is designed for infrequently accessed data with lower storage cost.

Why this answer

The Cool storage tier is designed for data that is infrequently accessed and stored for at least 30 days, making it a cost-effective choice for compliance data accessed occasionally over a 10-year period. It offers lower storage costs than Hot tier while still providing low-latency access for audits, unlike Archive which requires hours to rehydrate.

Exam trap

The trap here is that candidates often choose Archive tier for long-term retention without considering the access latency requirement for audits, assuming 'rarely accessed' means never needing quick access.

How to eliminate wrong answers

Option B (Hot storage tier) is wrong because it is optimized for frequent access and has higher storage costs, making it unnecessarily expensive for rarely accessed compliance data. Option C (Archive storage tier) is wrong because although it has the lowest storage cost, it requires a rehydration process that can take up to 15 hours to access data, which is impractical for occasional audit access. Option D (Premium storage tier) is wrong because it is designed for high-performance, low-latency workloads (e.g., VMs, databases) and has the highest cost, making it unsuitable for rarely accessed archival data.

Full explanation →

467

MCQhard

Your Azure Synapse Analytics workspace uses serverless SQL pools for ad-hoc querying. Users report that queries are slow. You examine the execution plan and see that the query scans multiple partitions in the openrowset. What is the best way to improve performance?

A.Increase the MAXDOP setting

B.Create materialized views on the external tables

C.Partition the underlying data by a frequently filtered column

D.Add a WHERE clause on the partition column

AnswerD

Filtering on partition column enables partition elimination, reducing data scanned.

Why this answer

Serverless SQL pools rely on file pruning. Partition elimination is achieved by filtering on partitioned columns in the query. Option A is wrong because materialized views are for dedicated pools.

Option C is wrong because increasing MAXDOP may not help pruning. Option D is wrong because partitioning the file set helps but requires reorganizing data.

Full explanation →

468

MCQeasy

You need to monitor resource utilization for an Azure Synapse Analytics dedicated SQL pool. Which Azure Monitor metric shows the percentage of allocated DWU being used?

A.Memory percentage

B.Data IO percentage

C.CPU percentage

D.DWU used

AnswerD

This metric shows the percentage of allocated DWU being consumed.

Why this answer

Option C is correct because 'DWU used' metric shows the percentage of allocated DWU consumed. Option A is wrong because 'CPU percentage' is not a standard metric for Synapse. Option B is wrong because 'Data IO percentage' is not the primary metric.

Option D is wrong because 'Memory percentage' is not directly DWU.

Full explanation →

469

MCQeasy

You are using Azure Stream Analytics to process real-time data from an event hub and output to Azure Synapse Analytics. You need to ensure exactly-once delivery semantics to the output. What should you configure?

A.Set the input to 'Exactly Once' consumption mode.

B.Configure event ordering and late arrival policies.

C.Enable checkpointing in the query.

D.Set the output to 'Exactly Once' delivery mode.

AnswerD

Correct: Azure Stream Analytics provides exactly-once semantics when configured on the output.

Why this answer

Option A is correct because Azure Stream Analytics supports exactly-once delivery to Azure Synapse Analytics by enabling 'Exactly Once' output mode. Option B is wrong because checkpointing is for state management. Option C is wrong because event ordering does not guarantee exactly-once.

Option D is wrong because the event hub is the input, not output.

Full explanation →

470

MCQmedium

You are designing a data processing pipeline in Azure Synapse Analytics that ingests streaming data from Azure Event Hubs and stores it in a dedicated SQL pool. The data must be available for querying within 5 minutes of ingestion. Which processing approach should you recommend?

A.Use Azure Data Factory with a tumbling window trigger set to 5 minutes.

B.Use Azure Stream Analytics with a dedicated SQL pool output and configure a 1-minute window.

C.Use PolyBase to load data from Event Hubs into the dedicated SQL pool every 5 minutes.

D.Use Spark Structured Streaming in Azure Synapse to write micro-batches every 5 minutes.

AnswerB

Stream Analytics provides sub-minute latency and is designed for real-time ingestion into Synapse dedicated SQL pool.

Why this answer

Azure Stream Analytics is purpose-built for real-time stream processing and can output directly to a dedicated SQL pool. By configuring a 1-minute window, you ensure data is materialized in the SQL pool well within the 5-minute SLA, meeting the latency requirement with headroom.

Exam trap

The trap here is that candidates confuse batch-oriented tools (Data Factory, PolyBase) or general-purpose streaming frameworks (Spark Structured Streaming) with the dedicated, low-latency stream processing service (Stream Analytics) that is optimized for sub-minute latency to Synapse SQL pools.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory with a tumbling window trigger is a batch-oriented orchestration tool, not a streaming engine; it cannot process Event Hubs data in near real-time and introduces at least 5 minutes of latency before the trigger even fires. Option C is wrong because PolyBase is a bulk-load technology for reading external data sources like Azure Blob Storage or Data Lake, not for ingesting streaming data from Event Hubs; it cannot connect to Event Hubs directly. Option D is wrong because Spark Structured Streaming in Synapse writes micro-batches every 5 minutes, which meets the 5-minute SLA only at the boundary; any processing delay or checkpoint overhead could push latency beyond the requirement, and it lacks the native, low-latency integration with dedicated SQL pool that Stream Analytics provides.

Full explanation →

471

MCQmedium

You are monitoring an Azure Synapse Analytics dedicated SQL pool and notice that some queries are experiencing high wait times due to concurrency slots being exhausted. You need to optimize the workload to reduce contention. Which three actions should you take? (Select three.)

A.Increase the data warehouse service level (DWU).

B.Create workload groups with different importance levels.

C.Configure workload isolation to limit the amount of resources a workload group can use.

D.Use workload classification to assign queries to appropriate workload groups.

E.Enable result-set caching for frequently executed queries.

AnswerB, C, D

Different importance levels allow critical queries to get priority access to concurrency slots.

Why this answer

Options A, C, and D are correct. Classifying queries into workload groups and assigning importance helps prioritize critical queries. Using workload isolation ensures that resource-intensive queries do not block others.

Setting a minimum percentage of resources for small queries ensures they are not starved. Option B is wrong because increasing service level (e.g., DWU) increases concurrency slots but may increase cost. Option E is wrong because result-set caching does not affect concurrency slot usage.

Full explanation →

472

Multi-Selectmedium

You have an Azure Data Lake Storage Gen2 account that stores sensitive customer data. You need to prevent data exfiltration to unauthorized external IP addresses. Which TWO actions should you take?

Select 2 answers

A.Use private endpoints for the storage account

B.Enable Azure Firewall on the storage account

C.Use shared access signatures (SAS) with limited permissions

D.Configure storage firewall to allow only specific virtual networks

E.Enable geo-redundant storage (GRS)

AnswersA, D

Access over private IP, preventing exposure to public internet.

Why this answer

Network security controls: enabling firewall and deny access from internet, plus configuring service endpoints or private endpoints. Option A (Azure Firewall) is not a storage setting. Option D (Geo-redundant storage) is for durability, not security.

Option E (Shared access signatures) is for fine-grained access but not prevent exfiltration.

Full explanation →

473

MCQmedium

Your team is troubleshooting slow query performance on a dedicated SQL pool in Azure Synapse Analytics. The query uses a hash-distributed fact table with 60 distributions. After reviewing the execution plan, you notice a high number of data moves. Which action would most likely reduce data movement?

A.Change the distribution type to round-robin.

B.Update statistics on all columns used in joins.

C.Increase the number of distributions to 120.

D.Redistribute the fact table on the join column using hash distribution.

AnswerD

Hash distribution on the join column keeps related rows together, reducing data shuffling.

Why this answer

Option C is correct because aligning the distribution key of the fact table with the join column in a hash-distributed table keeps matching rows on the same distribution, minimizing data movement. Option A is wrong because changing to round-robin can increase data movement for joins. Option B is wrong because increasing distributions requires rebuilding the table and does not guarantee reduced movement.

Option D is wrong because statistics help the optimizer but do not directly reduce movement.

Full explanation →

474

Multi-Selecthard

Which THREE of the following are best practices for optimizing performance of Delta Lake tables in Azure Synapse Analytics? (Choose three.)

Select 4 answers

A.Run the OPTIMIZE command to compact small files and improve read performance.

B.Periodically run VACUUM to remove old versions of files that are no longer needed.

C.Partition the table on columns that are used in WHERE clauses to enable partition pruning.

D.Partition on high-cardinality columns like UserID to maximize parallelism.

E.Use Z-order on columns that are frequently used in filter predicates.

AnswersA, B, C, E

OPTIMIZE reduces the number of small files, improving I/O.

Why this answer

Option A is correct because the OPTIMIZE command in Delta Lake compacts small files into larger ones, reducing the number of files that need to be read during queries. This improves read performance by minimizing file listing overhead and enabling more efficient I/O, especially in scenarios with many small writes.

Exam trap

The trap here is that candidates often confuse high-cardinality partitioning with parallelism, not realizing that excessive partitions cause metadata bloat and slow down queries, while Z-order is a complementary technique for non-partition columns.

Full explanation →

475

MCQmedium

You are running a Spark notebook in Azure Synapse Analytics that reads from a Delta table and writes to a Parquet file. The job fails with the error: 'AnalysisException: Table or view not found: bronze.sales'. The table exists in the lakehouse. What is the most likely cause?

A.The user does not have read permission on the table.

B.The table is not registered in the Spark metastore; it is only in the lakehouse.

C.The Parquet file format is incompatible with the Delta source.

D.The Delta table is corrupted.

AnswerB

In Synapse, lakehouse tables are stored in a separate catalog; the Spark session's default catalog may not include it, so the table is not found.

Why this answer

In Azure Synapse Analytics, a Spark notebook uses its own Spark metastore to resolve table references. If the table 'bronze.sales' exists only in the lakehouse (i.e., as a Delta table in the underlying storage) but is not registered in the Spark metastore, the Spark engine cannot find it and throws an AnalysisException. The error indicates a metadata resolution failure, not a permission or data corruption issue.

Exam trap

Microsoft often tests the misconception that a table existing in the lakehouse automatically makes it visible to Spark notebooks, when in fact the Spark metastore and lakehouse catalog are separate metadata layers that must be explicitly synchronized.

How to eliminate wrong answers

Option A is wrong because an AnalysisException for 'Table or view not found' is a metadata resolution error, not a permission error; a lack of read permission would typically produce a SecurityException or AccessDeniedException. Option C is wrong because Parquet and Delta are both columnar formats based on Parquet, and writing to Parquet from a Delta source does not cause incompatibility; the error occurs before any read/write operation begins. Option D is wrong because a corrupted Delta table would cause read failures (e.g., file not found, checksum mismatch) during data access, not a table-not-found error at the metadata level.

Full explanation →

476

Multi-Selectmedium

You are designing a data processing solution using Azure Databricks. You need to optimize costs while maintaining performance. Which TWO strategies should you implement? (Choose two.)

Select 2 answers

A.Use spot instances for non-critical, fault-tolerant workloads.

B.Enable autoscaling to automatically adjust the number of workers based on workload.

C.Always choose the largest VM sizes to minimize cluster count.

D.Use all-purpose clusters for all workloads to avoid startup delays.

E.Use the serverless pricing tier to avoid paying for idle resources.

AnswersA, B

Correct: Spot instances offer significant discounts but can be preempted; suitable for resilient jobs.

Why this answer

Options B and D are correct. B: Using spot instances reduces costs for fault-tolerant workloads. D: Autoscaling adjusts resources based on workload.

Option A is wrong because all-purpose clusters are more expensive and not recommended for production jobs. Option C is wrong because using larger VMs may lead to underutilization and higher costs. Option E is wrong because Azure Databricks is not serverless in the traditional sense; you pay for running VMs.

Full explanation →

477

MCQmedium

You are an Azure administrator. You apply the Azure Policy shown in the exhibit to a management group. What is the outcome of this policy?

A.It allows storage accounts only if they have a firewall rule.

B.It denies storage accounts that allow public network access.

C.It requires all storage accounts to use HTTPS only.

D.It denies the creation of any new storage account.

AnswerB

The policy denies when defaultAction equals Allow, meaning public access is allowed.

Why this answer

Option D is correct because the policy denies storage accounts where the default network access is set to 'Allow' (i.e., public access). Option A (denies creation only) is not true; it applies to existing accounts as well via auditing/deny. Option B (allows only HTTPS) is unrelated.

Option C (requires firewall) is not stated.

Full explanation →

478

MCQhard

You are a data engineer for a global e-commerce company. The company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. The environment includes a large fact table 'Sales' distributed by hash on 'CustomerID', and dimension tables 'Customer' (hash-distributed on 'CustomerID') and 'Product' (replicated). Recently, queries that join Sales and Customer are performing poorly. You run a query to check data skew on the Sales table and find that one distribution has 40% more rows than the average. Additionally, the Customer table has high data movement during joins. You need to optimize the performance of these joins. What should you do?

A.Change the distribution of the Customer table to replicated.

B.Increase the data warehouse performance level (DWU) to allocate more resources.

C.Change the distribution of the Sales table to round-robin.

D.Change the distribution key of the Sales table to 'ProductID' to align with the Product table.

AnswerA

Replicated tables avoid data movement for joins.

Why this answer

Option B is correct because changing the Customer table to replicated distribution eliminates data movement during joins. Option A is wrong because changing the distribution key of Sales to a different column may not solve skew if the key is not the join column. Option C is wrong because round-robin is not suitable for fact tables in star joins.

Option D is wrong because increasing DWU may alleviate symptoms but does not fix the root cause of data movement.

Full explanation →

479

MCQhard

You need to assign permissions to a service principal so that it can write data to a specific container in Azure Data Lake Storage Gen2, but not delete blobs. The above JSON shows the built-in role 'Storage Blob Data Contributor'. The role includes delete permission in DataActions. What should you do?

A.Create a custom role that includes read and write DataActions but excludes the delete DataAction, then assign that custom role.

B.Assign the Storage Blob Data Contributor role and create a deny assignment that denies delete.

C.Assign the Storage Blob Data Contributor role and use ACLs to deny delete on the container.

D.Assign the Storage Blob Data Contributor role and remove the delete permission at the role assignment scope.

AnswerA

Custom roles allow fine-grained control over permissions.

Why this answer

Option A is correct because Azure RBAC roles are all-or-nothing at the permission level; you cannot selectively remove a single DataAction from a built-in role at assignment time. The only way to grant write access without delete is to create a custom role that explicitly includes the required read and write DataActions (e.g., Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write) and omits the delete DataAction (Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete). This custom role is then assigned to the service principal at the container scope, ensuring it can write data but never delete blobs.

Exam trap

The trap here is that candidates mistakenly believe you can modify a built-in role's permissions at assignment time (Option D) or that ACLs can override RBAC permissions (Option C), when in reality Azure requires a custom role for such granular control.

How to eliminate wrong answers

Option B is wrong because deny assignments cannot be used to selectively block a specific DataAction within a role assignment; deny assignments are designed to block all assignments of a role at a higher scope, not to carve out individual permissions. Option C is wrong because ACLs in Azure Data Lake Storage Gen2 are applied to the data plane for user/group identities, but they cannot override an RBAC role assignment that explicitly grants delete permission; RBAC takes precedence over ACLs when both are present. Option D is wrong because built-in roles like Storage Blob Data Contributor have fixed DataActions that cannot be modified at the role assignment scope; you cannot 'remove' a permission from a built-in role during assignment.

Full explanation →

480

MCQmedium

You are designing an Azure Data Factory pipeline to ingest data from an on-premises SQL Server into Azure Synapse Analytics. The data must be encrypted in transit. Which integration runtime type should you use and what additional configuration is required?

A.Use Azure Integration Runtime with ExpressRoute

B.Use Self-hosted Integration Runtime with Azure VPN Gateway

C.Use Self-hosted Integration Runtime with a certificate for HTTPS

D.Use Azure Integration Runtime with a public endpoint

AnswerC

Self-hosted IR with certificate encrypts data in transit.

Why this answer

A self-hosted integration runtime is needed to connect to on-premises networks. To encrypt data in transit, the certificate must be used for HTTPS encryption. Option A is wrong because Azure IR cannot access on-premises directly.

Option C is wrong because ExpressRoute provides private connectivity but does not handle encryption at the application layer. Option D is wrong because VPN Gateway is a network-level solution; ADF still needs self-hosted IR.

Full explanation →

481

MCQeasy

You are configuring Azure Data Lake Storage Gen2 for a new data lake. You need to ensure that all data written to the 'raw' container is automatically encrypted at rest. Which feature should you enable?

A.Azure Key Vault

B.Azure Disk Encryption

C.Azure Storage Service Encryption (SSE)

D.Azure Purview

AnswerC

SSE is automatically enabled and encrypts data at rest.

Why this answer

Azure Storage Service Encryption (SSE) is enabled by default for all Azure Storage accounts, including Data Lake Storage Gen2, and encrypts data at rest using AES-256. Option A is wrong because Azure Disk Encryption is for VMs. Option B is wrong because Azure Key Vault is used for managing keys, not the encryption itself.

Option D is wrong because Azure Purview is for data governance.

Full explanation →

482

MCQeasy

You have an Azure Databricks notebook that processes data from a Delta table. The notebook runs slowly due to many small files. You need to optimize the Delta table for faster reads. Which Delta Lake operation should you run?

A.Run CONVERT TO DELTA on the underlying Parquet files.

B.Run OPTIMIZE to compact small files.

C.Run DESCRIBE HISTORY to analyze file sizes.

D.Run VACUUM to delete old files.

AnswerB

OPTIMIZE compacts small files into larger ones, improving read performance.

Why this answer

Option C is correct because OPTIMIZE compacts small files into larger ones, improving read performance. Option A (VACUUM) removes old files but does not compact. Option B (DESCRIBE HISTORY) shows history.

Option D (CONVERT TO DELTA) converts Parquet but does not optimize.

Full explanation →

483

MCQhard

A company uses Azure Synapse Analytics dedicated SQL pool to store sales data. The sales table is partitioned by month and has a clustered columnstore index. Over time, the performance of queries filtering on a specific month has degraded. The data engineer suspects high rowgroup elimination. Which action should be taken to improve performance?

A.Change the table distribution to hash-distributed on the partition key.

B.Reorganize or rebuild the columnstore index on the table.

C.Drop and recreate the affected partitions.

D.Update statistics on the partitioned column.

AnswerB

Reorganizing or rebuilding consolidates small rowgroups, improving partition elimination.

Why this answer

Option B is correct because reorganizing or rebuilding the columnstore index compresses fragmented rowgroups and merges small rowgroups into optimal sizes (typically 102,400 rows per rowgroup). This directly addresses the degraded rowgroup elimination: when rowgroups are too small or fragmented, the engine cannot efficiently skip entire rowgroups during partition-level scans, causing more I/O and slower performance.

Exam trap

The trap here is that candidates confuse rowgroup elimination (a columnstore physical storage concept) with partition elimination (a table design concept), and incorrectly choose partition-related actions like dropping partitions or updating statistics instead of addressing the columnstore index fragmentation directly.

How to eliminate wrong answers

Option A is wrong because changing the distribution to hash-distributed on the partition key does not fix rowgroup fragmentation; distribution affects data movement across distributions, not the internal rowgroup structure of columnstore indexes. Option C is wrong because dropping and recreating affected partitions is an overly aggressive operation that drops data and requires reloading; it does not target the root cause of fragmented rowgroups within the columnstore index. Option D is wrong because updating statistics on the partitioned column improves cardinality estimates for the query optimizer but does not repair the physical rowgroup layout that causes poor rowgroup elimination.

Full explanation →

484

MCQhard

You have an Azure Data Factory pipeline that loads data from an on-premises SQL Server to Azure Synapse Analytics. The pipeline fails intermittently with network connectivity errors. You need to ensure reliable data transfer with minimal latency. Which solution should you recommend?

A.Set up a site-to-site VPN gateway

B.Deploy a self-hosted IR with high availability on two nodes

C.Stage data in Azure Blob Storage using sharded files

D.Use an Azure Integration Runtime instead

AnswerB

High availability with multiple nodes ensures failover and load balancing.

Why this answer

A self-hosted integration runtime (SHIR) is required for on-premises data sources. To improve reliability, a high-availability SHIR with two or more nodes provides redundancy and load balancing. Option A is wrong because Azure IR cannot access on-premises networks.

Option B is wrong because Sharded staging doesn't address connectivity. Option D is wrong because VPN Gateway alone doesn't improve the IR reliability.

Full explanation →

485

MCQeasy

You are monitoring an Azure Data Factory pipeline that runs hourly. You notice that the pipeline has been failing intermittently with an error indicating 'Activity timeout'. Which Azure Monitor metric should you set an alert on to proactively detect such failures?

A.Integration runtime queue depth metric

B.Pipeline duration metric

C.Data read and data written metrics

D.Failed pipeline runs metric

AnswerD

This metric increments each time a pipeline run fails, allowing proactive alerting.

Why this answer

Option A is correct because 'Failed runs' metric directly indicates pipeline failures. Option B is wrong because 'Data read' and 'Data written' measure data throughput, not failures. Option C is wrong because 'Duration' shows runtime, but a timeout is a specific failure type.

Option D is wrong because 'Queue depth' is for integration runtimes, not pipeline failures.

Full explanation →

486

MCQeasy

Which Azure service provides fully managed, serverless relational database capabilities for transactional workloads in a data storage solution?

A.Azure Cosmos DB

B.Azure SQL Database

C.Azure Synapse Analytics

D.Azure Data Lake Storage Gen2

AnswerB

Why this answer

Azure SQL Database is a fully managed, serverless relational database service designed for transactional workloads. It provides built-in high availability, automatic scaling, and pay-per-use billing, making it ideal for OLTP scenarios without the need to manage underlying infrastructure.

Exam trap

The trap here is that candidates confuse 'fully managed serverless relational database' with Azure Cosmos DB (which is serverless but not relational) or Azure Synapse Analytics (which is relational but designed for analytics, not transactions), leading them to overlook the specific OLTP focus of Azure SQL Database.

Why the other options are wrong

A

Cosmos DB is a NoSQL database, not relational.

C

Synapse Analytics is for large-scale analytics, not transactional workloads.

D

Data Lake Storage is for big data analytics, not relational OLTP.

Full explanation →

487

MCQhard

You are building a data processing pipeline in Azure Synapse Analytics. The pipeline should read data from Azure Data Lake Storage Gen2 (Parquet files), apply transformations using a mapping data flow, and write the results to a dedicated SQL pool table. The source data contains personally identifiable information (PII). You need to mask the PII columns (e.g., email) using a data masking function within the data flow. Which transformation should you use?

A.Derived Column transformation

B.Join transformation

C.Aggregate transformation

D.Pivot transformation

AnswerA

Derived Column can apply expressions, including hash functions like SHA2 for masking PII.

Why this answer

The Derived Column transformation in mapping data flows allows you to create new columns or modify existing ones using expressions, including built-in data masking functions like `mask()`, `maskEmail()`, or `substring()`. This is the correct transformation to apply PII masking on columns such as email addresses within the data flow pipeline before writing to the dedicated SQL pool.

Exam trap

The trap here is that candidates may confuse the Derived Column transformation with the Select transformation (which can also rename or drop columns but does not support expression-based masking), or assume that masking must be done in the sink (dedicated SQL pool) rather than within the data flow itself.

How to eliminate wrong answers

Option B (Join transformation) is wrong because it is used to combine rows from two sources based on a matching condition, not to mask or transform column values. Option C (Aggregate transformation) is wrong because it performs grouping and aggregation operations (e.g., SUM, COUNT) and does not support per-row data masking functions. Option D (Pivot transformation) is wrong because it rotates rows into columns for reshaping data, not for applying masking or transformations to individual column values.

Full explanation →

488

MCQeasy

Your team uses Azure Data Factory to copy data from on-premises SQL Server to Azure Blob Storage. You need to ensure that data in transit is encrypted using TLS 1.2. What should you configure?

A.Ensure the Azure Data Factory pipeline uses HTTPS for the copy activity.

B.Set up a site-to-site VPN between the on-premises network and Azure.

C.Configure the on-premises SQL Server to use SSL certificates.

D.Use Azure ExpressRoute with private peering.

AnswerA

HTTPS uses TLS to encrypt data in transit, and Azure Data Factory supports this by default.

Why this answer

Option A is correct because Azure Data Factory uses HTTPS by default, which enforces TLS encryption for data in transit. Option B is wrong because VPN is not required for TLS encryption. Option C is wrong because Azure ExpressRoute provides a private connection but does not replace TLS.

Option D is wrong because configuring the on-premises SQL Server to use SSL certificates is necessary but not sufficient; Azure Data Factory must also use HTTPS.

Full explanation →

489

MCQmedium

You are building a real-time dashboard that displays sales data from an Azure SQL Database. The dashboard must refresh every 30 seconds with minimal latency. You need to choose the appropriate Azure service for data processing and visualization. Which service should you use?

A.Azure Analysis Services with a tabular model and a scheduled refresh every 30 seconds.

B.Azure Data Explorer (ADX) with a continuous export to Power BI.

C.Power BI with DirectQuery mode and configure automatic page refresh.

D.Azure Synapse Serverless SQL pool with Power BI import mode.

AnswerC

DirectQuery mode allows real-time queries to Azure SQL Database; automatic page refresh supports 30-second intervals.

Why this answer

Option A is correct because Power BI with DirectQuery mode can refresh at sub-minute intervals and query the database directly. Option B (Azure Analysis Services) is for tabular models, not real-time dashboards. Option C (Azure Data Explorer) is optimized for large-scale analytics, not for direct connection to Azure SQL Database.

Option D (Azure Synapse Serverless SQL) is for querying data lakes, not for real-time dashboards.

Full explanation →

490

MCQeasy

You are developing a data processing pipeline in Azure Databricks that processes streaming data from Azure Event Hubs. You need to ensure that the pipeline can recover from failures and process data exactly once. The pipeline writes to Delta Lake. Which approach should you use?

A.Use Azure Stream Analytics to process the stream and output to Delta Lake via Azure Data Lake Storage Gen2.

B.Use Structured Streaming with foreachBatch to write micro-batches to Delta Lake, and set the checkpoint location to Azure Data Lake Storage Gen2.

C.Use Structured Streaming with a Delta Lake sink and specify a checkpoint location on Azure Data Lake Storage Gen2.

D.Use Auto Loader to ingest streaming data from Event Hubs and write to Delta Lake with checkpointing.

AnswerC

Provides exactly-once semantics with checkpointing.

Why this answer

Option B is correct because Structured Streaming with checkpointing enables exactly-once processing by saving the offset and state. Delta Lake's ACID transactions ensure idempotent writes. Option A is wrong because foreachBatch with batch processing may lose exactly-once guarantees.

Option C is wrong because Azure Stream Analytics does not integrate directly with Delta Lake. Option D is wrong because Auto Loader is for batch file ingestion, not streaming.

Full explanation →

491

MCQhard

You are designing a data lake architecture using Azure Data Lake Storage Gen2. Sensitive customer data must be encrypted at rest using customer-managed keys stored in Azure Key Vault. Additionally, access must be audited at the file level. Which combination of features should you implement?

A.Azure AD authentication and Azure Storage firewall

B.Azure Information Protection labels and Azure Policy

C.Customer-managed keys (CMK) in Azure Key Vault and Azure Storage analytics logs

D.Service-managed keys and Azure Monitor alerts

AnswerC

CMK provides encryption at rest, analytics logs provide audit

Why this answer

Option B is correct because customer-managed keys (CMK) in Azure Key Vault provide encryption at rest with customer control, and Azure Storage analytics logs (or diagnostic settings) provide file-level audit logs. Option A is wrong because Azure AD authentication does not provide encryption. Option C is wrong because Azure Information Protection is a labeling solution, not encryption at rest.

Option D is wrong because firewall rules do not encrypt data.

Full explanation →

492

Multi-Selecthard

Which THREE features should you use to optimize query performance in Azure Synapse Analytics dedicated SQL pool? (Choose three.)

Select 3 answers

A.T-SQL views.

B.Geo-redundant storage (GRS).

C.Materialized views.

D.Workload management with workload groups and importance.

E.Result-set caching.

AnswersC, D, E

Materialized views precompute and store results for faster queries.

Why this answer

Options A, C, and E are correct. Materialized views improve performance for aggregations, result-set caching speeds up repeated queries, and workload management ensures resource isolation. Option B is wrong because T-SQL views are logical, not performance optimization.

Option D is wrong because geo-redundant storage is for disaster recovery, not performance.

Full explanation →

493

MCQeasy

You need to store log files from multiple applications in a central location for long-term retention and occasional analysis. The data is rarely accessed after 30 days. Which storage solution should you use to minimize cost?

A.Azure Files

B.Azure Data Lake Storage Gen2

C.Azure Cosmos DB

D.Azure Blob Storage (cool or archive tier)

AnswerD

Blob Storage with cool/archive tiers provides low-cost storage for infrequently accessed data.

Why this answer

Azure Blob Storage with cool or archive tier is the most cost-effective solution for storing log files that are rarely accessed after 30 days. The cool tier offers low storage costs with higher access costs, while the archive tier provides the lowest storage cost but requires rehydration for access, making both ideal for long-term retention and occasional analysis.

Exam trap

The trap here is that candidates may choose Azure Data Lake Storage Gen2 for its analytics capabilities, overlooking that blob storage tiers are specifically designed for cost-efficient long-term retention of infrequently accessed data.

How to eliminate wrong answers

Option A is wrong because Azure Files provides fully managed file shares using SMB protocol, which is designed for shared access and active workloads, not for cost-optimized long-term archival storage. Option B is wrong because Azure Data Lake Storage Gen2 is optimized for big data analytics with hierarchical namespace and high-throughput access, incurring higher storage costs than blob storage tiers for infrequently accessed data. Option C is wrong because Azure Cosmos DB is a NoSQL database with low-latency access and global distribution, designed for transactional workloads, not for cost-efficient long-term retention of log files.

Full explanation →

494

MCQeasy

You need to design a storage solution for streaming data from IoT devices. The solution must support real-time analytics and long-term storage for historical analysis. Which combination of Azure services should you use?

A.Azure Queue Storage and Azure Cosmos DB

B.Azure Event Hubs and Azure Blob Storage

C.Azure IoT Hub and Azure SQL Database

D.Azure Event Hubs and Azure Data Lake Storage Gen2

AnswerD

Event Hubs ingests streaming data, and Data Lake Storage stores it for analytics.

Why this answer

Azure Event Hubs is designed for high-throughput, low-latency ingestion of streaming data from IoT devices, supporting real-time analytics via integration with Azure Stream Analytics. Azure Data Lake Storage Gen2 provides hierarchical namespace and POSIX-compliant access for long-term storage, enabling efficient historical analysis with tools like Azure Synapse Analytics or Spark. This combination meets both real-time and historical requirements without the limitations of other options.

Exam trap

The trap here is that candidates often confuse Azure IoT Hub with Event Hubs, assuming IoT Hub is the primary streaming ingestion service, but IoT Hub is for device management and control, not high-throughput event streaming, making Event Hubs the correct choice for this scenario.

How to eliminate wrong answers

Option A is wrong because Azure Queue Storage is a message queuing service for decoupling applications, not optimized for high-throughput streaming ingestion, and Azure Cosmos DB is a NoSQL database for transactional workloads, not designed for long-term analytical storage with cost-effective tiering. Option B is wrong because while Azure Event Hubs handles streaming ingestion, Azure Blob Storage lacks the hierarchical namespace and optimized analytics capabilities of Data Lake Storage Gen2, making it less suitable for large-scale historical analysis with tools like Spark or Hive. Option C is wrong because Azure IoT Hub is primarily for device management and bidirectional communication, not a high-throughput event ingestion service, and Azure SQL Database is a relational OLTP store not designed for streaming ingestion or cost-effective long-term storage of massive historical data.

Full explanation →

495

MCQeasy

Your team is building a real-time dashboard in Power BI that displays sales data from Azure Stream Analytics. The data must be updated every 5 seconds with low latency. Which output type should you configure in Stream Analytics to achieve this?

A.Power BI

B.Azure Blob Storage

C.Azure SQL Database

D.Azure Event Hubs

AnswerA

Direct streaming to Power BI for low latency.

Why this answer

Power BI is the correct output type because Azure Stream Analytics has a native Power BI output connector that supports real-time streaming datasets, enabling sub-second latency for dashboards. This connector pushes data directly to Power BI's streaming API, which refreshes visuals automatically every 5 seconds as required.

Exam trap

The trap here is that candidates often confuse Azure Event Hubs as a direct output for Power BI, overlooking that Event Hubs is an intermediary and not a visualization endpoint, while Power BI's native Stream Analytics output is the only option that directly feeds the real-time dashboard with low latency.

How to eliminate wrong answers

Option B is wrong because Azure Blob Storage is a batch-oriented, file-based storage service that introduces latency due to file writes and lacks real-time push capabilities, making it unsuitable for sub-5-second updates. Option C is wrong because Azure SQL Database, while supporting row inserts, incurs higher latency due to transactional overhead and connection pooling, and Power BI's DirectQuery or import modes cannot achieve 5-second refresh rates from SQL without significant tuning. Option D is wrong because Azure Event Hubs is a message ingestion service, not a visualization endpoint; it would require an additional downstream consumer to push data to Power BI, adding latency and complexity.

Full explanation →

496

MCQhard

You are building a data processing solution using Azure Databricks. The solution must process streaming data from Azure Event Hubs, join it with a static reference table stored in Azure Data Lake Storage Gen2 (Parquet format), and write the output to Azure Synapse Analytics. The reference table is updated daily. Which approach minimizes latency and ensures data consistency?

A.Use Spark Structured Streaming with a streaming join and cache the reference table as a static DataFrame.

B.Use Spark Structured Streaming with foreachBatch to write to Synapse.

C.Use Spark Structured Streaming with a streaming join and load the reference table in each micro-batch.

D.Use a batch job that runs every hour to process the data.

AnswerA

Caching the reference table as static minimizes latency and ensures consistency.

Why this answer

Option D is correct because using Spark Structured Streaming with a streaming join and a static DataFrame for the reference table minimizes latency and ensures consistency by reading the reference table once. Option A is wrong because batch processing every hour adds latency. Option B is wrong because loading the reference table per batch increases overhead.

Option C is wrong because foreachBatch is useful for batch writes but does not improve join performance.

Full explanation →

497

MCQhard

Your company has a data lake in Azure Data Lake Storage Gen2 that stores sensitive customer information. You need to implement fine-grained access control so that data engineers can read all data, data scientists can read only anonymized data, and auditors can view access logs. The solution must use Azure role-based access control (RBAC) and access control lists (ACLs). You also need to enable auditing of read operations. What should you do?

A.Use Azure RBAC to assign Storage Blob Data Reader to data scientists, and configure lifecycle management to move raw data to archive tier.

B.Assign Storage Blob Data Contributor RBAC role to data engineers at the storage account level, use ACLs to deny read access to raw data for data scientists, and enable diagnostic settings for read requests to Log Analytics.

C.Assign RBAC roles at the storage account level and enable Storage Analytics logs for read operations.

D.Assign RBAC roles at the container level to grant read access to all users, and use Azure Policy to audit access.

AnswerB

Provides fine-grained control and auditing.

Why this answer

Option D is correct: Use Azure RBAC roles for coarse permissions (e.g., Storage Blob Data Contributor for engineers), use ACLs for fine-grained control (e.g., deny read for data scientists on raw data), and enable diagnostic settings to log read operations to Log Analytics. Option A is incorrect: Storage Analytics logs are deprecated. Option B is incorrect: Lifecycle management is for tiering, not access control.

Option C is incorrect: RBAC only cannot provide granular control at file level.

Full explanation →

498

MCQmedium

You are running a batch processing job using Azure Data Factory. The job reads from Azure Blob Storage, transforms data with a Data Flow, and writes to Azure Synapse Analytics. The job fails intermittently with the error: 'Operation on target WriteToSynapse failed: Cannot bulk load because the file could not be opened.' You need to resolve the issue with minimal downtime. What should you do?

A.Enable staging blob file deletion in the copy activity settings.

B.Increase the degree of copy parallelism in the Data Flow.

C.Use PolyBase to load data directly from Blob Storage without staging.

D.Switch to staging via SQL authentication instead of Managed Identity.

AnswerA

Deleting the staging blob after successful load prevents lock conflicts on retry.

Why this answer

The error indicates a transient lock on the staging blob file. Enabling staging blob file deletion in the copy activity settings allows the pipeline to clean up and retry. Increasing the degree of copy parallelism does not address file locking.

Using PolyBase with external tables requires schema changes. Switching to staging via SQL authentication does not solve the file lock issue.

Full explanation →

499

Multi-Selectmedium

You are designing a data processing solution that requires exactly-once processing semantics for streaming data. Which two Azure services support exactly-once processing? (Choose two.)

Select 2 answers

A.Azure Event Hubs

B.Azure Stream Analytics

C.Azure Databricks with Structured Streaming

D.Azure Data Lake Storage Gen2

AnswersB, C

Why this answer

Azure Stream Analytics supports exactly-once processing by using checkpointing and event sourcing to ensure that each event is processed exactly once, even in the event of failures or restarts. It achieves this through its internal state management and the use of checkpoint offsets in the output sink, guaranteeing no duplicate or missed events.

Exam trap

The trap here is that candidates often confuse the ingestion guarantee of Event Hubs (at-least-once) with the processing guarantee of Stream Analytics, or mistakenly think that a storage service like Data Lake Storage Gen2 inherently provides processing semantics.

Why the other options are wrong

A

Event Hubs provides at-least-once delivery; exactly-once requires processing logic.

D

It is a storage service, not a processing engine.

Full explanation →

500

MCQeasy

Which Azure service is primarily used for orchestrating data pipelines in a cloud-native ETL workflow?

A.Azure Data Factory

B.Azure Synapse Analytics

C.Azure HDInsight

D.Azure Databricks

AnswerA

Why this answer

Azure Data Factory (ADF) is the correct answer because it is a cloud-native, serverless data integration service specifically designed for orchestrating and automating data pipelines. ADF provides a code-free visual interface, supports over 90 built-in connectors, and enables complex ETL/ELT workflows with control flow, data flow, and trigger-based scheduling, making it the primary orchestration tool in Azure.

Exam trap

The trap here is that candidates confuse Azure Synapse Analytics (which includes pipeline capabilities) as the primary orchestrator, but Synapse pipelines are actually built on Azure Data Factory, and the exam expects you to identify ADF as the dedicated, cloud-native orchestration service.

Why the other options are wrong

B

Synapse is an analytics platform that includes pipelines but is not primarily an orchestration-only service.

C

HDInsight is a managed Hadoop/Spark cluster, not a pipeline orchestrator.

D

Databricks is a collaborative data engineering environment, not an orchestration service.

Full explanation →

501

MCQmedium

A company uses Azure Data Factory to orchestrate an ETL pipeline that copies data from an on-premises SQL Server to Azure Synapse Analytics. The pipeline runs hourly and uses a self-hosted integration runtime. Recently, the pipeline started failing with timeout errors. The on-premises SQL Server is healthy and the network is stable. What is the most likely cause and solution?

A.The self-hosted integration runtime version is outdated; update it to the latest version

B.The copy activity is not using staging; enable staging through Azure Blob Storage

C.The self-hosted integration runtime is under-provisioned; scale up the VM or add more nodes

D.The source query timeout in the copy activity is too low; increase it to 3600 seconds

AnswerC

Under-provisioned IR can cause timeouts; scaling resolves the issue.

Why this answer

Option B is correct because timeouts often occur when the self-hosted IR is overloaded or has insufficient resources, and scaling up or adding nodes resolves the issue. Option A is wrong because the integration runtime version is automatically updated. Option C is wrong because the source query timeout is set to 120 seconds by default and increasing it may mask the problem.

Option D is wrong because staging is used for large data transfers, but the issue is likely IR performance.

Full explanation →

502

MCQeasy

You need to ensure that sensitive data stored in Azure SQL Database is encrypted at rest. Which feature should you enable?

A.Always Encrypted

B.Azure Information Protection

C.Dynamic Data Masking

D.Transparent Data Encryption (TDE)

AnswerD

TDE performs real-time encryption and decryption of the database, backups, and transaction log files at rest.

Why this answer

Option C is correct because Transparent Data Encryption (TDE) encrypts data at rest. Option A is wrong because Always Encrypted encrypts data in use and in transit, not at rest. Option B is wrong because Dynamic Data Masking masks data in query results.

Option D is wrong because Azure Information Protection is a classification and labeling service.

Full explanation →

503

MCQhard

Your company uses Azure Data Lake Storage Gen2 with hierarchical namespace enabled. You need to ensure that only the 'data-scientists' group can read files in the 'processed' container, while denying access to all other users. You have already configured the storage account firewall to allow access only from your corporate network. What should you do next?

A.Create a private endpoint for the storage account and assign the data-scientists group to the private endpoint's access policy

B.Assign the Storage Blob Data Reader role to the data-scientists group at the storage account level and add a deny assignment for all other users

C.Use a managed identity for the data-scientists group and assign the Storage Blob Data Contributor role to the managed identity

D.Configure access control lists (ACLs) on the 'processed' container to grant read and execute permissions to the data-scientists group and set the default ACL to deny all

AnswerD

ACLs provide fine-grained access control at the container or directory level, suitable for this requirement.

Why this answer

Option B is correct because ACLs are the mechanism to grant granular permissions to specific users/groups in ADLS Gen2 with hierarchical namespace. Option A is wrong because RBAC roles grant access at the storage account or container level, and a single deny assignment would block everyone. Option C is wrong because private endpoints control network access, not identity-based permissions.

Option D is wrong because managed identity is used for service-to-service authentication, not for granting read access to a specific group.

Full explanation →

504

MCQeasy

You have an Azure Data Lake Storage Gen2 account that stores log files. You need to implement a data retention policy so that logs older than 90 days are automatically deleted. What should you use?

A.Azure Policy

B.Lifecycle management policy

C.Azure Blob Storage inventory

D.Microsoft Purview

AnswerB

Lifecycle management can automatically delete blobs that are older than a specified number of days.

Why this answer

Option B is correct because a lifecycle management policy can automatically delete blobs based on age. Option A is wrong because Azure Blob Storage inventory provides reports but does not delete. Option C is wrong because Azure Policy enforces compliance but does not delete data.

Option D is wrong because Azure Purview scopes metadata and data discovery, not lifecycle management.

Full explanation →

505

MCQeasy

In Azure Synapse Analytics serverless SQL pool, you query Parquet files stored in Azure Data Lake Storage Gen2. You notice that queries are slow. Which configuration change is most likely to improve performance?

A.Partition the data in Azure Data Lake Storage Gen2

B.Create a pipeline in Azure Synapse to preprocess the data

C.Use OPENROWSET with a properly defined schema and file format

D.Increase the DWU setting of the serverless SQL pool

AnswerC

Specifying schema and file format helps the query optimizer generate efficient execution plans.

Why this answer

Option C is correct because OPENROWSET with an explicitly defined schema and file format (e.g., FORMAT='PARQUET') enables the serverless SQL pool to bypass schema inference, which is a costly runtime operation. By providing a proper schema and file format, the query engine can directly read the Parquet metadata and column statistics, significantly reducing I/O and CPU overhead. This is the most direct and effective performance tuning change for querying Parquet files in a serverless SQL pool.

Exam trap

The trap here is that candidates confuse serverless SQL pool with dedicated SQL pool and assume that increasing DWU (a dedicated pool concept) will improve performance, or they think data partitioning alone is sufficient without addressing the schema inference overhead.

How to eliminate wrong answers

Option A is wrong because partitioning data in Azure Data Lake Storage Gen2 helps with file pruning and parallel reads, but it does not address the primary bottleneck of schema inference and metadata parsing that slows down serverless SQL pool queries. Option B is wrong because creating a preprocessing pipeline adds latency and complexity without fixing the root cause; it may even degrade performance if the pipeline introduces additional data movement or transformation overhead. Option D is wrong because serverless SQL pools do not have a DWU setting; DWU is a dedicated SQL pool (formerly SQL DW) concept, and serverless SQL pools scale automatically based on workload, so increasing a non-existent setting has no effect.

Full explanation →

506

MCQmedium

Refer to the exhibit. A data engineer creates an external table in Azure Synapse Serverless SQL. Which statement about this table is correct?

A.The table supports indexing for performance

B.The external data source TYPE must be 'HADOOP' for Azure Data Lake Storage Gen2

C.The table references a single Parquet file named 'Sales.parquet'

D.The table is read-only

AnswerD

External tables are read-only; modifications must be done to underlying files.

Why this answer

External tables in Azure Synapse Serverless SQL are read-only because they provide a relational abstraction over data stored externally (e.g., in Azure Data Lake Storage Gen2 or Blob Storage). You cannot perform INSERT, UPDATE, DELETE, or DDL modifications on the underlying data through the external table; it is designed solely for querying with T-SQL. This is a fundamental constraint of the serverless SQL pool architecture, which uses the OPENROWSET or CREATE EXTERNAL TABLE syntax to read files in place without a storage engine.

Exam trap

The trap here is that candidates confuse external tables in Synapse Serverless SQL with external tables in dedicated SQL pools (which also support PolyBase with Hadoop connectors) and mistakenly think they can write to or index the table, or they misremember the required data source TYPE for ADLS Gen2.

How to eliminate wrong answers

Option A is wrong because external tables in Synapse Serverless SQL do not support indexing; indexing is a feature of dedicated SQL pools where you can create clustered columnstore indexes, but serverless SQL pool relies on file-level statistics and predicate pushdown to Parquet/CSV files. Option B is wrong because for Azure Data Lake Storage Gen2, the external data source TYPE must be 'HDFS' (not 'HADOOP') when using the abfss:// protocol; 'HADOOP' is used for legacy WASB or on-premises HDFS, and Synapse Serverless SQL requires the 'HDFS' type for ADLS Gen2. Option C is wrong because the CREATE EXTERNAL TABLE statement can reference a folder path containing multiple Parquet files, a glob pattern (e.g., 'Sales*.parquet'), or a single file, but the statement in the exhibit (not shown) typically points to a folder or pattern, and the question does not specify a single file; the table definition uses a LOCATION that can include wildcards, so it is not limited to one file.

Full explanation →

507

MCQeasy

You need to monitor the performance of Azure Stream Analytics jobs. Which Azure Monitor metric can be used to detect if the job is falling behind in processing input data?

A.WatermarkDelay

B.InputEventsBacklog

C.OutputEvents

D.RuntimeErrors

AnswerB

This metric shows the backlog of unprocessed input events.

Why this answer

InputEventsBacklog measures the number of input events that are not yet processed, indicating the job is falling behind. Option B is wrong because OutputEvents counts events written. Option C is wrong because WatermarkDelay shows the latency.

Option D is wrong because RuntimeErrors counts errors.

Full explanation →

508

MCQhard

Your team uses Azure Databricks with Delta Lake for ETL. You notice that the Delta table's version history is growing rapidly, and query performance is degrading. You want to retain the ability to time travel for the last 30 days. Which Delta Lake command should you run?

A.DESCRIBE HISTORY delta_table;

B.VACUUM delta_table RETAIN 30 HOURS;

C.OPTIMIZE delta_table;

D.FSCK REPAIR TABLE delta_table;

AnswerB

VACUUM removes files older than the retention period; 30 hours is a typo but the correct command is VACUUM with retention in hours; default is 7 days. To keep 30 days, set retention to 720 hours.

Why this answer

The VACUUM command in Delta Lake removes files older than the specified retention threshold, which directly addresses the rapid growth of version history and performance degradation. By using `VACUUM delta_table RETAIN 30 HOURS`, you delete stale data files while preserving the last 30 days of history for time travel, as Delta Lake defaults to a 7-day retention period but allows custom retention. This command physically deletes unused files, reducing storage and improving query performance.

Exam trap

The trap here is that candidates confuse VACUUM's retention parameter with days instead of hours, or they mistakenly choose OPTIMIZE thinking it cleans up history, when in fact it only compacts files without removing old versions.

How to eliminate wrong answers

Option A is wrong because DESCRIBE HISTORY only displays the transaction log (version history) metadata; it does not remove any files or improve performance. Option C is wrong because OPTIMIZE compacts small files into larger ones to improve read performance but does not delete old versions or reduce version history growth. Option D is wrong because FSCK REPAIR TABLE is used to recover table metadata after file system changes (e.g., manual file deletions) and does not address version history cleanup or performance degradation.

Full explanation →

509

MCQmedium

You are designing a data storage solution for a global e-commerce company. The company needs to store clickstream data from millions of users with high write throughput and low-latency reads for real-time analytics. The data is semi-structured and includes nested JSON objects. Which Azure data store should you recommend?

A.Azure Cosmos DB

B.Azure Table Storage

C.Azure SQL Database

D.Azure Blob Storage

AnswerA

Azure Cosmos DB provides high throughput, low-latency, and native JSON support.

Why this answer

Azure Cosmos DB is the correct choice because it provides a multi-model, globally distributed database service with guaranteed single-digit-millisecond read and write latencies at the 99th percentile, making it ideal for high-throughput clickstream ingestion and real-time analytics. Its native support for semi-structured data and nested JSON objects via the SQL API (or MongoDB API) allows direct storage and querying of complex event payloads without schema flattening. Additionally, Cosmos DB offers automatic indexing and tunable consistency levels to balance performance and data freshness for global e-commerce scenarios.

Exam trap

The trap here is that candidates often choose Azure Table Storage because it is a NoSQL store, but they overlook its lack of native JSON support and sub-10ms latency guarantees, confusing its simple key-value model with the richer document capabilities of Cosmos DB.

How to eliminate wrong answers

Option B (Azure Table Storage) is wrong because it is a NoSQL key-value store that does not natively support nested JSON objects; it requires flattening complex structures into flat key-value pairs, which adds overhead and complicates real-time analytics on clickstream data. Option C (Azure SQL Database) is wrong because it is a relational database that enforces a fixed schema, making it poorly suited for semi-structured, schema-on-read clickstream data with varying nested JSON fields; it also cannot match Cosmos DB's sub-10ms write throughput at scale. Option D (Azure Blob Storage) is wrong because it is an object store designed for large, unstructured binary data (e.g., images, logs) and does not provide low-latency, indexed query capabilities for real-time analytics on individual clickstream events; it lacks native support for querying nested JSON without additional compute layers like Azure Data Lake or Synapse.

Full explanation →

510

MCQeasy

Refer to the exhibit. You are reviewing an ARM template snippet for an Azure Synapse Analytics workspace. The template defines an integration runtime. A colleague asks whether this integration runtime can be used to copy data from an on-premises SQL Server database to Azure Blob Storage. What should you answer?

A.No, you need an Azure integration runtime that is configured for on-premises access.

B.Yes, because the integration runtime is self-hosted and can access on-premises data sources.

C.No, you need a managed virtual network integration runtime for on-premises sources.

D.Yes, but only if the integration runtime is installed on a domain-joined machine.

AnswerB

Self-hosted IR is used for on-premises/private network data sources.

Why this answer

Option A is correct because a self-hosted integration runtime is designed for on-premises and VM-based data sources. Option B is wrong because managed VNet IR is for Azure-only sources. Option C is wrong because Azure IR is for cloud data sources.

Option D is wrong because it is possible with a self-hosted IR.

Full explanation →

511

Multi-Selecthard

Which THREE actions improve the performance of a Delta table in Azure Databricks? (Choose three.)

Select 3 answers

A.Run VACUUM to remove old versions and compact small files

B.Disable auto-optimize to reduce overhead

C.Run OPTIMIZE with ZORDER BY on frequently filtered columns

D.Increase the number of shuffle partitions

E.Partition the table by high-cardinality columns

AnswersA, C, E

Compaction reduces file count.

Why this answer

Options A, B, and E are correct. Z-ordering optimizes data layout, compaction reduces small files, and partitioning improves query pruning. Option C (Disable auto-optimize) would worsen performance.

Option D (Increase spark.sql.shuffle.partitions) may help but is not specific to Delta tables.

Full explanation →

512

MCQhard

Refer to the exhibit. You are creating an Azure Storage account using an ARM template with the above snippet. After deployment, a security auditor reviews the configuration and notes that the storage account is not using a customer-managed key for encryption. What is the most likely reason?

A.The 'keyVersion' is missing a specific version, so Azure Storage defaults to Microsoft-managed key.

B.The 'keySource' should be 'Microsoft.Storage' for customer-managed key.

C.The storage account requires double encryption to use customer-managed key.

D.The 'infrastructureEncryption' setting is enabled, which overrides customer-managed key.

AnswerA

For customer-managed key, a specific key version is required; an empty version may cause Azure to use the latest but if the key is not accessible, it falls back to Microsoft-managed key.

Why this answer

Option B is correct because the keyVersion is empty, which means Azure Storage will use the latest version of the key, but the key must be present in the vault. However, the snippet does not specify the 'keyVaultProperties' correctly; the property 'keyName' is valid, but the 'keyVersion' being empty might cause Azure to default to a Microsoft-managed key if the key does not exist or if the vault is not accessible. Option A is wrong because infrastructure encryption is independent of key source.

Option C is wrong because 'infrastructureEncryption' is a separate setting. Option D is wrong because double encryption is not the issue.

Full explanation →

513

MCQmedium

A company is designing a data lake solution on Azure Data Lake Storage Gen2. Data will be ingested from IoT devices at high frequency (every 5 seconds). Each device sends a JSON payload of 2 KB. The data must be stored in a hierarchical namespace and partitioned by date and device ID to optimize query performance. Which partition strategy should be used?

A.Use Azure SQL Database with clustered columnstore index on date and device ID.

B.Organize folders as /YYYY/MM/DD/DeviceID/ in ADLS Gen2 and use file naming that includes timestamp.

C.Use Azure Table Storage with PartitionKey set to date and RowKey set to device ID.

D.Use Azure Cosmos DB with partition key on (date, device ID) and TTL for data retention.

AnswerB

This folder structure enables efficient partition pruning based on date and device ID.

Why this answer

Option B is correct because ADLS Gen2 with a hierarchical namespace allows folder-based partitioning by date and device ID (e.g., /YYYY/MM/DD/DeviceID/), which directly maps to the query optimization requirement. This structure enables efficient partition pruning for time-range and device-specific queries, and the high-frequency 2 KB JSON payloads are well-suited for append-friendly file naming with timestamps.

Exam trap

The trap here is that candidates confuse storage services (ADLS Gen2) with database or NoSQL solutions (SQL Database, Table Storage, Cosmos DB), failing to recognize that the question explicitly requires a data lake with a hierarchical namespace, which only ADLS Gen2 provides.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database with a clustered columnstore index is a relational store, not a data lake solution, and it does not support a hierarchical namespace or folder-based partitioning as required. Option C is wrong because Azure Table Storage is a NoSQL key-value store that lacks a hierarchical namespace and folder organization; its PartitionKey/RowKey model does not provide the folder-based partitioning by date and device ID needed for ADLS Gen2. Option D is wrong because Azure Cosmos DB is a globally distributed NoSQL database, not a data lake storage service, and its partition key on (date, device ID) does not create a hierarchical folder structure in ADLS Gen2.

Full explanation →

514

MCQmedium

You are designing a streaming solution in Azure Synapse Analytics using the serverless SQL pool to query streaming data in real-time. The data is ingested via Azure Event Hubs and processed using Azure Stream Analytics. The output of Stream Analytics is written to Azure Data Lake Storage Gen2 in Delta Lake format. You need to ensure that the serverless SQL pool can query the latest data with minimal latency. Which approach should you use?

A.Ingest data directly from Event Hubs into serverless SQL pool using CETAS (CREATE EXTERNAL TABLE AS SELECT).

B.Use a materialized view in serverless SQL pool that refreshes every minute.

C.Load the streaming data into a dedicated SQL pool using a scheduled pipeline and then query it from serverless SQL pool.

D.Create an external table in serverless SQL pool that points to the Delta Lake folder and query it directly.

AnswerD

Serverless SQL pool supports Delta Lake format and can query it as soon as data is written.

Why this answer

Option D is correct because serverless SQL pool can directly query Delta Lake format files stored in Azure Data Lake Storage Gen2 by creating an external table with LOCATION pointing to the Delta folder. This allows real-time querying of the latest streaming data without any data movement or transformation, achieving minimal latency since Stream Analytics writes continuously to Delta Lake.

Exam trap

The trap here is that candidates may confuse serverless SQL pool with dedicated SQL pool and assume features like materialized views or scheduled pipelines are available, or they may think CETAS can directly ingest from Event Hubs, which is not supported.

How to eliminate wrong answers

Option A is wrong because CETAS creates a new external table by selecting data from a source, but it cannot ingest data directly from Event Hubs; it requires a source like an existing external table or file set, and it does not support real-time streaming ingestion. Option B is wrong because serverless SQL pool does not support materialized views; materialized views are a feature of dedicated SQL pool, not serverless. Option C is wrong because loading data into a dedicated SQL pool via a scheduled pipeline introduces batch latency (scheduled intervals), which contradicts the requirement for minimal latency in a streaming solution.

Full explanation →

515

Matchingmedium

Match each Azure service tier to its description.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Hierarchical namespace for Azure Data Lake Storage

Optimized for frequent data access

Optimized for infrequent access with lower cost

Lowest cost for rarely accessed data

Why these pairings

These tiers apply to Azure Blob Storage and Data Lake Storage.

Full explanation →

516

MCQmedium

You are troubleshooting a slow-running Azure Data Factory pipeline that copies data from an Azure SQL Database to ADLS Gen2. The pipeline uses a copy activity with the default settings. The source table has 10 million rows. Which optimization should you apply first?

A.Set the 'parallel copies' property to 10.

B.Replace the copy activity with a mapping data flow.

C.Increase the data integration unit (DIU) to maximum.

D.Enable staged copy using an Azure Blob Storage staging location.

AnswerD

Staging allows data to be transferred via Blob Storage, which improves throughput for SQL to ADLS copies.

Why this answer

Option B is correct because enabling staging via Blob Storage improves performance for large cross-region or cross-service copies. Option A (parallel copies) might help but default settings already use parallelism. Option C (DIU increase) is a more expensive solution.

Option D (data flow) is heavier and not needed for simple copy.

Full explanation →

517

MCQhard

Your team is migrating an on-premises SQL Server data warehouse to Azure Synapse Analytics. The source data includes fact tables and dimension tables with complex relationships. You need to design the storage in Azure Synapse to minimize query latency for star schema queries. Which distribution and index strategy should you use for the fact table?

A.Hash distribution on the most joined dimension key with clustered columnstore index

B.Round-robin distribution with clustered columnstore index

C.Replicated distribution with clustered columnstore index

D.Hash distribution on a dimension key with heap index

AnswerA

Hash distribution colocates join data and columnstore index optimizes analytics.

Why this answer

Hash distribution on the most joined dimension key ensures that rows with the same key value are co-located on the same distribution, minimizing data movement during star schema joins. A clustered columnstore index provides high compression and batch-mode processing, which significantly reduces query latency for analytical workloads in Azure Synapse.

Exam trap

The trap here is that candidates often choose round-robin distribution thinking it balances load evenly, but they overlook the severe join performance penalty caused by data movement across distributions in star schema queries.

How to eliminate wrong answers

Option B is wrong because round-robin distribution distributes rows evenly without considering join keys, causing excessive data shuffling across distributions during joins, which increases query latency. Option C is wrong because replicated distribution copies the entire table to each distribution node, which is impractical for large fact tables due to storage overhead and data movement during updates. Option D is wrong because a heap index lacks ordering and compression, leading to full table scans and poor query performance for star schema queries.

Full explanation →

518

MCQeasy

You need to transform semi-structured JSON data into a tabular format for analysis in Azure Synapse Analytics. The data is stored in ADLS Gen2. Which feature should you use to query the JSON data directly without loading it into a table?

A.Use the OPENROWSET function in a serverless SQL pool.

B.Use Azure Data Factory to flatten the JSON and store as Parquet.

C.Create an external table using PolyBase.

D.Use the COPY INTO command in a dedicated SQL pool.

AnswerA

OPENROWSET with JSON file format allows querying JSON directly.

Why this answer

Option A is correct because OPENROWSET in Synapse serverless SQL can query JSON files directly. Option B (COPY INTO) loads data. Option C (PolyBase) requires external tables.

Option D (Azure Data Factory) is an orchestration tool.

Full explanation →

519

MCQeasy

A company runs a streaming pipeline using Azure Stream Analytics to ingest IoT data and output to Azure SQL Database. They notice that the output latency increases over time and eventually the job fails with a timeout error. What is the most likely cause?

A.The Stream Analytics job has a high late arrival tolerance.

B.The event hub is not partitioned correctly.

C.The event hub consumer group is misconfigured.

D.The Azure SQL Database target table lacks proper indexes.

AnswerD

Missing indexes slow down write operations, causing backpressure and eventual timeout.

Why this answer

The most likely cause is that the Azure SQL Database target table lacks proper indexes. Without indexes, each batch of output from Stream Analytics triggers full table scans for inserts or updates, causing cumulative latency. Over time, the backlog exceeds the job's timeout threshold (default 5 minutes for output), leading to failure.

Exam trap

The trap here is that candidates often attribute output latency to input-side issues like partitioning or consumer groups, but the symptom of increasing latency over time points to a downstream bottleneck, specifically missing indexes on the SQL target table.

How to eliminate wrong answers

Option A is wrong because high late arrival tolerance delays watermark advancement but does not cause progressive output latency or timeouts; it affects event ordering, not throughput. Option B is wrong because incorrect event hub partitioning affects input ingestion parallelism, not output latency to SQL Database; the job would show high input backlog, not output timeout. Option C is wrong because a misconfigured consumer group (e.g., multiple readers) causes checkpoint conflicts or duplicate reads, not a gradual increase in output latency; the job would fail with partition-related errors, not timeout.

Full explanation →

520

MCQmedium

You are developing a real-time data processing solution using Azure Stream Analytics. The input is from an Azure Event Hub, and the output is to an Azure Synapse Analytics dedicated SQL pool. You need to ensure exactly-once delivery semantics to the SQL pool. What should you configure?

A.Enable checkpointing in the query.

B.Use the Azure Synapse Analytics output adapter with exactly-once semantics.

C.Use a custom deserializer.

D.Configure event ordering by timestamp.

AnswerB

Stream Analytics supports exactly-once output to Synapse when configured correctly.

Why this answer

Option D is correct because Stream Analytics offers exactly-once output to Azure Synapse via the dedicated SQL pool output adapter. Option A is wrong because checkpointing is for recovery, not exactly-once. Option B is wrong because event ordering is about sequence, not delivery guarantees.

Option C is wrong because custom deserializers do not provide exactly-once.

Full explanation →

521

MCQeasy

You need to store historical sales data for 10 years with infrequent queries. The storage cost must be minimized while retaining the ability to query using Azure Synapse serverless SQL pool. Which storage tier should you use?

A.Azure Storage Archive tier.

B.Azure Storage Premium tier.

C.Azure Storage Hot tier.

D.Azure Storage Cool tier.

AnswerD

Cool tier provides low-cost storage for infrequent access and is online for queries.

Why this answer

The Cool tier is the correct choice because it provides low-cost storage for data that is infrequently accessed (e.g., historical sales data spanning 10 years) while still supporting immediate read access via Azure Synapse serverless SQL pool. Unlike the Archive tier, Cool tier data is online and can be queried without the need for time-consuming rehydration, making it suitable for infrequent but on-demand analytical queries.

Exam trap

The trap here is that candidates often confuse 'infrequent queries' with 'no queries' and incorrectly choose the Archive tier, forgetting that Azure Synapse serverless SQL pool cannot directly query archived data without a time-consuming rehydration process.

How to eliminate wrong answers

Option A is wrong because the Archive tier is designed for long-term backup and rarely accessed data, requiring a rehydration step (which can take up to 15 hours) before data can be queried by Azure Synapse serverless SQL pool, making it unsuitable for even infrequent queries. Option B is wrong because the Premium tier is optimized for low-latency, high-transaction workloads and is significantly more expensive, which contradicts the requirement to minimize storage cost. Option C is wrong because the Hot tier is intended for frequently accessed data and has higher storage costs than the Cool tier, so it does not meet the cost-minimization goal for infrequently queried historical data.

Full explanation →

522

MCQhard

You run the PowerShell command shown in the exhibit for an Azure Synapse Analytics dedicated SQL pool. Which configuration will be applied?

A.The SQL pool is configured with transactional replication and auto-pause.

B.The command fails because dedicated SQL pools do not support auto-pause.

C.The SQL pool is set to auto-pause after 15 minutes of inactivity.

D.The SQL pool is partitioned into 10 partitions with automatic cleanup.

AnswerC

The AutoPauseDelayInMinutes parameter sets auto-pause; other properties are ignored.

Why this answer

Option B is correct because the command sets auto-pause delay to 15 minutes, but the property 'IsTransactional' is not a valid property for Set-AzSynapseSqlPool; it will be ignored. Auto-pause is a serverless SQL pool feature, but for dedicated SQL pool, it's not supported; however, the command will attempt to set it but fail silently or be ignored. In practice, auto-pause delay is only for serverless SQL pools.

The most accurate answer is that auto-pause is set, but other properties are invalid.

Full explanation →

523

MCQmedium

You are reviewing an ARM template snippet for an Azure Blob Storage container. What is the effect of this configuration?

A.It enables legal hold on the container for 2555 days.

B.It blocks append writes to blobs in the container.

C.It enforces a 7-year immutable retention policy allowing appends.

D.It sets a 3-year retention policy blocking any modifications.

AnswerC

2555 days ≈ 7 years; allowProtectedAppendWrites allows log appends.

Why this answer

The ARM template snippet configures a time-based retention policy on the container with a retention period of 2555 days (7 years) and sets `allowProtectedAppendWrites` to true. This enables an immutable storage policy that prevents deletion or modification of existing blobs but allows new append blocks to be added to append blobs, making option C correct.

Exam trap

The trap here is that candidates confuse the 2555-day value with a 3-year period (1095 days) or misinterpret `allowProtectedAppendWrites` as blocking appends, when in fact it enables them under immutable storage.

How to eliminate wrong answers

Option A is wrong because legal hold is a separate immutable policy that does not have a time limit; it remains in effect until explicitly removed, and the snippet specifies a time-based retention period of 2555 days, not legal hold. Option B is wrong because the snippet sets `allowProtectedAppendWrites` to true, which explicitly permits append writes to append blobs, not blocks them. Option D is wrong because the retention period is 2555 days (7 years), not 3 years, and the policy allows appends rather than blocking all modifications.

Full explanation →

524

MCQhard

A company uses Azure Data Lake Storage Gen2 with a hierarchical namespace. They need to secure access to specific directories using RBAC roles. Which RBAC role should be assigned to a user to grant read and write access to a specific folder without giving access to other folders in the same container?

A.Storage Account Contributor

B.Storage Blob Data Contributor with ACLs at folder level

C.Storage Blob Data Reader

D.Storage Blob Data Owner

AnswerB

Grants read/write to specific folder.

Why this answer

Option B is correct because Azure Data Lake Storage Gen2 supports POSIX-like access control lists (ACLs) at the directory and file level. Assigning the Storage Blob Data Contributor role at the storage account level grants broad data-plane access, but when combined with ACL entries on a specific folder, you can restrict read and write permissions to only that folder. This allows granular security without affecting other folders in the same container.

Exam trap

The trap here is that candidates often assume RBAC roles alone can be scoped to a folder level, but in Azure Data Lake Storage Gen2, RBAC roles apply to the entire storage account or container, and folder-level security requires ACLs.

How to eliminate wrong answers

Option A is wrong because Storage Account Contributor is an Azure RBAC role that grants management-plane access (e.g., creating storage accounts, managing keys) but does not grant any data-plane permissions to read or write blob data. Option C is wrong because Storage Blob Data Reader provides read-only access to all blob data in the storage account; it cannot be scoped to a specific folder and does not grant write access. Option D is wrong because Storage Blob Data Owner grants full data-plane control (read, write, delete, and ACL management) over all blobs in the storage account, which is far broader than the required folder-level restriction.

Full explanation →

525

MCQmedium

You are designing a security strategy for Azure Synapse Analytics. The solution must prevent users from accessing sensitive columns in a dedicated SQL pool, such as Social Security numbers, unless they have explicit permission. Which feature should you use?

A.Column-level security.

B.Azure Purview data classification.

C.Dynamic data masking.

D.Row-level security (RLS).

AnswerA

Column-level security allows granting or denying access to specific columns.

Why this answer

Option B is correct because column-level security restricts access to specific columns. Option A is wrong because row-level security filters rows, not columns. Option C is wrong because dynamic data masking obfuscates data but does not prevent access.

Option D is wrong because Azure Purview is for data discovery and governance, not access control.

Full explanation →

Page 7 of 12

All pages

1 2 3 4 5 6 7 8 9 10 11 12

Practice DP-203 by domain

Target a specific domain to shore up weak areas.

Secure, monitor, and optimize data storage and data processing Design and develop data processing Design and implement data security Monitor and optimize data storage and processing Design and implement data storage Develop data processing

See all domains with question counts →