
Microsoft Azure Data Engineer Associate DP-203 (DP-203) — Questions 826–846

846 questions total · 12 pages · All types, answers revealed


Page 12 of 12

826

MCQmedium

A company is designing a data lake on Azure Data Lake Storage Gen2. They need to enforce row-level security on the data for different departments. Which approach should they use?

A.Create views in Azure Synapse Serverless SQL with security predicates

B.Assign Azure RBAC roles for each department on the storage account

C.Implement Azure Purview data policies for row-level security

D.Use Azure Data Lake Storage Gen2 access control lists (ACLs) at the file level

AnswerA

Serverless SQL can query ADLS and apply RLS via views.

Why this answer

Azure Synapse serverless SQL pool can expose files in Azure Data Lake Storage Gen2 through views, and a view can filter rows on the identity of the caller, for example with a `WHERE` clause that checks `SUSER_SNAME()` or `IS_MEMBER()`. Serverless SQL pool does not support `CREATE SECURITY POLICY`, so the filtering view is what stands in for a security predicate. Each department sees only its own rows (e.g., based on department membership) without duplicating data or managing separate files. Of the four options, this is the only one that filters rows at query time rather than controlling access to whole files or containers.

Exam trap

The trap here is that candidates confuse Azure RBAC or ACLs (which control access to storage objects) with row-level security (which controls access to rows within a data set), leading them to choose a storage-level permission model instead of a query-level filtering mechanism.

How to eliminate wrong answers

Option B is wrong because Azure RBAC roles operate at the storage account, container, or blob level, not at the row level; they cannot filter individual rows within a file. Option C is wrong because Azure Purview data policies currently support column-level sensitivity classification and access control, but do not enforce row-level security predicates on data in ADLS Gen2. Option D is wrong because ADLS Gen2 ACLs control access at the file or directory level (POSIX-style permissions), not at the row level within a file.
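The difference between storage-level permissions and query-level row filtering can be illustrated with a small Python model. This is only a sketch of the idea behind a filtering view, not Synapse code; the sample rows, the `USER_DEPARTMENT` mapping, and the `department_view` function are hypothetical names invented for illustration.

```python
# Hypothetical sample data standing in for rows read from one file in the lake.
ROWS = [
    {"order_id": 1, "department": "sales", "amount": 120.0},
    {"order_id": 2, "department": "finance", "amount": 75.5},
    {"order_id": 3, "department": "sales", "amount": 42.0},
]

# Hypothetical mapping of signed-in users to departments.
USER_DEPARTMENT = {
    "alice@example.com": "sales",
    "bob@example.com": "finance",
}


def department_view(rows, current_user):
    """Return only the rows whose department matches the caller's department.

    This mimics a view whose WHERE clause compares a column against the
    identity of the user running the query.
    """
    department = USER_DEPARTMENT.get(current_user)
    if department is None:
        return []  # unknown users see nothing
    return [row for row in rows if row["department"] == department]


if __name__ == "__main__":
    for user in ("alice@example.com", "bob@example.com", "eve@example.com"):
        visible = department_view(ROWS, user)
        print(user, [row["order_id"] for row in visible])
```

All users read the same underlying file here; only the filter applied at query time differs. A file-level permission, by contrast, could only grant or deny the whole file.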


827

Multi-Selecthard

Which THREE components are valid parts of the Microsoft Purview Data Map? (Choose THREE)

Select 3 answers

A.Scan rule sets

B.Sensitivity labels

C.Data flows

D.Data sources

E.Classifications

AnswersA, D, E

Scan rule sets define how data sources are scanned.

Why this answer

Correct answers: A, D, E. According to Microsoft documentation, the Purview Data Map consists of sources, scans, and classifications. Scan rule sets (A) define how data sources are scanned, data sources (D) are the registered systems that get scanned, and classifications (E) are applied to assets during scanning.

B is wrong because sensitivity labels are part of Microsoft Information Protection; they can be applied to assets in Purview, but they are not a component of the Data Map itself. C is wrong because data flows are part of Azure Data Factory.


828

MCQmedium

Refer to the exhibit. You are configuring an Azure Purview data policy for Azure Storage. The policy above is intended to audit all access events. However, the security team complains that not all read events are being audited. What is the most likely reason?

A.The filter predicate is set to 'true', which only captures a subset of events.

B.The storage account is not enabled for Purview policy enforcement.

C.The action group 'ALL_ACTIONS' does not include read events.

D.The policy excludes the 'Read' action by default.

AnswerB

Without enabling 'AllowPurviewPolicyEnforcement' on the storage account, Purview policies are not applied.

Why this answer

Option B is correct because an Azure Purview data policy takes effect only on a storage account that has been enabled for policy enforcement. If the storage account does not have the 'AllowPurviewPolicyEnforcement' property enabled, or the policy has not been deployed to it, the policy is not applied and access events go unaudited. Option A is wrong because a predicate of 'true' matches every event. Option C is wrong because 'ALL_ACTIONS' includes read events.

Option D is wrong because the policy does not exclude the 'Read' action by default.


829

MCQmedium

Your company stores sensitive customer data in Azure SQL Database. You need to encrypt the data at rest and ensure that only your application can decrypt it, even from database administrators. What should you implement?

A.Transparent Data Encryption (TDE)

B.Always Encrypted

C.Dynamic Data Masking

D.Azure Storage Service Encryption

AnswerB

Why this answer

Always Encrypted is correct because it ensures that sensitive data is encrypted at rest and in use, and the encryption keys are stored client-side, so only the application can decrypt the data. Database administrators (DBAs) cannot access the plaintext data because they lack the column encryption keys, even though they have full administrative access to the database.

Exam trap

The trap here is that candidates often confuse Transparent Data Encryption (TDE) with client-side encryption, assuming TDE protects against DBA access, but TDE only protects data at rest from storage theft, not from authorized database users.

Why the other options are wrong

A

TDE encrypts data at rest but the database engine holds the keys, allowing DBAs to decrypt.

C

Only masks data from unauthorized users; data is still stored in plaintext.

D

This applies to Azure Storage, not SQL Database.


830

MCQeasy

You need to process a large number of CSV files stored in Azure Data Lake Storage Gen2 using Azure Databricks. The files are nested in multiple folders, and the schema varies slightly between files. You want to automatically infer the schema and handle schema evolution. Which read option should you use?

A.spark.read.option("mergeSchema","true").csv(path)

B.spark.read.format("delta").load(path)

C.spark.read.option("inferSchema","true").csv(path)

D.spark.read.csv(path)

AnswerA

Infers and merges schemas.

Why this answer

Option A (spark.read.option("mergeSchema","true").csv(path)) is correct because mergeSchema enables automatic schema inference and merging across files with different schemas. Option B is for Delta Lake, not CSV. Option C infers a schema but does not merge differing schemas.

Option D loads the files without schema inference or schema evolution.
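What "merging schemas" means can be sketched without Spark. The example below is a plain-Python illustration using the standard `csv` module, not the Databricks reader itself; `FILE_A`, `FILE_B`, and `read_with_merged_schema` are hypothetical names, and unlike Spark this sketch does no type inference, so every value stays a string.

```python
import csv
import io

# Two hypothetical CSV files whose headers differ slightly.
FILE_A = "id,name\n1,Ada\n2,Grace\n"
FILE_B = "id,name,country\n3,Linus,FI\n"


def read_with_merged_schema(csv_texts):
    """Read several CSV documents and align them on the union of their columns."""
    merged_columns = []
    parsed_files = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        rows = list(reader)
        parsed_files.append(rows)
        # Keep first-seen column order while collecting the union of headers.
        for column in reader.fieldnames or []:
            if column not in merged_columns:
                merged_columns.append(column)

    # Columns that a file does not have are filled with None.
    merged_rows = [
        {column: row.get(column) for column in merged_columns}
        for rows in parsed_files
        for row in rows
    ]
    return merged_columns, merged_rows


if __name__ == "__main__":
    columns, rows = read_with_merged_schema([FILE_A, FILE_B])
    print(columns)
    for row in rows:
        print(row)
```

Rows from the file that lacks `country` come back with `None` in that column, which is the behavior a merged schema is meant to provide.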


831

MCQmedium

You are designing a data processing solution for a media company that uses Azure Synapse Analytics. The solution must process video metadata stored in Azure Cosmos DB and combine it with user interaction data from Azure Data Lake Storage Gen2. The combined data must be stored in a dedicated SQL pool for reporting. The data volume is moderate, and the processing should be done using T-SQL. Which approach should you use?

A.Use Azure Data Factory with Copy activities to load data from Cosmos DB and ADLS Gen2 into the dedicated SQL pool, then use a stored procedure to merge.

B.Use Azure Databricks to read from Cosmos DB and ADLS Gen2, transform with Spark SQL, and write to the dedicated SQL pool using JDBC.

C.Use the serverless SQL pool to create external tables on Cosmos DB and ADLS Gen2, then use CREATE EXTERNAL TABLE AS SELECT (CETAS) to write the combined data to a table in the dedicated SQL pool.

D.Use PolyBase to create external tables on Cosmos DB and ADLS Gen2, then use INSERT...SELECT to load into dedicated SQL pool.

AnswerC

Leverages T-SQL and serverless pool for in-place querying.

Why this answer

Option C is correct because the Azure Synapse Analytics serverless SQL pool can query Cosmos DB through the Azure Cosmos DB analytical store and ADLS Gen2 through OPENROWSET, all in T-SQL, and CETAS can then persist the combined result. Note that CETAS in a serverless SQL pool writes its output as files in storage, so the dedicated SQL pool would pick the data up from there. Option A is wrong because Azure Data Factory would require copy activities and may not be as efficient. Option B is wrong because Azure Databricks uses Spark, not T-SQL.

Option D is wrong because PolyBase cannot directly query Cosmos DB.


832

MCQhard

Refer to the exhibit. A Bicep file is used to deploy an Azure Synapse Analytics workspace. What is the purpose of the 'purviewConfiguration' property?

A.It links the workspace to a Microsoft Purview account for data lineage and cataloging

B.It configures automated backups of the Synapse workspace

C.It enables monitoring of data movement by Azure Monitor

D.It connects the workspace to a data catalog for pipeline sources

AnswerA

Purview provides data lineage and cataloging.

Why this answer

The 'purviewConfiguration' property in a Bicep file for Azure Synapse Analytics links the workspace to a Microsoft Purview account. This integration enables automated data lineage tracking, cataloging, and discovery across the Synapse environment, allowing users to search for and govern data assets directly from Purview. Without this property, the Synapse workspace operates independently of Purview's unified data governance capabilities.

Exam trap

The trap here is that candidates confuse the Purview integration with general cataloging or monitoring features, assuming it only applies to pipeline sources rather than understanding it provides full data lineage and cataloging across the entire Synapse workspace.

How to eliminate wrong answers

Option B is wrong because 'purviewConfiguration' has nothing to do with backups; dedicated SQL pool backups are handled through restore points, and this property exists solely for Purview integration. Option C is wrong because enabling monitoring of data movement by Azure Monitor is done through diagnostic settings and workspace-level monitoring configurations, not by linking to Purview. Option D is wrong because connecting the workspace to a data catalog for pipeline sources is a general description of Purview's role, but the specific purpose of 'purviewConfiguration' is to link to a Microsoft Purview account for full data lineage and cataloging, not just for pipeline sources.


833

Multi-Selecthard

Which THREE statements are true about partitioning in Azure Synapse Analytics dedicated SQL pool?

Select 3 answers

A.Partition switching can be used to quickly load data into a table.

B.Partitions are automatically aligned with distributions.

C.Each partition is stored as a separate set of rowgroups in a columnstore index.

D.Partitioning is only supported on tables with clustered rowstore indexes.

E.Excessive partitioning can lead to fragmentation and poor query performance.

AnswersA, C, E

Switching partitions is a metadata-only operation.

Why this answer

Option A is correct because partition switching in Azure Synapse Analytics dedicated SQL pool allows you to quickly load data into a table by switching a partition from a staging table into the target table. This operation is metadata-only and does not require data movement, making it highly efficient for incremental data loads.

Exam trap

The trap here is that candidates often confuse partitions with distributions, thinking they are automatically aligned, or assume partitioning is only for rowstore indexes, when in fact columnstore indexes are the recommended and most common storage type for partitioning in dedicated SQL pool.


834

MCQhard

An Azure Synapse Analytics pipeline uses a Copy activity to ingest data from Azure Blob Storage into a dedicated SQL pool. You notice that the data load is slow. You need to improve performance by enabling staging. What is the primary benefit of using staging?

A.It reduces the amount of data scanned in the source.

B.It enables data validation before loading.

C.It allows PolyBase to use parallel loading for better throughput.

D.It transforms data into columnstore format before loading.

AnswerC

PolyBase loads from staging files in parallel.

Why this answer

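Option C is correct because staging lets the Copy activity land the data in an interim storage location in a PolyBase-compatible format, and PolyBase then bulk loads it into the dedicated SQL pool in parallel. Option A is wrong because staging does not reduce the amount of data read from the source; it adds an interim copy. Option B is wrong because staging is a performance feature, not a data validation step.

Option D is wrong because staged data is held as files in storage; conversion to columnstore format happens inside the SQL pool during the load.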


835

MCQeasy

You are using Azure Synapse Pipelines to perform an incremental load from Azure SQL Database to Azure Synapse Analytics. You need to identify rows that have changed since the last load. Which approach should you use?

A.Compare the current data with a snapshot using T-SQL MERGE.

B.Truncate and reload the entire table daily.

C.Use a watermark column such as LastModifiedDate.

D.Enable Change Data Capture (CDC) on the source table.

AnswerD

CDC captures all changes and supports incremental load.

Why this answer

Option D is correct because Change Data Capture (CDC) on the source Azure SQL Database captures insert, update, and delete operations in change tables, enabling Azure Synapse Pipelines to efficiently identify only the changed rows since the last load. This approach minimizes data movement and processing overhead compared to full or snapshot-based comparisons, making it the recommended pattern for incremental loads in Synapse Pipelines.

Exam trap

The trap here is that candidates often choose the watermark column approach (Option C) because it seems simpler, but they overlook that CDC is the only option that natively captures all DML changes (including deletes) without requiring schema modifications or custom logic to handle edge cases like out-of-order updates.

How to eliminate wrong answers

Option A is wrong because comparing current data with a snapshot using T-SQL MERGE requires storing a full snapshot and performing row-by-row comparison, which is resource-intensive and does not leverage Synapse Pipelines' native incremental load capabilities. Option B is wrong because truncate and reload the entire table daily defeats the purpose of incremental loading, causing unnecessary full data transfer and processing, and is not a valid incremental approach. Option C is wrong because using a watermark column such as LastModifiedDate only captures updates to rows that have a timestamp updated, but it cannot detect deletes or changes to rows where the timestamp is not maintained, and it requires the source to reliably update the column on every change.
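The gap between a watermark column and a change log can be shown with a few lines of Python. This is a toy model under stated assumptions, not how CDC is implemented in Azure SQL Database; `current_table`, `change_log`, and both functions are hypothetical names, and the scenario assumes row 1 was updated, row 2 was deleted, and row 3 was inserted since the last load.

```python
from datetime import datetime

# Hypothetical source table as it looks now. Row 2 no longer exists.
current_table = {
    1: {"name": "Ada L.", "last_modified": datetime(2024, 1, 5)},
    3: {"name": "Linus", "last_modified": datetime(2024, 1, 6)},
}

# Hypothetical change log, as a change-capture mechanism might record it.
change_log = [
    {"op": "update", "id": 1, "at": datetime(2024, 1, 5)},
    {"op": "delete", "id": 2, "at": datetime(2024, 1, 5)},
    {"op": "insert", "id": 3, "at": datetime(2024, 1, 6)},
]


def changed_by_watermark(table, watermark):
    """Return ids of rows modified after the watermark; deleted rows are invisible."""
    return sorted(key for key, row in table.items() if row["last_modified"] > watermark)


def changed_by_log(log, watermark):
    """Return (operation, id) pairs recorded after the watermark, including deletes."""
    return [(entry["op"], entry["id"]) for entry in log if entry["at"] > watermark]


if __name__ == "__main__":
    last_load = datetime(2024, 1, 2)
    print("watermark sees:", changed_by_watermark(current_table, last_load))
    print("change log sees:", changed_by_log(change_log, last_load))
```

The watermark query can only report rows that still exist, so the delete of row 2 never reaches the target; the change log reports it explicitly.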


836

MCQhard

Refer to the exhibit. You created an external table in Azure Synapse Analytics serverless SQL pool to query Parquet files. Queries return no rows even though the files exist. What is the most likely issue?

A.The CREDENTIAL is missing

B.The FILE_FORMAT is incorrectly specified

C.The DATA_COMPRESSION is not supported for Parquet

D.The LOCATION path in the external table is relative to the data source, but the data source points to the wrong container or folder

AnswerD

The data source points to 'sales' container, table location adds 'parquet/sales', likely the files are not there.

Why this answer

Option D is correct. The external table LOCATION is 'parquet/sales/', and it is resolved relative to the external data source, which points to the root of the 'sales' container. The combined path is 'sales/parquet/sales/', which likely does not match where the files actually are, so the query finds nothing to read.

Option A is wrong because the credential is defined. Option B is wrong because the file format is correct. Option C is wrong because compression is supported for Parquet.
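How the two locations combine can be checked with a short Python helper. This is a sketch of ordinary relative-path joining, assuming the data source URL shown; the account name `example` and the function `resolve_external_path` are hypothetical and only the container name and table location come from the explanation above.

```python
import posixpath


def resolve_external_path(data_source_location, table_location):
    """Join a data source root and a table LOCATION as a relative path would resolve."""
    root = data_source_location.rstrip("/")
    relative = table_location.lstrip("/")
    return posixpath.join(root, relative)


if __name__ == "__main__":
    # Hypothetical data source pointing at the root of the 'sales' container.
    source = "https://example.dfs.core.windows.net/sales"
    print(resolve_external_path(source, "parquet/sales/"))
```

If the Parquet files actually sit directly under `parquet/` in the container, the resolved folder does not contain them, which matches the symptom of a query that returns no rows.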


837

MCQhard

You are using Azure Stream Analytics to process real-time temperature data from IoT devices. The output must be written to Azure SQL Database. The job has been running successfully for weeks, but recently you notice that the output data has duplicate rows. The input events are unique. The job uses a windowed aggregation (TumblingWindow). What is the most likely cause of duplicates?

A.The job is not handling late-arriving events.

B.The job is being restarted and reprocessing data.

C.The input event hub is receiving duplicate events.

D.The tumbling window size is too small.

AnswerB

Restart can cause reprocessing and duplicate output without idempotent writes.

Why this answer

Option B is correct because when a Stream Analytics job restarts or recovers from a failure, it reprocesses input from the last checkpoint, which can cause duplicate output if writes to the output are not idempotent. Option A is wrong because late events can cause out-of-order results but not duplicates if handled correctly. Option C is wrong because the input is unique, so duplicates are not from input.

Option D is wrong because window size affects aggregation but not duplicates.
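Why replayed output produces duplicates in one sink design and not in another can be sketched in Python. This is a toy model, not Stream Analytics or SQL Database code; the `emitted` records and both sink functions are hypothetical, and the key of device plus window end is an assumption about what uniquely identifies a window result.

```python
# Hypothetical window results emitted by a job. The last entry repeats the
# second window, as would happen when a restart replays already-processed input.
emitted = [
    {"device": "d1", "window_end": "2024-01-01T00:05:00Z", "avg_temp": 21.5},
    {"device": "d1", "window_end": "2024-01-01T00:10:00Z", "avg_temp": 22.0},
    {"device": "d1", "window_end": "2024-01-01T00:10:00Z", "avg_temp": 22.0},
]


def append_only(results):
    """Naive sink: every emitted result becomes a new row."""
    table = []
    for result in results:
        table.append(result)
    return table


def upsert_by_key(results):
    """Idempotent sink: one row per (device, window_end); later writes overwrite."""
    table = {}
    for result in results:
        key = (result["device"], result["window_end"])
        table[key] = result
    return list(table.values())


if __name__ == "__main__":
    print("append-only rows:", len(append_only(emitted)))
    print("upserted rows:", len(upsert_by_key(emitted)))
```

In a relational output, the same effect is usually obtained with a unique key on the target table, so that a replayed window cannot be inserted a second time.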


838

MCQmedium

Your organization is implementing a data lake using Azure Data Lake Storage Gen2. You have a folder structure like '/data/landing/' for raw data and '/data/curated/' for cleaned data. The data is ingested daily from various sources. You need to ensure that data in the curated zone is immutable and cannot be modified or deleted by anyone, including administrators, for compliance reasons. However, data in the landing zone should be modifiable. What should you do?

A.Enable immutable storage with a time-based retention policy on the curated zone container

B.Remove the 'Delete' permission from the storage account key

C.Set ACLs on the curated zone folder to deny write and delete for all users

D.Use Azure RBAC to deny delete and write permissions for all users on the curated zone folder

AnswerA

Immutable storage prevents any modification or deletion until the retention period expires.

Why this answer

Option A is correct because Azure Data Lake Storage Gen2 supports immutable storage at the container level, which enforces a time-based retention policy that prevents any data from being modified or deleted—even by administrators—until the retention period expires. This directly meets the compliance requirement for the curated zone, while leaving the landing zone container unaffected and modifiable.

Exam trap

The trap here is that candidates often assume ACLs or RBAC alone can enforce immutability, but they fail to recognize that only container-level immutable storage provides the WORM guarantee that cannot be overridden by administrators or privileged accounts.

How to eliminate wrong answers

Option B is wrong because removing the 'Delete' permission from the storage account key does not prevent modifications (overwrites) and does not block privileged users like administrators who have other access methods (e.g., Azure RBAC, managed identities). Option C is wrong because ACLs on a folder can be overridden by users with higher-level permissions (e.g., storage account key, RBAC roles) and do not provide the legal hold or compliance-grade immutability required. Option D is wrong because Azure RBAC deny assignments can be bypassed by users with Owner or Contributor roles at a higher scope, and they do not enforce the same write-once-read-many (WORM) behavior as immutable storage; RBAC alone cannot prevent deletion by storage account key holders or service administrators.


839

MCQeasy

A company is designing a data storage solution for IoT device telemetry. Each device sends a JSON payload every second. The data must be stored in a way that supports real-time dashboards and long-term analytics with low latency. Which Azure data store should be used for the ingestion layer?

A.Azure SQL Database

B.Azure Blob Storage

C.Azure Event Hubs

D.Azure Data Lake Storage

AnswerC

Event Hubs is designed for high-throughput data ingestion from IoT devices.

Why this answer

Azure Event Hubs is the correct choice for the ingestion layer because it is a fully managed, real-time data streaming platform designed to ingest millions of events per second with low latency. It supports the capture of JSON telemetry from IoT devices and integrates directly with downstream analytics services like Azure Stream Analytics for real-time dashboards and long-term storage in Azure Data Lake or Blob Storage. Its partitioned throughput model ensures scalable, durable ingestion without blocking producers.

Exam trap

The trap here is that candidates confuse the ingestion layer with the storage layer, choosing Azure Blob Storage or Data Lake Storage because they think 'store data' means persistent storage, but the question specifically asks for the ingestion layer where real-time, low-latency streaming is required, which Event Hubs uniquely provides.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a relational OLTP store optimized for structured queries and ACID transactions, not for high-velocity, schema-less JSON ingestion at millions of events per second, and it would introduce latency and cost bottlenecks. Option B is wrong because Azure Blob Storage is an object store designed for batch and large-file storage, not for real-time, per-second event ingestion; it lacks native streaming ingestion, pub-sub semantics, and sub-second latency for dashboards. Option D is wrong because Azure Data Lake Storage is a hierarchical file system optimized for analytics on large datasets, not for real-time event ingestion; it is typically used as a destination for data after it has been processed or captured from a streaming source like Event Hubs.


840

Multi-Selecthard

Which THREE metrics from Azure Monitor should be used to diagnose performance bottlenecks in an Azure Data Factory pipeline?

Select 3 answers

A.Pipeline Succeeded Rerun Count

B.Blob Capacity

C.Activity Duration

D.SQL Pool DWU Used

E.Data Integration Unit (DIU) Consumption

AnswersA, C, E

High rerun count indicates failures and potential bottlenecks.

Why this answer

Pipeline Succeeded Rerun Count (A) is correct because a high number of reruns indicates that the pipeline is repeatedly failing and retrying, which directly points to a performance bottleneck such as resource contention or throttling. This metric helps identify pipelines that are not completing successfully on the first attempt, signaling underlying issues that degrade throughput.

Exam trap

The trap here is that candidates often confuse storage-level metrics (like Blob Capacity) or data warehouse metrics (like DWU Used) with pipeline-specific performance indicators, but the question explicitly asks for metrics that diagnose bottlenecks in the pipeline execution itself, not in downstream storage or compute services.


841

MCQmedium

You are building a streaming pipeline in Azure Synapse Analytics to ingest real-time sensor data from IoT devices. The data must be processed with a 2-second latency and stored in a dedicated SQL pool for reporting. The source emits JSON messages with a nested structure. Which approach should you use to ingest and transform the data?

A.Use Azure Synapse Spark with structured streaming to read from Event Hubs, flatten the JSON using Spark SQL, and write to the dedicated SQL pool.

B.Use Azure Stream Analytics to ingest data from Azure Event Hubs, apply a query to flatten the JSON, and output directly to the dedicated SQL pool.

C.Use Azure Data Factory to run a tumbling window trigger that reads from Event Hubs every 2 seconds and copies data to the dedicated SQL pool.

D.Use Azure Databricks with Auto Loader to ingest data from Event Hubs and write to the dedicated SQL pool.

AnswerB

Stream Analytics is designed for real-time processing, supports nested JSON through dot notation and functions such as GetArrayElements, and can output to a Synapse dedicated SQL pool with low latency.

Why this answer

Option B is correct because Azure Stream Analytics can ingest from Event Hubs, flatten JSON in real time, and write to a Synapse dedicated SQL pool through its built-in output adapter. Option A (Synapse Spark structured streaming) is possible but more complex to build and operate. Option C (Azure Data Factory) is for batch processing and orchestration, not real-time streaming.

Option D (Databricks Auto Loader) is wrong because Auto Loader incrementally ingests files from cloud storage rather than reading from Event Hubs, so it does not fit a 2-second latency requirement.
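What "flattening" a nested JSON message amounts to can be shown in plain Python. This is an illustrative sketch, not a Stream Analytics query; the `MESSAGE` payload, its field names, and the underscore naming scheme are all hypothetical.

```python
import json

# Hypothetical sensor message with a nested structure.
MESSAGE = (
    '{"deviceId": "d1", "reading": {"temp": 21.5, "unit": "C"}, '
    '"location": {"site": {"name": "plant-a"}}}'
)


def flatten(record, parent_key="", separator="_"):
    """Flatten nested dictionaries into one level, joining key names."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{separator}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            flat.update(flatten(value, full_key, separator))
        else:
            flat[full_key] = value
    return flat


if __name__ == "__main__":
    print(flatten(json.loads(MESSAGE)))
```

Each leaf value ends up in its own column-like key, which is the shape a relational table in a dedicated SQL pool expects. Arrays are left untouched in this sketch.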


842

MCQhard

You are a data engineer for a large e-commerce company. You have an Azure Synapse Analytics dedicated SQL pool that stores transactional data. The pool is currently at DWU1000c. You have a critical dashboard that runs a complex query every 5 minutes. The query scans a large fact table partitioned by date. The query performance is degrading over time as data accumulates. You need to improve performance without increasing DWUs or changing the dashboard query. You also need to minimize data movement overhead. Which option should you choose?

A.Redistribute the fact table using hash distribution on the date column.

B.Create a materialized view that aggregates the data at the partition level.

C.Create a columnstore index on the fact table with a partition alignment.

D.Implement result-set caching and set the cache to expire every 5 minutes.

AnswerD

Result-set caching stores query results and can serve repeated queries quickly.

Why this answer

Option D is correct because result-set caching stores the exact query results and can serve the dashboard query almost instantly while the underlying data has not changed. It requires no change to the query and no additional DWUs. Cached results are invalidated automatically when the underlying data changes, so the dashboard does not serve stale data. Option C is wrong because the table likely already has a columnstore index (the default in a dedicated SQL pool).

Option B is wrong because materialized views require maintenance and may not match the exact query. Option A is wrong because hash distribution on date can cause data skew and does not reduce scan overhead as effectively as caching.
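The caching behavior described above can be modeled in a few lines of Python. This is a toy sketch of the idea, not how Synapse implements its cache; the `ResultCache` class, the `data_version` counter, and the query text are hypothetical.

```python
class ResultCache:
    """Toy result cache keyed by query text and invalidated by data version."""

    def __init__(self):
        self._entries = {}
        self.executions = 0  # counts how often the expensive query really ran

    def run(self, query_text, data_version, execute):
        entry = self._entries.get(query_text)
        if entry is not None and entry[0] == data_version:
            return entry[1]  # cache hit: same query text, unchanged data
        # Cache miss: run the query and remember the result for this version.
        self.executions += 1
        result = execute()
        self._entries[query_text] = (data_version, result)
        return result


if __name__ == "__main__":
    cache = ResultCache()
    query = "SELECT SUM(amount) FROM fact_sales"  # hypothetical dashboard query
    print(cache.run(query, 1, lambda: 100))  # executes
    print(cache.run(query, 1, lambda: 100))  # served from cache
    print(cache.run(query, 2, lambda: 150))  # data changed, executes again
    print("executions:", cache.executions)
```

Because the cache is keyed on the exact query text, it suits a dashboard that reissues the identical query on a schedule.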


843

MCQeasy

You need to transform data in Azure Synapse Analytics using a language that supports procedural logic and error handling. Which option should you use?

A.T-SQL stored procedures

B.CREATE VIEW

C.PolyBase

D.CREATE EXTERNAL TABLE

AnswerA

Supports procedural logic and error handling.

Why this answer

T-SQL stored procedures are the correct choice because they support procedural logic (e.g., IF/ELSE, loops, TRY/CATCH) and error handling within Azure Synapse Analytics dedicated SQL pools. This allows you to encapsulate complex data transformation logic, handle runtime errors gracefully, and manage transactions, which is not possible with declarative objects like views or external tables.

Exam trap

The trap here is that candidates confuse PolyBase's ability to query external data with the ability to perform procedural transformations, overlooking that PolyBase is a query engine, not a programming construct for logic and error handling.

How to eliminate wrong answers

Option B is wrong because CREATE VIEW creates a read-only virtual table that cannot contain procedural logic or error handling; it is purely declarative. Option C is wrong because PolyBase is a data virtualization technology for querying external data sources (e.g., Azure Blob Storage) using T-SQL, but it does not support procedural logic or error handling itself. Option D is wrong because CREATE EXTERNAL TABLE defines a schema for external data but provides no procedural capabilities or error handling; it is a metadata object for PolyBase queries.


844

MCQeasy

A company ingests streaming data from IoT devices into Azure Event Hubs. The data must be processed in near real-time to detect anomalies and stored in Azure Data Lake Storage Gen2 for historical analysis. The solution must minimize latency and avoid duplicate processing. Which Azure service should be used for processing?

A.Azure Data Factory

B.Azure Databricks with Structured Streaming

C.Azure Functions with Event Hubs trigger

D.Azure Stream Analytics

AnswerD

Azure Stream Analytics provides low-latency stream processing with exactly-once semantics and integrates with Event Hubs and Data Lake Storage.

Why this answer

Azure Stream Analytics is the correct choice because it is purpose-built for near real-time stream processing with sub-second latency, directly integrates with Event Hubs as input and Data Lake Storage Gen2 as output, and provides built-in exactly-once delivery semantics to avoid duplicate processing. It also supports temporal windowing and anomaly detection functions natively, making it ideal for this IoT anomaly detection scenario.

Exam trap

The trap here is that candidates often choose Azure Databricks with Structured Streaming because of its flexibility and popularity, but they overlook the specific requirement for minimal latency and built-in exactly-once processing, which Azure Stream Analytics handles more efficiently without the overhead of a Spark cluster.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a batch-oriented ETL orchestration service, not designed for near real-time streaming or sub-second latency processing. Option B is wrong because Azure Databricks with Structured Streaming introduces higher latency due to Spark job initialization and micro-batch processing, and requires additional configuration for exactly-once semantics, making it less optimal for minimal latency and duplicate avoidance. Option C is wrong because Azure Functions with Event Hubs trigger processes events one at a time or in small batches, lacks native windowing and anomaly detection operators, and can lead to duplicate processing if not carefully managed with checkpointing and idempotent logic.
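The windowing that the explanation refers to can be sketched in Python. This is a simplified stand-in: it uses a fixed threshold on the window average rather than any built-in anomaly detection function, and the event tuples, window size, and function names are hypothetical.

```python
from collections import defaultdict


def tumbling_window_averages(events, window_seconds):
    """Group (timestamp_seconds, value) events into fixed, non-overlapping windows."""
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Every event belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {
        start: sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


def flag_anomalies(window_averages, threshold):
    """Return the window start times whose average exceeds a fixed threshold."""
    return [start for start, average in window_averages.items() if average > threshold]


if __name__ == "__main__":
    events = [(0, 20.0), (3, 22.0), (10, 30.0), (12, 50.0), (21, 21.0)]
    averages = tumbling_window_averages(events, window_seconds=10)
    print(averages)
    print(flag_anomalies(averages, threshold=35.0))
```

Because tumbling windows do not overlap, each event contributes to a single result, which is part of why replayed or duplicated processing is easy to spot at the output.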


845

MCQhard

You have an Azure Synapse Analytics workspace with Apache Spark pools. You need to monitor Spark application performance and identify stages that are taking the longest time. Which tool should you use?

A.Use the Spark UI available in Synapse Studio.

B.Run KQL queries in Log Analytics against Spark logs.

C.Query Azure Monitor metrics for the Spark pool.

D.Use the Synapse Pipeline monitoring view.

AnswerA

Spark UI provides detailed stage-level performance metrics.

Why this answer

Option A is correct because the Spark UI provides detailed information about stages, tasks, and executors. Option B is wrong because Log Analytics queries can analyze logs but do not show stage timings as directly as the Spark UI. Option C is wrong because Azure Monitor metrics provide aggregate metrics but not stage-level details.

Option D is wrong because the Synapse Pipeline monitoring view shows pipeline and activity runs, not the stages inside a Spark application.


846

MCQeasy

You need to audit all data access to an Azure Storage account. Which Azure service should you enable?

A.Azure Storage analytics logs and send to Log Analytics workspace

B.Azure Policy to audit storage account access

C.Azure Monitor metrics

D.Azure Security Center

AnswerA

Storage logs capture access details; Log Analytics enables querying.

Why this answer

Azure Storage analytics logs capture detailed information about successful and failed requests to a storage account, including authentication details, IP addresses, and operation types. By sending these logs to a Log Analytics workspace, you can query and analyze them using KQL, enabling comprehensive auditing of all data access. This is the correct service for auditing because it provides the granular, queryable logs required for security and compliance audits.

Exam trap

The trap here is that candidates confuse Azure Policy (which audits resource configurations) with actual data access auditing, or assume Azure Monitor metrics provide sufficient detail, when only Storage analytics logs sent to Log Analytics offer the per-request, queryable audit trail required.

How to eliminate wrong answers

Option B is wrong because Azure Policy is used to enforce compliance rules on resource configurations (e.g., requiring HTTPS), not to audit individual data access events. Option C is wrong because Azure Monitor metrics provide aggregated performance and error counts (e.g., transactions, latency), not detailed per-request audit logs. Option D is wrong because Azure Security Center (now Microsoft Defender for Cloud) provides security recommendations and threat detection, but it does not natively capture or store granular data access logs for auditing purposes.

