Microsoft · Free Practice Questions · Last reviewed May 2026
36real exam-style questions organised by domain, each with the correct answer highlighted and a plain-English explanation of why it's right — and why the others are wrong.
Your organization uses Azure Synapse Analytics dedicated SQL pool. You need to ensure that all data at rest in the SQL pool is encrypted using a customer-managed key stored in Azure Key Vault. What should you configure?
Implement Always Encrypted with column encryption keys stored in Azure Key Vault.
Configure Dynamic Data Masking to obfuscate sensitive data.
Enable Azure Storage Service Encryption with a customer-managed key.
Enable Transparent Data Encryption (TDE) with a customer-managed key in Azure Key Vault.
TDE with customer-managed key provides encryption at rest for the entire database, meeting the requirement.
You are designing a data processing solution in Azure Databricks that uses Unity Catalog. The security team requires that all users authenticate using Microsoft Entra ID and that access to tables is governed by attribute-based access control (ABAC) using table tags. Which feature should you enable?
Column-level security masks.
Dynamic views with user context functions.
Row-level security filters.
Table tags with access control lists (ACLs) in Unity Catalog.
Unity Catalog supports ABAC using tags and ACLs, and integrates with Microsoft Entra ID for authentication.
Your team uses Azure Data Factory to copy data from on-premises SQL Server to Azure Blob Storage. You need to ensure that data in transit is encrypted using TLS 1.2. What should you configure?
Ensure the Azure Data Factory pipeline uses HTTPS for the copy activity.
HTTPS uses TLS to encrypt data in transit, and Azure Data Factory supports this by default.
Set up a site-to-site VPN between the on-premises network and Azure.
Configure the on-premises SQL Server to use SSL certificates.
Use Azure ExpressRoute with private peering.
You have an Azure Databricks workspace that uses a managed resource group. The security team requires that all cluster nodes use no public IP addresses and that all outbound traffic goes through a firewall. What should you configure?
Configure service endpoints for Azure Storage and Azure Data Lake Storage.
Deploy the workspace in a VNet with forced tunneling enabled and a firewall.
VNet injection with forced tunneling ensures cluster nodes have no public IPs and all outbound traffic goes through the firewall.
Apply network security groups (NSGs) to the subnet that restrict outbound traffic.
Enable Azure Private Link for the Databricks workspace.
Refer to the exhibit. You are creating an Azure Storage account using an ARM template with the above snippet. After deployment, a security auditor reviews the configuration and notes that the storage account is not using a customer-managed key for encryption. What is the most likely reason?
The 'keyVersion' is missing a specific version, so Azure Storage defaults to Microsoft-managed key.
For customer-managed key, a specific key version is required; an empty version may cause Azure to use the latest but if the key is not accessible, it falls back to Microsoft-managed key.
The 'keySource' should be 'Microsoft.Storage' for customer-managed key.
The storage account requires double encryption to use customer-managed key.
The 'infrastructureEncryption' setting is enabled, which overrides customer-managed key.
You are monitoring an Azure Synapse Analytics dedicated SQL pool and notice that some queries are taking longer than expected. You need to identify queries that are experiencing significant memory pressure. Which dynamic management view (DMV) should you query?
sys.dm_pdw_exec_requests
This DMV includes memory_grant and memory_used columns to assess memory pressure.
sys.dm_pdw_wait_stats
sys.dm_pdw_query_stats_xe
sys.dm_pdw_nodes_os_performance_counters
Want more Secure, monitor, and optimize data storage and data processing practice?
Practice this domainA company uses Azure Synapse Analytics to process large datasets. They need to transform JSON data stored in Azure Data Lake Storage Gen2 into a star schema. Which data processing approach minimizes data movement and leverages the compute closest to the data?
Use Azure Data Factory to copy the JSON data into Azure SQL Database, then use T-SQL to transform.
Use Azure Data Factory with SSIS to transform and load into dedicated SQL pool.
Load data into a Spark DataFrame in Synapse notebooks, transform, and write back.
Create external tables on the JSON files using PolyBase, then use CREATE EXTERNAL TABLE AS SELECT (CETAS) to write transformed Parquet files.
Minimizes movement by querying in place.
You are designing a batch processing pipeline that reads CSV files from Azure Blob Storage, performs aggregations using Azure Databricks, and writes results to Azure Synapse Analytics. The pipeline must handle schema drift (new columns appearing in source files). Which approach should you recommend?
Use Azure Data Factory mapping data flows with schema drift enabled, mapping to a fixed sink schema.
Define a fixed schema in the source and ignore any new columns.
Use Spark with mergeSchema option when reading, and write using a Delta table to evolve schema automatically.
Handles schema drift automatically.
Use Azure Stream Analytics to pre-process and enforce schema.
A company is running a Spark job on Azure Databricks that processes 500 GB of data daily. The job frequently fails with 'OutOfMemoryError' during shuffles. The cluster uses 10 workers of type Standard_DS3_v2 (14 GB memory each). Which configuration change should you make to improve stability without over-provisioning?
Set spark.sql.shuffle.partitions to a higher value, e.g., 500.
Reduces data per partition, easing memory.
Increase the driver memory to 28 GB.
Increase the number of workers to 20.
Reduce spark.sql.shuffle.partitions to 100.
You need to design a near-real-time data processing solution that ingests IoT telemetry data from millions of devices. The data must be aggregated per minute and stored in Azure Cosmos DB for low-latency queries. Which Azure service combination should you use?
Azure Event Hubs -> Azure HDInsight (Kafka) -> Azure Cosmos DB
Azure Event Hubs -> Azure Stream Analytics -> Azure Cosmos DB
Stream Analytics provides near-real-time aggregation.
Azure IoT Hub -> Azure Databricks (Structured Streaming) -> Azure Cosmos DB
Azure Event Hubs -> Azure Data Factory -> Azure Cosmos DB
A data processing job in Azure Synapse Analytics writes results to a table in the dedicated SQL pool. After a failure, the job restarts from the beginning, causing duplicates. Which design pattern should you implement to ensure idempotent writes?
Use a TRUNCATE statement before each insert.
Use a MERGE statement with a unique key to upsert data.
Use a staging table and then swap partitions with the target table.
Atomic swap ensures idempotency.
Use CREATE TABLE AS SELECT (CTAS) with a unique constraint.
You are designing a data processing solution in Azure that must handle both batch and streaming data. The solution should use a common storage layer for both and support schema evolution. Which TWO technologies should you recommend?
Azure Event Hubs
Common ingestion for batch and streaming.
Azure SQL Database
Apache Kafka on HDInsight
Delta Lake (on Azure Databricks)
Supports batch and streaming, schema evolution.
Azure Data Lake Storage Gen2
Want more Design and develop data processing practice?
Practice this domainA company uses Azure Synapse Analytics dedicated SQL pool. They need to ensure that only users with a specific Azure AD group can query a particular schema. Which approach should they use?
Configure a server-level firewall rule to block other users.
Use the GRANT statement to grant SELECT on the schema to the Azure AD group.
GRANT schema permission controls access at schema level.
Create a row-level security policy on all tables in the schema.
Apply dynamic data masking to the schema.
You are designing data security for Azure Data Lake Storage Gen2. The requirement is to prevent data from being accessed by anyone outside the corporate network. Which feature should you enable?
Use Azure Private Endpoint or service endpoint with a VNet.
Private endpoint ensures data is accessed only from within the VNet.
Assign RBAC roles to deny access to all except corporate users.
Configure IP firewall rules to allow only corporate IP ranges.
Enable encryption at rest using customer-managed keys.
A company uses Azure Data Factory to copy sensitive data from on-premises SQL Server to Azure Blob Storage. They must ensure that data is encrypted in transit and at rest. Which combination of features should they use?
Use Always Encrypted in SQL Server and customer-managed keys in Blob Storage.
Set up a VPN between on-premises and Azure, and use Azure Disk Encryption.
Configure the copy activity to use TLS and enable Azure Storage Service Encryption.
TLS encrypts data in transit; Storage Service Encryption encrypts at rest automatically.
Use HTTPS for the copy activity and enable Azure Storage Service Encryption.
You need to audit all data access to an Azure Storage account. Which Azure service should you enable?
Azure Storage analytics logs and send to Log Analytics workspace
Storage logs capture access details; Log Analytics enables querying.
Azure Policy to audit storage account access
Azure Monitor metrics
Azure Security Center
A company uses Azure Key Vault to store secrets for data pipelines. They need to rotate the secrets automatically every 90 days. What should they implement?
Use Azure Policy to enforce secret expiration.
Assign RBAC roles to a service principal to update the secret.
Create a manual process to update the secret in Key Vault.
Configure Key Vault secret rotation with an expiration date of 90 days.
Key Vault can automatically rotate secrets based on expiration.
Which TWO of the following are valid methods to secure data at rest in Azure Data Lake Storage Gen2?
Assign RBAC roles for data access
Configure storage firewall rules
Use customer-managed keys in Azure Key Vault
Customer-managed keys provide additional control over encryption.
Use Azure Storage Service Encryption (SSE)
SSE encrypts data at rest automatically.
Enable TLS 1.2 for all connections
Want more Design and implement data security practice?
Practice this domainA company runs a mission-critical Azure Data Factory pipeline that ingests data every hour from Azure Blob Storage into Azure Synapse Dedicated SQL Pool. Recently, the pipeline has been failing with timeout errors during the copy activity. The source blob files are around 500 MB each. Which configuration change would MOST effectively reduce the likelihood of timeout errors?
Decrease the 'Batch size' for the copy activity.
Change the sink to use PolyBase with staging enabled.
Increase the Data Integration Unit (DIU) to 8.
Enable 'Enable staging' and set 'Degree of copy parallelism' to a higher value.
Increases parallelism, reducing copy time and timeout likelihood.
You are designing a data processing solution using Azure Databricks with Delta Lake. The data is partitioned by date and ingested daily. You notice that the Delta table has many small files, causing slow read performance. Which strategy should you recommend to optimize the table for faster queries?
Run OPTIMIZE on the table to compact small files.
OPTIMIZE merges small files into larger ones.
Run ZORDER BY on the date column.
Run VACUUM to delete old files.
Increase the number of partitions by adding a new partition column.
A data engineer monitors an Azure Stream Analytics job that processes real-time data. The job is falling behind, and the SU utilization is at 100%. Which action should be taken to improve performance?
Increase the number of Streaming Units (SU).
More SU provides more processing power.
Reduce the number of Streaming Units.
Change the query compatibility level to 1.0.
Deploy a second Stream Analytics job and split the input.
You have an Azure Data Lake Storage Gen2 account that stores large volumes of parquet files. A reporting application frequently queries a specific subset of data filtered by a 'region' column. To minimize query latency and cost, which optimization should you implement?
Partition the data by region in the folder structure.
Partition elimination reduces data scanned.
Create a clustered index on the region column.
Compress the parquet files using gzip.
Enable hierarchical namespace on the storage account.
A company uses Azure Data Lake Storage Gen2 with Azure Databricks. They notice that the job to write data into Delta Lake tables takes too long. The data is coming from a streaming source with a high velocity of small writes. Which approach should be taken to optimize write performance?
Configure the streaming to write in micro-batches with a higher trigger interval.
Batching reduces the number of small file writes.
Increase the cluster size to 16 nodes.
Enable 'auto optimize' and 'optimized writes' on the Delta table.
Change the output format from Delta to Parquet.
Which TWO actions should you take to reduce costs associated with an Azure Synapse Dedicated SQL Pool that is used for reporting during business hours only?
Pause the pool during non-business hours.
Stops compute billing when not in use.
Enable advanced data compression on all tables.
Scale down the pool during business hours.
Change the distribution of large tables to ROUND_ROBIN.
Implement result set caching for frequently run queries.
Reduces compute usage for repeated queries.
Want more Monitor and optimize data storage and processing practice?
Practice this domainA company is designing a data lake solution on Azure Data Lake Storage Gen2. Data will be ingested from IoT devices at high frequency (every 5 seconds). Each device sends a JSON payload of 2 KB. The data must be stored in a hierarchical namespace and partitioned by date and device ID to optimize query performance. Which partition strategy should be used?
Use Azure SQL Database with clustered columnstore index on date and device ID.
Organize folders as /YYYY/MM/DD/DeviceID/ in ADLS Gen2 and use file naming that includes timestamp.
This folder structure enables efficient partition pruning based on date and device ID.
Use Azure Table Storage with PartitionKey set to date and RowKey set to device ID.
Use Azure Cosmos DB with partition key on (date, device ID) and TTL for data retention.
You are designing a near-real-time analytics pipeline for a retail company. Transaction data is generated in Azure SQL Database and must be replicated to Azure Synapse Analytics (dedicated SQL pool) with less than 5 minutes latency. The source table has 50 million rows and 200 columns, but only 30 columns are needed for analytics. Which approach should you recommend?
Use Azure SQL Database Change Tracking and push changes to Azure Event Hubs, then use Azure Stream Analytics to write to Synapse.
Enable Change Data Capture (CDC) on the source table and use Azure Data Factory with a 1-minute tumbling window to copy changes into Synapse.
CDC captures only changed rows, and ADF can run frequently to meet latency target.
Use Azure Synapse PolyBase to directly query the source SQL database every 5 minutes.
Schedule a full copy of the entire table every 5 minutes using Azure Data Factory.
A data engineer needs to store semi-structured JSON log files from a web application. Each log entry is about 1 KB. The logs are rarely queried (once a month) and must be retained for 7 years for compliance. The solution must minimize storage cost. Which storage option should be used?
Store the logs in Azure SQL Database as a table.
Store the logs in Azure Files share.
Store the logs in Azure Blob Storage with cool access tier.
Blob Storage cool tier is low-cost for infrequent access, suitable for logs.
Store the logs in Azure Cosmos DB with a JSON container.
You are designing a solution to store streaming data from multiple sources into Azure Data Lake Storage Gen2. The data must be organized by ingestion time and source system. Each source system produces data in a different format: CSV, JSON, and Parquet. The solution must allow efficient querying using Azure Synapse Serverless SQL and must support partitioning on ingestion date. What is the recommended folder structure?
/data/{date}/{source_system}/ (e.g., /data/2023-01-01/SourceA/)
/data/{source_system}/{date}/ (e.g., /data/SourceA/2023-01-01/)
This structure separates sources and dates, enabling efficient query pruning.
/data/{source_system}/ with files named {timestamp}.csv/.json/.parquet
/data/{source_system}/{year}/{month}/{day}/ (e.g., /data/SourceA/2023/01/01/)
A healthcare company stores sensitive patient data in Azure Data Lake Storage Gen2. They need to ensure that only authorized users can access data and that all access is audited. They also need to prevent data from being accessed by unauthorized Azure services. Which combination of security features should be used?
Use a private endpoint and Azure AD authentication, disable public access.
Use Azure RBAC and ACLs for authorization, enable firewall and virtual network service endpoints, and enable diagnostic settings for auditing.
Combination provides layered security and full audit.
Use managed identity for service access and disable public access.
Use Azure AD authentication and SAS tokens for access, enabling diagnostic logs for auditing.
Which TWO of the following are supported storage options for use as a source in Azure Synapse Pipeline Copy Activity?
Azure Data Lake Storage Gen2
ADLS Gen2 is a supported source.
Azure Analysis Services
Azure Cognitive Search
Azure Purview
Azure Blob Storage
Azure Blob Storage is a supported source.
Want more Design and implement data storage practice?
Practice this domainYou are designing a data processing pipeline in Azure Synapse Analytics that ingests streaming data from Azure Event Hubs and stores it in a dedicated SQL pool. The data volume is approximately 500 GB per hour with peak spikes. The pipeline must minimize data loss during transient failures. Which feature should you implement?
Use Azure Synapse Pipeline with Auto-commit and checkpointing to process streaming data.
Auto-commit with checkpointing in Synapse Pipeline provides fault tolerance and exactly-once processing for streaming data.
Use PolyBase to load data directly from Event Hubs to the dedicated SQL pool.
Use COPY INTO statement to ingest data from Event Hubs into the dedicated SQL pool.
Enable Event Hubs Capture to write data to Azure Data Lake Storage and then load using PolyBase.
You are designing a batch processing solution using Azure Databricks. The data source is a large Parquet dataset stored in Azure Data Lake Storage Gen2 (ADLS Gen2). The processing requires joining two datasets: one with 10 billion rows and another with 1 million rows. The cluster uses Photon runtime. Which optimization should you apply to minimize shuffle?
Broadcast the smaller table (1 million rows) to all worker nodes.
Broadcasting the smaller table avoids shuffling the large table, significantly reducing data movement.
Increase the cluster size to reduce shuffle overhead.
Create bucketed tables on the join key for both datasets.
Use Delta Lake and optimize file layout with OPTIMIZE command.
You are running a Spark job in Azure Synapse Analytics that reads from a Delta Lake table and performs multiple transformations. The job fails with an out-of-memory error on the executors. Which action should you take first to resolve the issue?
Enable checkpointing to truncate the lineage.
Decrease the number of partitions to reduce overhead.
Increase the executor memory setting in the Spark configuration.
Increasing executor memory provides more heap space to avoid OOM errors.
Use the cache() action on intermediate DataFrames.
You are designing a data pipeline in Azure Data Factory (ADF) that copies data from an on-premises SQL Server database to Azure Synapse Analytics dedicated SQL pool. The pipeline must run daily and handle incremental loads efficiently. Which sink dataset type and copy method should you use?
Use Azure Synapse Analytics dedicated SQL pool as the sink dataset and use the Copy activity with PolyBase enabled.
Use Azure Synapse Analytics dedicated SQL pool as the sink dataset and enable the built-in Upsert option.
Use Azure Blob Storage as the sink dataset, then use PolyBase to load into the dedicated SQL pool.
Use Azure Synapse Analytics dedicated SQL pool as the sink dataset and use Stored Procedure with staging table and PolyBase.
This combination enables high-throughput ingestion and supports incremental loading via merge logic in the stored procedure.
You are implementing a streaming solution using Azure Stream Analytics. The input is from an IoT Hub receiving telemetry from thousands of devices. The output is to Azure Synapse Analytics dedicated SQL pool. The requirement is to compute rolling averages over a 5-minute tumbling window and write results every minute. Which windowing function and output configuration should you use?
Use a TumblingWindow with duration of 5 minutes and output every 5 minutes.
Use a SlidingWindow with duration 5 minutes and output every 1 minute.
Use a HoppingWindow with size 5 minutes and hop 1 minute.
Hopping windows with a 1-minute hop produce results every minute, each covering the last 5 minutes.
Use a SessionWindow with timeout 5 minutes and maximum duration 10 minutes.
You are optimizing a Spark DataFrame transformation in Azure Synapse Analytics. The DataFrame has 20 columns and 100 million rows. You notice that the job is slow due to many small files being written to the output. Which two actions can you take to reduce the number of output files? (Choose two.)
Use coalesce() to reduce the number of partitions without a shuffle.
Coalesce reduces partitions and thus output files, minimizing shuffle.
Enable caching on the DataFrame before writing.
Apply bucketing on a column to group data.
Increase the number of partitions using repartition() with a larger number.
Use repartition() with a smaller number of partitions.
Fewer partitions mean fewer files written.
Want more Develop data processing practice?
Practice this domainThe DP-203 exam has 50 questions and must be completed in 120 minutes. The passing score is 700/1000.
Scenario-based questions covering exam objectives with detailed answer explanations.
The exam covers 6 domains: Secure, monitor, and optimize data storage and data processing, Design and develop data processing, Design and implement data security, Monitor and optimize data storage and processing, Design and implement data storage, Develop data processing. Questions are weighted by domain — higher-weight domains appear more on your actual exam.
No. These are original exam-style practice questions written against the official Microsoft DP-203 exam objectives. They are not copied from the real exam. Courseiva focuses on genuine understanding, not memorisation of braindumps.
Courseiva tracks your accuracy per domain and routes you toward weak areas automatically. Free, no account required.