Free IT certification practice questions with explained answers for CCNA, CompTIA, AWS, Azure, Google Cloud, and more.

Courseiva is a free IT certification practice platform offering original exam-style practice questions, detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics for Cisco, CompTIA, Microsoft, AWS, and other technology certifications.

© 2026 Courseiva. Courseiva is operated by JTNetSolutions Ltd. All rights reserved.

Courseiva is an independent certification practice platform and is not affiliated with, endorsed by, or sponsored by Cisco, Microsoft, AWS, CompTIA, Google, ISC2, ISACA, or any other certification vendor. Vendor names and certification marks are used only to identify the exams learners are preparing for.

DP-203 Design and develop data processing — All Questions With Answers

Complete DP-203 Design and develop data processing question bank: all 42 questions with answers and detailed explanations.

42 questions, free, no signup
Question 1 (medium, multiple choice)
Read the full Design and develop data processing explanation →

A company uses Azure Synapse Analytics to process large datasets. They need to transform JSON data stored in Azure Data Lake Storage Gen2 into a star schema. Which data processing approach minimizes data movement and leverages the compute closest to the data?

Question 2 (easy, multiple choice)
Read the full Design and develop data processing explanation →

You are designing a batch processing pipeline that reads CSV files from Azure Blob Storage, performs aggregations using Azure Databricks, and writes results to Azure Synapse Analytics. The pipeline must handle schema drift (new columns appearing in source files). Which approach should you recommend?
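
To make "schema drift" concrete, here is a minimal pure-Python sketch that reads CSV batches whose columns differ and aligns every row to the union of all columns seen. It is only an illustration of the concept, not Databricks or Synapse code; the sample data and the helper name `merge_with_drift` are hypothetical.

```python
import csv
import io


def merge_with_drift(csv_texts):
    """Read CSV batches with differing columns and align rows to the union of columns."""
    columns = []  # union of column names, in first-seen order
    rows = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        rows.extend(reader)
    # Columns that a batch did not contain are filled with None.
    return columns, [{c: r.get(c) for c in columns} for r in rows]


day1 = "id,amount\n1,10\n2,20\n"
day2 = "id,amount,currency\n3,30,EUR\n"  # a new column appears in the source
columns, rows = merge_with_drift([day1, day2])
print(columns)
print(rows[0])
```

Older rows simply carry an empty value for the new column, which is the behaviour a drift-tolerant pipeline usually aims for.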

Question 3 (hard, multiple choice)
Read the full Design and develop data processing explanation →

A company is running a Spark job on Azure Databricks that processes 500 GB of data daily. The job frequently fails with 'OutOfMemoryError' during shuffles. The cluster uses 10 workers of type Standard_DS3_v2 (14 GB memory each). Which configuration change should you make to improve stability without over-provisioning?

Question 4 (medium, multiple choice)
Read the full Design and develop data processing explanation →

You need to design a near-real-time data processing solution that ingests IoT telemetry data from millions of devices. The data must be aggregated per minute and stored in Azure Cosmos DB for low-latency queries. Which Azure service combination should you use?

Question 5 (easy, multiple choice)
Read the full Design and develop data processing explanation →

A data processing job in Azure Synapse Analytics writes results to a table in the dedicated SQL pool. After a failure, the job restarts from the beginning, causing duplicates. Which design pattern should you implement to ensure idempotent writes?
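
As a rough sketch of what "idempotent" means here, the example below keys each write on an identifier so that replaying a batch overwrites rows instead of appending them. It uses SQLite from the Python standard library as a stand-in, not a dedicated SQL pool; the table name `results` and the helper `load_batch` are hypothetical.

```python
import sqlite3


def load_batch(conn, batch):
    """Write rows keyed by id; replaying the same batch overwrites instead of appending."""
    conn.executemany(
        "INSERT OR REPLACE INTO results (id, amount) VALUES (?, ?)", batch
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, amount REAL)")
batch = [(1, 10.0), (2, 20.0)]
load_batch(conn, batch)
load_batch(conn, batch)  # simulated restart from the beginning
row_count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(row_count)
```

Running the load twice leaves the table with the same rows as running it once, which is the property the scenario asks for.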

Question 6 (medium, multi-select)
Read the full Design and develop data processing explanation →

You are designing a data processing solution in Azure that must handle both batch and streaming data. The solution should use a common storage layer for both and support schema evolution. Which TWO technologies should you recommend?

Question 7 (hard, multi-select)
Read the full Design and develop data processing explanation →

A company uses Azure Databricks to process streaming data from Event Hubs. The data is written to a Delta table. The job occasionally fails due to checkpoint corruption. Which THREE measures should you implement to improve reliability?

Question 8 (easy, multiple choice)
Read the full Design and develop data processing explanation →

A company ingests streaming data from IoT devices into Azure Event Hubs. The data must be processed in near real-time to detect anomalies and stored in Azure Data Lake Storage Gen2 for historical analysis. The solution must minimize latency and avoid duplicate processing. Which Azure service should be used for processing?

Question 9 (medium, multiple choice)
Read the full Design and develop data processing explanation →

A data engineer is designing a batch processing pipeline that reads data from Azure Blob Storage, transforms it using Azure Databricks, and writes the output to Azure Synapse Analytics. The source files are in CSV format and arrive daily at 02:00 UTC. The transformation must be idempotent and the pipeline should handle late-arriving data (up to 2 hours). What is the best approach to trigger the pipeline?

Question 10 (hard, multiple choice)
Read the full Design and develop data processing explanation →

A multinational corporation uses Azure Data Lake Storage Gen2 to store petabytes of Parquet files partitioned by date and hour. Data scientists report that queries on the last 7 days of data take over 30 minutes, while queries on older data are fast. The storage account uses the default Azure Blob Storage hierarchical namespace. Which action will MOST improve query performance on recent data?

Question 11 (medium, multiple choice)
Read the full Design and develop data processing explanation →

A company uses Azure Data Factory to orchestrate an ETL pipeline that copies data from an on-premises SQL Server to Azure Synapse Analytics. The pipeline runs hourly and uses a self-hosted integration runtime. Recently, the pipeline started failing with timeout errors. The on-premises SQL Server is healthy and the network is stable. What is the most likely cause and solution?

Question 12 (easy, multiple choice)
Read the full Design and develop data processing explanation →

A data engineer needs to process a large dataset stored in Azure Blob Storage using Azure Databricks. The dataset consists of millions of small CSV files. The processing job is slow due to the overhead of reading many small files. Which technique should be used to improve performance?
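
The small-files overhead described above comes from opening and listing each file separately. One general way to reduce it is to compact many small files into fewer large ones before processing. The following is a minimal local sketch using only the Python standard library; the file names and the helper `compact_csv_files` are hypothetical, and it assumes every file shares the same header.

```python
import csv
import tempfile
from pathlib import Path


def compact_csv_files(source_dir, target_file):
    """Concatenate many small CSV files (same header) into one larger file."""
    header_written = False
    rows_written = 0
    with open(target_file, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(Path(source_dir).glob("*.csv")):
            with open(path, newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


with tempfile.TemporaryDirectory() as tmp:
    for i in range(100):
        Path(tmp, f"part-{i:03d}.csv").write_text(f"id,value\n{i},{i * 2}\n")
    # The target uses a different extension so the glob does not pick it up.
    target = Path(tmp, "compacted.out")
    total_rows = compact_csv_files(tmp, target)
    line_count = len(target.read_text().splitlines())
print(total_rows, line_count)
```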

Question 13 (medium, multi-select)
Read the full Design and develop data processing explanation →

Which TWO actions are appropriate when designing a data processing solution that must meet strict SLAs for latency and throughput?

Question 14 (hard, multi-select)
Read the full Design and develop data processing explanation →

Which THREE factors should be considered when choosing between Azure Stream Analytics and Azure Databricks for a real-time data processing solution?

Question 15 (hard, multiple choice)
Read the full Design and develop data processing explanation →

Refer to the exhibit. An Azure Data Factory instance uses a self-hosted integration runtime. The exhibit shows the properties of the integration runtime. The data engineer notices that copy activities are failing with errors indicating that the integration runtime is not available. What is the most likely cause?

Exhibit

Refer to the exhibit.

```json
{
  "identity": {
    "type": "SystemAssigned",
    "principalId": "12345678-1234-1234-1234-123456789012",
    "tenantId": "87654321-4321-4321-4321-210987654321"
  },
  "properties": {
    "provisioningState": "Succeeded",
    "integrationRuntime": {
      "type": "SelfHosted",
      "properties": {
        "typeProperties": {
          "selfContainedInteractiveAuthoringEnabled": true,
          "autoUpdate": true,
          "latestVersion": "5.25.8327.1",
          "pushedVersion": "5.25.8327.1",
          "version": "5.23.8123.0",
          "status": "Online"
        }
      }
    }
  }
}
```
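
When reading an exhibit like this one, it can help to compare the version fields numerically, not by eye. The sketch below parses a trimmed copy of the exhibit (only the fields it inspects) and checks whether the installed `version` is behind `latestVersion`. It does not claim that this is the cause of the failure; the helper `parse_version` is hypothetical.

```python
import json

# Trimmed copy of the exhibit: only the fields inspected below.
exhibit = json.loads("""
{
  "typeProperties": {
    "autoUpdate": true,
    "latestVersion": "5.25.8327.1",
    "pushedVersion": "5.25.8327.1",
    "version": "5.23.8123.0",
    "status": "Online"
  }
}
""")


def parse_version(text):
    """Turn '5.23.8123.0' into (5, 23, 8123, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


props = exhibit["typeProperties"]
installed = parse_version(props["version"])
latest = parse_version(props["latestVersion"])
is_behind = installed < latest
print(props["status"], props["autoUpdate"], is_behind)
```
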
Question 16 (hard, multiple choice)
Read the full Design and develop data processing explanation →

You are a data engineer for a large e-commerce company. The company uses Azure Data Lake Storage Gen2 (ADLS Gen2) as its data lake. A team of data scientists needs to process a massive dataset (approximately 5 TB) stored in Parquet format in the data lake. The dataset contains sales transactions from the past 10 years. The data scientists run a Spark job daily using Azure Synapse Analytics (serverless Spark pool) to compute aggregated sales metrics by product category and region. The job reads the entire dataset each day, performs transformations, and writes the aggregated results back to the data lake. Over the past few weeks, the job has been taking longer to complete, and the data scientists have reported that the job now takes over 6 hours, exceeding the acceptable SLA of 4 hours. They suspect the issue is related to data skew or suboptimal partitioning. You need to optimize the job to reduce execution time. Which approach should you take?

Question 17 (medium, multiple choice)
Read the full Design and develop data processing explanation →

You are a data engineer at a healthcare analytics company. The company uses Azure Data Factory (ADF) to orchestrate data pipelines that ingest patient data from on-premises SQL Server databases into Azure Synapse Analytics. Recently, the pipeline has been failing intermittently with the following error: 'Failure happened on 'Sink' side. ErrorCode=SqlFailedToConnect, Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException, Message=Cannot connect to SQL Server Database. The TCP connection to the host <server_name>, port 1433 has failed. Error: 'Connection timed out.'.' The on-premises SQL Server is behind a corporate firewall. The ADF self-hosted integration runtime (SHIR) is installed on a VM inside the corporate network. You have verified that the SHIR is running and that the SQL Server is accessible from the SHIR VM using SQL Server Management Studio (SSMS). The error occurs sporadically, not consistently. What is the most likely cause of the intermittent connection timeout?

Question 18 (medium, multi-select)
Read the full Design and develop data processing explanation →

A data engineering team is designing a batch processing solution using Azure Databricks. The data is stored in Azure Data Lake Storage Gen2 (ADLS Gen2) and must be processed daily with minimal cost. The team needs to choose between using a Delta Lake table or a Parquet file format for the processed output. Which TWO factors should the team consider when making this decision?

Question 19 (easy, multiple choice)
Read the full Design and develop data processing explanation →

You are a data engineer at a retail company. You have designed a near real-time data processing solution using Azure Stream Analytics. The input is from Azure Event Hubs, which receives clickstream events from the company's e-commerce website. The output is written to an Azure SQL Database table for reporting. Each event includes fields: UserId, ProductId, EventType (e.g., 'click', 'purchase'), and Timestamp. The requirement is to calculate the number of purchases per product in a 5-minute tumbling window and update a SQL table. The Stream Analytics job has been running for a week, but the reporting team notices that the purchase counts in SQL are consistently lower than expected compared to a direct count from Event Hubs. You suspect that late-arriving events are being dropped. The job's configuration includes a 5-minute tumbling window with no late arrival policy. What should you do to fix the issue without losing data?
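
The undercount in this scenario can be reproduced with a small simulation. The sketch below counts purchases per 5-minute tumbling window and drops any event whose arrival lags its event time by more than a configurable tolerance. It is a simplified model written in plain Python, not Stream Analytics query language, and it models only the "drop" handling of late events; the function name and sample events are hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 300  # 5-minute tumbling window


def count_purchases(events, late_tolerance_seconds):
    """Count purchases per (window_start, product), dropping events that arrive too late.

    Each event is (product_id, event_type, event_time, arrival_time) in epoch seconds.
    """
    counts = Counter()
    for product_id, event_type, event_time, arrival_time in events:
        if event_type != "purchase":
            continue
        if arrival_time - event_time > late_tolerance_seconds:
            continue  # simplified "drop" handling of late events
        window_start = event_time - (event_time % WINDOW_SECONDS)
        counts[(window_start, product_id)] += 1
    return counts


events = [
    ("p1", "purchase", 10, 12),    # on time
    ("p1", "click", 20, 21),       # not a purchase
    ("p1", "purchase", 290, 400),  # arrives 110 seconds late
    ("p1", "purchase", 310, 311),  # falls in the next window
]
strict = count_purchases(events, late_tolerance_seconds=5)
tolerant = count_purchases(events, late_tolerance_seconds=120)
print(strict[(0, "p1")], tolerant[(0, "p1")])
```

With the tight tolerance the first window reports one purchase; with the wider tolerance it reports two, which mirrors the gap the reporting team observed.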

Question 20 (medium, multi-select)
Read the full Design and develop data processing explanation →

You are designing an Azure Stream Analytics job to process real-time IoT data from thousands of devices. The job must handle late-arriving events (up to 1 hour late) and out-of-order events (up to 5 minutes). Which two temporal policies should you configure?

Question 21 (hard, multiple choice)
Read the full Design and develop data processing explanation →

You are designing a data processing solution for a retail company. The solution must ingest streaming sales data from point-of-sale (POS) systems and batch uploads from stores that are offline. The total data volume is 5 TB daily. The solution must allow real-time dashboards and periodic batch processing. Which combination of services and ingestion patterns is most cost-effective and scalable?

Question 22 (easy, multiple choice)
Read the full Design and develop data processing explanation →

You are implementing a data pipeline using Azure Data Factory. The source is an on-premises SQL Server database. Which Azure Data Factory component is required to connect to the on-premises data source?

Question 23 (medium, multiple choice)
Read the full Design and develop data processing explanation →

You have a Synapse Analytics dedicated SQL pool. You need to load 100 GB of CSV data from Azure Data Lake Storage Gen2 into a fact table. The table has a hash-distributed column. Which pattern is most efficient for loading with minimal impact on concurrent queries?

Question 24 (hard, multi-select)
Read the full Design and develop data processing explanation →

You are designing a stream processing solution using Azure Stream Analytics. The job must reference a static lookup table (product catalog) stored in Azure Blob Storage. The catalog is updated once daily. The job should automatically pick up the latest version without restarting. Which two configurations are required? (Choose two.)

Question 25 (medium, multi-select)
Read the full Design and develop data processing explanation →

You are designing a data transformation pipeline using Azure Databricks. The pipeline reads from Azure Data Lake Storage Gen2, performs aggregations, and writes to a Synapse dedicated SQL pool. Which three configurations should you implement to optimize performance and minimize cost? (Choose three.)

Question 26 (easy, multiple choice)
Read the full Design and develop data processing explanation →

Which Azure service is primarily used for orchestrating data pipelines in a cloud-native ETL workflow?

Question 27 (medium, multiple choice)
Read the full Design and develop data processing explanation →

When designing a data processing solution using Azure Databricks, what is the recommended approach to handle schema evolution when reading data from Delta Lake tables?

Question 28 (hard, multiple choice)
Read the full Design and develop data processing explanation →

You are building a streaming pipeline in Azure Stream Analytics that reads from an Azure Event Hubs input with 10 partitions. The query performs a GROUP BY on a column that is not the partition key. To ensure consistency, which partitioning scheme should you use?
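
When the grouping column differs from the input partition key, rows that belong to one group can be spread across several input partitions. The conceptual sketch below shows the general idea of reshuffling events by a stable hash of the grouping key so that each group lands in exactly one partition. It is plain Python, not Stream Analytics syntax; the field names and the helper `repartition` are hypothetical.

```python
import zlib
from collections import defaultdict


def repartition(events, key, num_partitions):
    """Route each event to a partition chosen by a stable hash of its grouping key."""
    partitions = defaultdict(list)
    for event in events:
        index = zlib.crc32(str(event[key]).encode("utf-8")) % num_partitions
        partitions[index].append(event)
    return partitions


events = [
    {"device": "a", "region": "eu", "value": 1},
    {"device": "b", "region": "us", "value": 2},
    {"device": "c", "region": "eu", "value": 3},
]
by_region = repartition(events, key="region", num_partitions=10)

# Record which partitions hold each region; every region should map to one.
region_to_partitions = defaultdict(set)
for index, items in by_region.items():
    for item in items:
        region_to_partitions[item["region"]].add(index)
print({region: len(parts) for region, parts in region_to_partitions.items()})
```

`zlib.crc32` is used instead of the built-in `hash` because string hashing in Python is randomized between runs.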

Question 29 (easy, multi-select)
Read the full Design and develop data processing explanation →

Which of the following are valid activities in an Azure Data Factory pipeline? (Choose two.)

Question 30 (medium, multi-select)
Read the full Design and develop data processing explanation →

You are designing a data processing solution that requires exactly-once processing semantics for streaming data. Which two Azure services support exactly-once processing? (Choose two.)

Question 31 (medium, multiple choice)
Read the full Design and develop data processing explanation →

Match the Azure service to its primary data processing use case. Drag each service on the left to the correct use case on the right.

Services: Azure Databricks, Azure Stream Analytics, Azure Data Factory, Azure Synapse Analytics

Use cases:
- Real-time event processing
- Orchestration of ETL pipelines
- Big data analytics with Spark
- Enterprise data warehousing

Question 32 (medium, drag order)
Read the full Design and develop data processing explanation →

Drag and drop the steps to implement incremental data loading using Azure Data Factory into the correct order.

Question 33 (medium, drag order)
Read the full Design and develop data processing explanation →

Drag and drop the steps to implement Slowly Changing Dimension (SCD) Type 2 in Azure Synapse Analytics dedicated SQL pool into the correct order.

Question 34 (medium, drag order)
Read the full Design and develop data processing explanation →

Drag and drop the steps to set up Azure Data Factory pipeline with parameterization and dynamic expressions into the correct order.

Question 35 (medium, matching)
Read the full Design and develop data processing explanation →

Match each storage redundancy option to its description in Azure Storage.

Descriptions to match:

Three synchronous copies within a single data center

Three copies across multiple availability zones in a region

Geo-redundant storage with read access in secondary region

Geo-zone-redundant storage with read access in secondary region

Question 36 (medium, matching)
Read the full Design and develop data processing explanation →

Match each Azure data integration tool to its typical use case.

Descriptions to match:

Query external data in Azure Storage using T-SQL

High-throughput data ingestion into Synapse SQL

Orchestrate data movement and transformation

Complex data engineering with notebooks

Question 37 (medium, matching)
Read the full Design and develop data processing explanation →

Match each Azure service tier to its description.

Descriptions to match:

Hierarchical namespace for Azure Data Lake Storage

Optimized for frequent data access

Optimized for infrequent access with lower cost

Lowest cost for rarely accessed data

Question 38 (hard, multiple choice)
Read the full Design and develop data processing explanation →

Refer to the exhibit. A data engineer runs a Synapse Spark job that fails with the error shown. Which configuration change is most likely to resolve the issue?

Exhibit

Command arguments: --workspace-name myworkspace --spark-pool-name mypool

Azure CLI command output:

[
  {
    "id": "job1",
    "state": "error",
    "errors": [
      {
        "code": "LivyUnexpected",
        "message": "java.lang.OutOfMemoryError: Java heap space"
      }
    ],
    "executorMemory": "2g",
    "executorCores": 2,
    "numExecutors": 2
  },
  {
    "id": "job2",
    "state": "success",
    "executorMemory": "4g",
    "executorCores": 4,
    "numExecutors": 4
  }
]
Question 39 (medium, multiple choice)
Read the full Design and develop data processing explanation →

Refer to the exhibit. A data engineer notices that the target SQL table contains duplicate rows after a pipeline run. Which change to the pipeline configuration would prevent duplicates?

Exhibit

Azure Data Factory pipeline JSON snippet:

{
    "name": "CopyDataPipeline",
    "activities": [
        {
            "name": "CopyFromBlobToSQL",
            "type": "Copy",
            "inputs": [{"referenceName": "BlobDS", "type": "DatasetReference"}],
            "outputs": [{"referenceName": "SQLDS", "type": "DatasetReference"}],
            "typeProperties": {
                "source": {
                    "type": "BlobSource",
                    "recursive": true
                },
                "sink": {
                    "type": "SqlSink",
                    "writeBatchSize": 10000,
                    "preCopyScript": "TRUNCATE TABLE dbo.target"
                },
                "enableStaging": false,
                "translator": {
                    "type": "TabularTranslator",
                    "columnMappings": {
                        "Id": "Id",
                        "Name": "FullName"
                    }
                }
            }
        }
    ]
}
Question 40 (hard, multiple choice)
Read the full Design and develop data processing explanation →

Refer to the exhibit. A Stream Analytics job shows increasing watermark delay and input deserialization errors. Which action should be taken first to troubleshoot?

Exhibit

Azure Stream Analytics job diagnostics log:

{
  "time": "2023-08-01T12:00:00Z",
  "properties": {
    "jobId": "job-123",
    "jobName": "IoTStreamJob",
    "events": [
      {
        "time": "2023-08-01T11:59:00Z",
        "type": "WatermarkDelay",
        "properties": {
          "watermarkDelaySeconds": 120,
          "maxWatermarkDelaySeconds": 300
        }
      },
      {
        "time": "2023-08-01T11:59:30Z",
        "type": "InputDeserializationError",
        "properties": {
          "source": "iothub",
          "count": 15
        }
      }
    ],
    "jobOutputWatermark": "2023-08-01T11:57:00Z"
  }
}
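
The timestamps in this log can be turned into a concrete lag figure. The short sketch below computes how far `jobOutputWatermark` trails the log entry's `time` and checks the reported delay against the maximum shown in the exhibit. The values are copied from the exhibit; the helper `parse_utc` is hypothetical, and the calculation is only an aid for reading the log, not a diagnosis.

```python
from datetime import datetime, timezone


def parse_utc(text):
    """Parse timestamps like '2023-08-01T12:00:00Z' into timezone-aware datetimes."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


log_time = parse_utc("2023-08-01T12:00:00Z")          # top-level "time"
output_watermark = parse_utc("2023-08-01T11:57:00Z")  # "jobOutputWatermark"
lag_seconds = (log_time - output_watermark).total_seconds()

reported_delay = 120  # watermarkDelaySeconds from the exhibit
max_delay = 300       # maxWatermarkDelaySeconds from the exhibit
within_limit = reported_delay <= max_delay
print(lag_seconds, within_limit)
```
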
Question 41 (medium, multiple choice)
Read the full Design and develop data processing explanation →

Refer to the exhibit. A user with Storage Blob Data Reader role on the container rawdata cannot list files under /2023/07/. What is the most likely reason?

Exhibit

Command: az role assignment list --assignee user@contoso.com --scope /subscriptions/.../resourceGroups/rg1/providers/Microsoft.Storage/storageAccounts/storage1/blobServices/default/containers/rawdata
Additional arguments shown: --file-system rawdata --path /2023/07/ --account-name storage1 --auth-mode login

Output fields:
"acl": "user::rwx"
"roleDefinitionName": "Storage Blob Data Reader"
"scope": "/subscriptions/.../containers/rawdata"
"owner": "$superuser"
"group": "$superuser"
Question 42 (hard, multiple choice)
Read the full Design and develop data processing explanation →

Refer to the exhibit. A data engineer notices that Spark jobs on this cluster are running slower than expected. The cluster is using spot instances with fallback. Which factor is most likely causing the performance degradation?

Exhibit

Azure Databricks cluster configuration JSON:

{
  "cluster_name": "ETL Cluster",
  "spark_version": "10.4.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "autoscale": {
    "min_workers": 2,
    "max_workers": 8
  },
  "spark_conf": {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "64MB"
  },
  "aws_attributes": {
    "first_on_demand": 1,
    "availability": "SPOT_WITH_FALLBACK",
    "zone_id": "us-west-2a"
  }
}


