How many troubleshooting scenario questions are on this page?

This page has 14 troubleshooting scenario questions for the DP-203 exam, each with detailed explanations and wrong-answer analysis.

How should I approach DP-203 scenario questions?

Read the full scenario before looking at the answer options. Identify the constraint or requirement in the scenario, then eliminate options that are generally true but wrong for this specific case. Scenario questions reward careful reading over pattern matching.

Scenario-based practice

Troubleshooting Scenario Questions

Practise Microsoft Azure Data Engineer Associate DP-203 questions: original exam-style scenarios covering every exam domain, with detailed explanations, wrong-answer analysis, and common exam traps.

Scenario questions: 14
Exam code: DP-203
Vendor: Microsoft

Scenario guide

How to approach troubleshooting scenario questions

These questions describe a symptom, such as a failed pipeline run or a slow query, and ask you to identify the root cause or the correct fix. They appear across all certification exams and reward systematic thinking over memorisation. The best candidates follow a consistent troubleshooting framework even under time pressure.

Quick answer

Troubleshooting scenario questions test whether you can apply a concept in context, not just recognise a definition.

How the topic appears in realistic exam-style scenarios.

Which detail in the question changes the correct answer.

How to eliminate plausible but wrong options.

How to connect the question back to the wider exam objective.

Practice scenarios

Question 1 (hard, multiple choice)

Refer to the exhibit. A Stream Analytics job shows increasing watermark delay and input deserialization errors. Which action should be taken first to troubleshoot?

Exhibit

Azure Stream Analytics job diagnostics log:

{
  "time": "2023-08-01T12:00:00Z",
  "properties": {
    "jobId": "job-123",
    "jobName": "IoTStreamJob",
    "events": [
      {
        "time": "2023-08-01T11:59:00Z",
        "type": "WatermarkDelay",
        "properties": {
          "watermarkDelaySeconds": 120,
          "maxWatermarkDelaySeconds": 300
        }
      },
      {
        "time": "2023-08-01T11:59:30Z",
        "type": "InputDeserializationError",
        "properties": {
          "source": "iothub",
          "count": 15
        }
      }
    ],
    "jobOutputWatermark": "2023-08-01T11:57:00Z"
  }
}

A
Check the input data schema and ensure it matches the query
Deserialization errors are often due to schema mismatch; fixing the data or query resolves the root cause.
B
Change the output to a different sink
Why wrong: Output sink is unrelated to input deserialization errors.
C
Increase the number of Streaming Units (SUs)
Why wrong: Scaling may help throughput but does not fix deserialization issues.
D
Set the watermark delay threshold higher
Why wrong: That only masks the symptom; the underlying issue remains.
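
The triage order behind answer A can be sketched in a few lines of Python. This is a minimal illustration, not an Azure API: it reuses the field names from the exhibit as printed on this page (real diagnostic logs may be shaped differently), and the `first_check` rule is a hypothetical helper that simply encodes "look at input errors before tuning throughput".

```python
import json

# Trimmed copy of the exhibit's log entry (same field names and values).
EXHIBIT = """
{
  "time": "2023-08-01T12:00:00Z",
  "properties": {
    "jobName": "IoTStreamJob",
    "events": [
      {"type": "WatermarkDelay",
       "properties": {"watermarkDelaySeconds": 120, "maxWatermarkDelaySeconds": 300}},
      {"type": "InputDeserializationError",
       "properties": {"source": "iothub", "count": 15}}
    ]
  }
}
"""


def summarize(entry: dict) -> dict:
    """Collect deserialization error counts per input and the watermark delay."""
    summary = {"deserialization_errors": {}, "watermark_delay_seconds": None}
    for event in entry["properties"]["events"]:
        props = event["properties"]
        if event["type"] == "InputDeserializationError":
            errors = summary["deserialization_errors"]
            errors[props["source"]] = errors.get(props["source"], 0) + props["count"]
        elif event["type"] == "WatermarkDelay":
            summary["watermark_delay_seconds"] = props["watermarkDelaySeconds"]
    return summary


def first_check(summary: dict) -> str:
    """Hypothetical triage rule: investigate input errors before scaling."""
    if summary["deserialization_errors"]:
        return "check input schema"
    if summary["watermark_delay_seconds"]:
        return "check throughput"
    return "no action"


result = summarize(json.loads(EXHIBIT))
print(result, first_check(result))
```

Because the exhibit reports errors from the `iothub` input, the sketch points at the input schema first. The watermark delay is treated as a downstream symptom.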

Question 2 (hard, multiple choice)

You are troubleshooting a slow-running pipeline in Azure Data Factory. The pipeline copies data from an on-premises SQL Server to Azure Synapse Analytics using a self-hosted integration runtime. The copy activity is using the 'Auto' copy method. You notice that network bandwidth is limited. Which configuration change would most likely improve performance?

A
Enable staging using Azure Blob Storage and use PolyBase to load into Synapse
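Staged copy lets the self-hosted integration runtime compress data before sending it over the limited link to Blob Storage, and PolyBase then loads it into Synapse in parallel.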
B
Increase the Data Integration Units (DIU) for the copy activity
Why wrong: DIU applies to Azure IR, not self-hosted IR.
C
Change the copy method to 'Bulk insert'
Why wrong: Bulk insert is a slower load path than PolyBase and does nothing about the limited bandwidth.
D
Set the Fault Tolerance option to skip incompatible rows
Why wrong: This handles errors, not performance.

Question 3 (hard, multiple choice)

You are troubleshooting a Synapse Spark notebook that fails when reading Parquet files from Azure Data Lake Storage Gen2. The error message indicates 'Permission denied'. The notebook uses a managed identity (System-assigned) for authentication. The Data Lake Storage account has a firewall enabled with 'Allow Azure services on the trusted services list' turned on. The storage account's RBAC role assignments include 'Storage Blob Data Contributor' for the managed identity. What is the most likely cause of the failure?

A
Parquet files require special permissions that are not granted by RBAC roles
Why wrong: RBAC roles cover data operations for Parquet files.
B
The managed identity has not been granted the 'Storage Blob Data Reader' role in addition to 'Storage Blob Data Contributor'
Why wrong: 'Storage Blob Data Contributor' includes read permissions, so this is not the issue.
C
The storage account firewall does not have a 'Resource instances' exception for the managed identity
With the firewall enabled, the Synapse workspace must be added under 'Resource instances' so that requests made with its managed identity are allowed through.
D
The notebook is using an incorrect connection string with account key
Why wrong: The notebook uses managed identity authentication, not account key.

Question 4 (hard, multiple choice)

You are troubleshooting a data processing job in Azure Synapse Pipelines that fails intermittently with the error: 'Operation on target Sink failed: The request was aborted: Could not create SSL/TLS secure channel.' The pipeline reads from Azure Blob Storage and writes to an Azure SQL Database. The source and sink are in the same region. What is the most likely cause?

A
Azure SQL Database firewall rules blocking the IP address of the integration runtime.
Why wrong: Firewall blocks would give a different error message, like 'Cannot open server...'.
B
Transient network connectivity issues between the services.
Why wrong: Transient issues usually show up as timeouts or connection resets, not SSL/TLS-specific errors.
C
The Azure SQL Database DTU limit has been exceeded, causing throttling.
Why wrong: Throttling returns HTTP 429 or specific Azure SQL error codes.
D
The self-hosted integration runtime is using TLS 1.0, which is not supported by the services.
SSL/TLS handshake failure often stems from TLS version mismatch.

Question 5 (hard, multiple choice)

You have a production pipeline in Azure Data Factory that copies data from an on-premises SQL Server to Azure Blob Storage using a self-hosted integration runtime. The pipeline fails intermittently with a 'Connection closed' error. The data volume is 50 GB per run. What should you first troubleshoot to resolve this issue?

A
Increase the memory and CPU resources on the self-hosted integration runtime machine and check network stability.
The self-hosted IR needs sufficient resources for large data transfers; 'Connection closed' often indicates resource exhaustion or network interruptions.
B
Increase the 'connection timeout' setting in the linked service to 30 minutes.
Why wrong: This addresses only idle connections, not active transfer failures due to resource limitations.
C
Change the copy activity to use staged copy with Azure Blob Storage as an intermediate store.
Why wrong: Staging can help with specific scenarios, but it adds latency and does not address the root cause of connection drops.
D
Disable fault tolerance in the copy activity to improve performance.
Why wrong: Disabling fault tolerance would cause the entire run to fail on any error, making the problem worse.

Question 6 (hard, multiple choice)

You are troubleshooting a Synapse Pipeline that runs a Copy activity from an on-premises SQL Server to Azure Synapse Dedicated SQL Pool. The pipeline fails with the error: 'Failure happened on 'Source' side. ErrorCode=SqlOperationFailed.' The on-premises SQL Server has no firewall restrictions. What is the most likely cause?

A
Staging is not enabled for the Copy activity.
Why wrong: Staging is optional and not required for basic copy operations; its absence would not cause a source-side error.
B
The destination table in Synapse has a different schema.
Why wrong: A schema mismatch would fail on the sink side, not the source side.
C
The SQL Server credentials in the linked service are incorrect.
Why wrong: Incorrect credentials would cause a login failure, but the error is generic 'SqlOperationFailed' and could be connectivity-related.
D
The self-hosted integration runtime is not configured properly.
A self-hosted integration runtime is required to connect from Azure to on-premises networks; misconfiguration is a common cause of source-side failures.

Question 7 (medium, multi-select)

You are designing a data processing pipeline in Azure Data Factory that uses a Mapping Data Flow. You need to handle errors gracefully, such as when a row fails to convert a column value. Which TWO actions should you take? (Choose two.)

A
Wrap the data flow in a Try-Catch activity in the pipeline.
Why wrong: Data Factory pipelines have no Try-Catch activity; row-level conversion errors are handled inside the data flow.
B
Set the data flow's error handling to 'Abort on error' to stop processing on first failure.
Why wrong: Aborting is not graceful; it stops the entire pipeline.
C
Enable schema drift on the source to automatically handle data type mismatches.
Why wrong: Schema drift handles new columns, not conversion errors.
D
Configure the sink transformation to allow errors and log error rows to a separate file.
Sink can be configured to continue on error and write error rows to a file.
E
Use a Conditional Split transformation to separate rows that cause errors based on a condition.
Conditional split allows routing error rows to a separate sink for logging.
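
The routing idea behind answers D and E can be illustrated outside Data Factory. The sketch below is plain Python, not the data flow expression language. The `amount` column, the sample rows, and both helper functions are hypothetical. Rows whose value converts cleanly go to one list, and the rest are set aside for an error log instead of failing the whole run.

```python
from typing import Iterable


def try_parse_amount(value):
    """Return the value as a float, or None when the conversion fails."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def split_rows(rows: Iterable[dict]):
    """Route convertible rows to `good` and failing rows to `bad`."""
    good, bad = [], []
    for row in rows:
        amount = try_parse_amount(row.get("amount"))
        if amount is None:
            bad.append(row)  # kept unchanged so it can be logged
        else:
            good.append({**row, "amount": amount})
    return good, bad


rows = [
    {"id": 1, "amount": "10.5"},
    {"id": 2, "amount": "n/a"},
    {"id": 3, "amount": None},
    {"id": 4, "amount": "7"},
]
good, bad = split_rows(rows)
print("loaded:", good)
print("error rows:", bad)
```

In a real data flow the equivalent condition would live in a Conditional Split transformation, with the error stream wired to its own sink.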

Question 8 (hard, multiple choice)

A financial services firm uses Azure Synapse Analytics to process daily trade data. The data is stored in a dedicated SQL pool as partitioned tables by date. Recently, queries that filter on a specific date range have become slow. You suspect that partition pruning is not working effectively. What should you do to improve query performance?

A
Rebuild the columnstore indexes on the table
Rebuilding removes fragmentation and restores rowgroup quality, so date-range filters scan fewer segments.
B
Convert the table to a rowstore heap
Why wrong: Rowstore is slower for large analytical queries.
C
Create statistics on the date column
Why wrong: Statistics help but fragmentation is the likely root cause.
D
Increase the number of partitions for the table
Why wrong: More partitions add overhead and do not fix fragmentation.

Question 9 (medium, multiple choice)

You are troubleshooting a failed Azure Synapse Pipeline execution. The pipeline uses a Copy activity to load data from an on-premises SQL Server to Azure Data Lake Storage Gen2. The error indicates a 'Connection timeout' to the on-premises source. The Integration Runtime is Self-Hosted and has been running successfully for months. What is the most likely cause?

A
The SQL Server authentication credentials have expired.
Why wrong: Unlikely if the pipeline was working recently; a network issue is more probable.
B
The Self-Hosted Integration Runtime is not installed.
Why wrong: It has been running for months, so it is installed.
C
The on-premises network configuration has changed.
Network changes can block connectivity to the SQL Server.
D
The Azure Storage account firewall is blocking access.
Why wrong: The error is about the source connection, not destination.
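
A quick way to confirm a suspected network change is to test basic TCP reachability from the machine that hosts the self-hosted integration runtime. The helper below is a minimal sketch using only the Python standard library. The host name in the usage comment is a placeholder, and 1433 is the default SQL Server port, which your environment may have changed.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example (placeholder host name, default SQL Server port):
# can_reach("sqlserver.example.internal", 1433)
```

A `False` result from the runtime host, after months of successful runs, points to a firewall, routing, or DNS change rather than to credentials or the Azure-side sink.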

Question 10 (medium, multiple choice)

You are troubleshooting a slow-running pipeline in Azure Data Factory that uses a Copy activity to transfer data from Azure Blob Storage to Azure Synapse Analytics. The pipeline processes about 100 GB of CSV files. The copy performance is poor even though the source and sink are in the same region. What is the most likely cause?

A
The copy activity is not using staging and PolyBase
PolyBase dramatically improves load performance.
B
The source and sink are in different Azure regions
Why wrong: They are in the same region.
C
The Data Integration Unit (DIU) setting is too low
Why wrong: DIU affects throughput but PolyBase is more impactful.
D
The source files are compressed
Why wrong: Compression can actually improve performance.

Question 11 (hard, multiple choice)

You are troubleshooting a slow-running Azure Synapse Pipeline that loads data from Azure Blob Storage into a dedicated SQL pool using a Copy activity. The source is a set of CSV files totaling 500 GB. The sink is a staging table with a clustered columnstore index. The pipeline takes 4 hours to complete. You need to reduce the execution time to under 1 hour. What should you do?

A
Enable PolyBase in the Copy activity sink settings.
PolyBase provides the fastest way to load data into dedicated SQL pool by leveraging its parallel architecture.
B
Increase the Data Integration Units (DIU) in the Copy activity to the maximum.
Why wrong: Increasing DIU improves parallelism but the bottleneck is often the sink; PolyBase is more effective.
C
Increase the dedicated SQL pool's DWU setting to the highest tier.
Why wrong: Scaling up improves query performance but does not change the copy method; the load may still be slow.
D
Partition the staging table on a date column.
Why wrong: Partitioning helps with query performance and partition switching, not with the initial load speed.

Question 12 (medium, multiple choice)

You are troubleshooting a slow-running Azure Data Factory pipeline that copies data from an Azure SQL Database to ADLS Gen2. The pipeline uses a copy activity with the default settings. The source table has 10 million rows. Which optimization should you apply first?

A
Set the 'parallel copies' property to 10.
Why wrong: Parallel copies are automatically tuned; manual setting may not yield significant improvement.
B
Replace the copy activity with a mapping data flow.
Why wrong: Data flows are for transformations, not simple copies; they add overhead.
C
Increase the data integration unit (DIU) to maximum.
Why wrong: DIU increase costs more and may not address the bottleneck from SQL DB.
D
Enable staged copy using an Azure Blob Storage staging location.
Staging allows data to be transferred via Blob Storage, which improves throughput for SQL to ADLS copies.

Question 13 (hard, multiple choice)

You have a Data Factory pipeline that runs a U-SQL script in Azure Data Lake Analytics. The script processes terabytes of data and outputs to a CSV file. The pipeline is failing with the error: 'The job failed with UserError: Script execution failed.' You need to troubleshoot the issue. Which approach should you take first?

A
Change the output format to Parquet to reduce file size.
Why wrong: Output format is unlikely to cause script execution failure.
B
Review the job logs in Azure Data Lake Analytics to identify the specific script error.
Job logs provide detailed error messages that pinpoint the issue.
C
Migrate the U-SQL script to Azure Synapse Spark pool.
Why wrong: This is a major change; troubleshooting should start with logs.
D
Increase the degree of parallelism for the U-SQL job.
Why wrong: This may help with performance but not with script errors.

Question 14 (medium, multiple choice)

Your team is troubleshooting slow query performance on a dedicated SQL pool in Azure Synapse Analytics. The query uses a hash-distributed fact table with 60 distributions. After reviewing the execution plan, you notice a high number of data moves. Which action would most likely reduce data movement?

A
Change the distribution type to round-robin.
Why wrong: Round-robin distributes rows evenly but does not align with join keys, often increasing data movement.
B
Update statistics on all columns used in joins.
Why wrong: Statistics improve query plans but do not directly reduce data movement.
C
Increase the number of distributions to 120.
Why wrong: A dedicated SQL pool always uses 60 distributions, so the count cannot be changed; alignment on the join key is what reduces movement.
D
Redistribute the fact table on the join column using hash distribution.
Hash distribution on the join column keeps related rows together, reducing data shuffling.
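
The effect of aligning the distribution column with the join column can be shown with a toy model. This is a sketch under stated assumptions. `zlib.crc32` stands in for the pool's internal hash function, which it is not. The tables, the column names, and the dimension being hash-distributed on `customer_id` are all hypothetical. The model counts the fact rows that sit on a different distribution from their join partner and would therefore have to be moved.

```python
import zlib

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def distribution_of(key) -> int:
    """Toy stand-in for the pool's hash function (assumption, not the real one)."""
    return zlib.crc32(str(key).encode()) % DISTRIBUTIONS


def rows_to_move(fact_rows, dim_rows, fact_dist_col, join_col) -> int:
    """Count fact rows whose join partner lives on another distribution."""
    dim_location = {d[join_col]: distribution_of(d[join_col]) for d in dim_rows}
    moved = 0
    for row in fact_rows:
        if distribution_of(row[fact_dist_col]) != dim_location[row[join_col]]:
            moved += 1
    return moved


dim = [{"customer_id": c} for c in range(100)]
fact = [{"order_id": o, "customer_id": o % 100} for o in range(1000)]

# Fact table hashed on a column that is not the join key.
moved_misaligned = rows_to_move(fact, dim, "order_id", "customer_id")
# Fact table hashed on the join key itself.
moved_aligned = rows_to_move(fact, dim, "customer_id", "customer_id")

print("misaligned:", moved_misaligned, "aligned:", moved_aligned)
```

When both tables are hashed on the join key, matching rows land on the same distribution by construction, so the aligned count is zero.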

These DP-203 practice questions are part of Courseiva's free Microsoft certification practice question bank. Courseiva provides original exam-style DP-203 questions with detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics.
