DP-203 Design and develop data processing • Complete Question Bank
Refer to the exhibit.
```json
{
  "identity": {
    "type": "SystemAssigned",
    "principalId": "12345678-1234-1234-1234-123456789012",
    "tenantId": "87654321-4321-4321-4321-210987654321"
  },
  "properties": {
    "provisioningState": "Succeeded",
    "integrationRuntime": {
      "type": "SelfHosted",
      "properties": {
        "typeProperties": {
          "selfContainedInteractiveAuthoringEnabled": true,
          "autoUpdate": true,
          "latestVersion": "5.25.8327.1",
          "pushedVersion": "5.25.8327.1",
          "version": "5.23.8123.0",
          "status": "Online"
        }
      }
    }
  }
}
```
Match the Azure service to its primary data processing use case. Drag each service on the left to the correct use case on the right.
Services:
- Azure Databricks
- Azure Stream Analytics
- Azure Data Factory
- Azure Synapse Analytics
Use Cases:
- Real-time event processing
- Orchestration of ETL pipelines
- Big data analytics with Spark
- Enterprise data warehousing
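The pairing asked for above can be written down as a small lookup table. This is only a study-aid sketch based on how each service is commonly positioned; the names `PRIMARY_USE_CASE` and `use_case_for` are hypothetical and not part of any Azure SDK.

```python
# Study-aid lookup: each service paired with the use case it is primarily known for.
# The dictionary and helper names are illustrative assumptions, not an Azure API.
PRIMARY_USE_CASE = {
    "Azure Databricks": "Big data analytics with Spark",
    "Azure Stream Analytics": "Real-time event processing",
    "Azure Data Factory": "Orchestration of ETL pipelines",
    "Azure Synapse Analytics": "Enterprise data warehousing",
}


def use_case_for(service: str) -> str:
    """Return the primary use case recorded for a service name."""
    return PRIMARY_USE_CASE[service]


if __name__ == "__main__":
    for service, use_case in PRIMARY_USE_CASE.items():
        print(f"{service}: {use_case}")
```

Each use case appears exactly once, so the table is a one-to-one match between the two columns.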
Drag a concept onto its matching description — or click a concept then click the description.
Three synchronous copies within a single data center
Three copies across multiple availability zones in a region
Geo-redundant storage with read access in secondary region
Geo-zone-redundant storage with read access in secondary region
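The concept names for these four descriptions did not survive on this page. The sketch below assumes they are the standard Azure Storage redundancy options and pairs each description with its usual abbreviation and storage account SKU name; `REDUNDANCY_OPTIONS` and `sku_for` are hypothetical names used only for illustration.

```python
# Assumed pairing of the four descriptions above with Azure Storage redundancy options.
# Tuple layout: (description, abbreviation, storage account SKU name).
REDUNDANCY_OPTIONS = [
    ("Three synchronous copies within a single data center", "LRS", "Standard_LRS"),
    ("Three copies across multiple availability zones in a region", "ZRS", "Standard_ZRS"),
    ("Geo-redundant storage with read access in secondary region", "RA-GRS", "Standard_RAGRS"),
    ("Geo-zone-redundant storage with read access in secondary region", "RA-GZRS", "Standard_RAGZRS"),
]


def sku_for(abbreviation: str) -> str:
    """Look up the SKU name for a redundancy abbreviation such as 'ZRS'."""
    for _description, abbr, sku in REDUNDANCY_OPTIONS:
        if abbr == abbreviation:
            return sku
    raise KeyError(abbreviation)


def has_readable_secondary(abbreviation: str) -> bool:
    """The 'RA-' prefix marks options whose secondary region can be read directly."""
    return abbreviation.startswith("RA-")
```

For example, `sku_for("RA-GZRS")` returns `"Standard_RAGZRS"`, and `has_readable_secondary("ZRS")` is `False` because ZRS keeps all copies in one region.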
Drag a concept onto its matching description — or click a concept then click the description.
Query external data in Azure Storage using T-SQL
High-throughput data ingestion into Synapse SQL
Orchestrate data movement and transformation
Complex data engineering with notebooks
Drag a concept onto its matching description — or click a concept then click the description.
Hierarchical namespace for Azure Data Lake Storage
Optimized for frequent data access
Optimized for infrequent access with lower cost
Lowest cost for rarely accessed data
Azure Data Factory pipeline JSON snippet:
```json
{
  "name": "CopyDataPipeline",
  "activities": [
    {
      "name": "CopyFromBlobToSQL",
      "type": "Copy",
      "inputs": [{"referenceName": "BlobDS", "type": "DatasetReference"}],
      "outputs": [{"referenceName": "SQLDS", "type": "DatasetReference"}],
      "typeProperties": {
        "source": {
          "type": "BlobSource",
          "recursive": true
        },
        "sink": {
          "type": "SqlSink",
          "writeBatchSize": 10000,
          "preCopyScript": "TRUNCATE TABLE dbo.target"
        },
        "enableStaging": false,
        "translator": {
          "type": "TabularTranslator",
          "columnMappings": {
            "Id": "Id",
            "Name": "FullName"
          }
        }
      }
    }
  ]
}
```
Azure Stream Analytics job diagnostics log:
```json
{
  "time": "2023-08-01T12:00:00Z",
  "properties": {
    "jobId": "job-123",
    "jobName": "IoTStreamJob",
    "events": [
      {
        "time": "2023-08-01T11:59:00Z",
        "type": "WatermarkDelay",
        "properties": {
          "watermarkDelaySeconds": 120,
          "maxWatermarkDelaySeconds": 300
        }
      },
      {
        "time": "2023-08-01T11:59:30Z",
        "type": "InputDeserializationError",
        "properties": {
          "source": "iothub",
          "count": 15
        }
      }
    ],
    "jobOutputWatermark": "2023-08-01T11:57:00Z"
  }
}
```
Azure Databricks cluster configuration JSON:
```json
{
  "cluster_name": "ETL Cluster",
  "spark_version": "10.4.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "autoscale": {
    "min_workers": 2,
    "max_workers": 8
  },
  "spark_conf": {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "64MB"
  },
  "aws_attributes": {
    "first_on_demand": 1,
    "availability": "SPOT_WITH_FALLBACK",
    "zone_id": "us-west-2a"
  }
}
```
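One detail in this exhibit is worth checking when reading it: the configuration is labeled Azure Databricks and uses an Azure VM size (`Standard_DS3_v2`), yet it carries an `aws_attributes` block with an AWS zone. On Azure, the Clusters API expects spot settings under `azure_attributes` (for example `"availability": "SPOT_WITH_FALLBACK_AZURE"`). Whether the mismatch is intentional is not stated here. The sketch below shows one way such a check could be written; `CLOUD_BLOCKS` and `mismatched_cloud_blocks` are hypothetical names, not part of any Databricks SDK.

```python
import json

# Cloud-specific attribute blocks used by the Databricks Clusters API.
CLOUD_BLOCKS = {
    "aws": "aws_attributes",
    "azure": "azure_attributes",
    "gcp": "gcp_attributes",
}


def mismatched_cloud_blocks(cluster: dict, cloud: str) -> list:
    """Return attribute blocks in `cluster` that belong to a cloud other than `cloud`."""
    expected = CLOUD_BLOCKS[cloud]
    return [
        block
        for block in CLOUD_BLOCKS.values()
        if block in cluster and block != expected
    ]


# Trimmed copy of the exhibit above.
exhibit = json.loads("""
{
  "cluster_name": "ETL Cluster",
  "node_type_id": "Standard_DS3_v2",
  "autoscale": {"min_workers": 2, "max_workers": 8},
  "aws_attributes": {
    "first_on_demand": 1,
    "availability": "SPOT_WITH_FALLBACK",
    "zone_id": "us-west-2a"
  }
}
""")

if __name__ == "__main__":
    print(mismatched_cloud_blocks(exhibit, "azure"))  # prints ['aws_attributes']
```

The check only looks at top-level keys, which is enough to flag the block in question; it does not validate the values inside the block.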