PDE Building and operationalizing data processing systems • Complete Question Bank
Refer to the exhibit.
Error log from Dataflow job:
"""
Workflow failed. Causes: S3D3: BigQueryIO.Write/BatchLoads/Loads/AllocateLoadTable/ParDo(AllocateLoadTable) failed.
org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write$BigQueryWriteException: BigQuery insertion failed: Response JSON: {
"error": {
"errors": [
{
"domain": "global",
"reason": "invalid",
"message": "Provided Schema does not match Table employee_records. Field last_name has type STRING but provided type INTEGER"
}
],
"code": 400,
"message": "Provided Schema does not match Table employee_records. Field last_name has type STRING but provided type INTEGER"
}
}
"""
Refer to the exhibit.
Cloud Pub/Sub subscription configuration:
{
"name": "projects/my-project/subscriptions/my-sub",
"topic": "projects/my-project/topics/my-topic",
"pushConfig": {},
"ackDeadlineSeconds": 10,
"messageRetentionDuration": "86400s",
"expirationPolicy": {
"ttl": "604800s"
},
"enableMessageOrdering": false,
"retryPolicy": {
"minimumBackoff": "10s",
"maximumBackoff": "600s"
},
"deadLetterPolicy": {
"deadLetterTopic": "projects/my-project/topics/dead-letter-topic",
"maxDeliveryAttempts": 5
}
}
Refer to the exhibit.
```
# Dataflow pipeline log snippet
2024-03-15 10:00:00 ERROR Transform 'ParseLogs': org.apache.beam.sdk.util.WindowedValue$CoderLoadingException: Unable to load coder for class com.example.LogEvent
2024-03-15 10:00:01 ERROR Transform 'ParseLogs': java.lang.NoSuchMethodError: com.example.LogEvent: method <init>()V not found
```
Drag steps to the numbered slots on the right, or tap a step then tap a slot.
Drag a concept onto its matching description — or click a concept then click the description.
Extract, Transform, Load
Extract, Load, Transform
Raw data storage in native format
Optimized storage for structured analytics
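The first two descriptions above differ only in where the transformation step runs. A minimal sketch of that ordering difference, assuming in-memory lists stand in for the source and the target store; the `transform` function and the record fields are hypothetical and chosen only for illustration:

```python
def transform(record):
    # Hypothetical transformation: normalize the "name" field.
    return {**record, "name": record["name"].strip().title()}


def etl(source, warehouse):
    # Extract, Transform, Load: records are transformed before they
    # reach the target store.
    for record in source:
        warehouse.append(transform(record))


def elt(source, warehouse):
    # Extract, Load, Transform: raw records are loaded first, then
    # transformed inside the target store.
    warehouse.extend(source)
    warehouse[:] = [transform(record) for record in warehouse]


source = [{"name": "  ada lovelace "}, {"name": "alan turing"}]
etl_target, elt_target = [], []
etl(source, etl_target)
elt(source, elt_target)
```

Both targets end up with the same cleaned records here; what changes is whether the raw data ever lands in the target system.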
Drag a concept onto its matching description — or click a concept then click the description.
Atomicity, Consistency, Isolation, Durability
Basically Available, Soft state, Eventual consistency
Consistency, Availability, Partition tolerance trade-off
Horizontal partitioning of data across databases
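The last description, horizontal partitioning of data across databases, can be illustrated with hash-based routing. This is a sketch under stated assumptions, not a production design: the shard names in `SHARDS` are hypothetical, and a real system would also need a plan for resharding when the shard count changes.

```python
import hashlib

# Hypothetical database names, one per shard.
SHARDS = ["db-0", "db-1", "db-2", "db-3"]


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(key: str) -> str:
    """Return the name of the database that owns the given key."""
    return SHARDS[shard_for(key, len(SHARDS))]
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()` because the built-in is randomized per process for strings, which would route the same key to different shards across restarts.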
Drag a concept onto its matching description — or click a concept then click the description.
Collecting data from various sources
Persisting data in a durable system
Transforming and analyzing data
Making data available for consumption
Moving data to long-term, low-cost storage
Refer to the exhibit.
{
"bindings": [
{
"role": "roles/bigquery.dataViewer",
"members": [
"user:analyst@example.com"
]
},
{
"role": "roles/bigquery.metadataviewer",
"members": [
"user:analyst@example.com"
]
}
],
"etag": "BwXX2Yz7k0Q="
}
Refer to the exhibit.
```
# error log from Dataflow job
Worker failed to start: Operation timed out after 30.0 seconds.
Possible causes:
- Insufficient CPU quota in the region.
- Networking issues preventing VM creation.
- Stale custom image.
- gRPC connection failure to the Dataflow service.
```
Refer to the exhibit.
```
# BigQuery table schema and sample data
Table: mydataset.events
Columns:
  event_id: STRING (REQUIRED)
  event_timestamp: TIMESTAMP (REQUIRED)
  event_data: STRING (NULLABLE)
  user_id: STRING (REQUIRED)
Partitioned by: event_timestamp (daily)
Clustered by: user_id
Job: Dataflow pipeline writing 1000 events/second to this table using streaming inserts with insertId = event_id.
Monitoring shows intermittent 'duplicate rows' in queries that count distinct event_ids.
```
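De-duplication based on `insertId` in BigQuery streaming inserts is best effort, so occasional duplicates like those in the exhibit above are possible and are often filtered at read time. A minimal sketch of that filtering, assuming the rows have already been fetched as Python dictionaries keyed by the exhibit's column names; the helper name `dedupe_by_event_id` and the sample rows are hypothetical:

```python
def dedupe_by_event_id(rows):
    """Keep the first row seen for each event_id and drop later repeats."""
    seen = {}
    for row in rows:
        # setdefault stores the row only if this event_id is new.
        seen.setdefault(row["event_id"], row)
    return list(seen.values())


# Hypothetical sample rows; "e1" was delivered twice.
rows = [
    {"event_id": "e1", "user_id": "u1", "event_data": "click"},
    {"event_id": "e2", "user_id": "u2", "event_data": "view"},
    {"event_id": "e1", "user_id": "u1", "event_data": "click"},
]
unique_rows = dedupe_by_event_id(rows)
```

The same idea can be expressed in SQL by selecting one row per `event_id`; the Python version is shown only to make the logic explicit.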
Refer to the exhibit.
```
# gcloud dataproc clusters describe output
clusterName: my-cluster
config:
  softwareConfig:
    imageVersion: '2.0-debian10'
  gceClusterConfig:
    zoneUri: projects/my-project/zones/us-central1-a
    internalIpOnly: false
  masterConfig:
    machineTypeUri: n1-standard-4
    numInstances: 1
  workerConfig:
    machineTypeUri: n1-standard-4
    numInstances: 10
    preemptibility: ON
  secondaryWorkerConfig:
    numInstances: 0
status:
  state: RUNNING
```