PDE Designing data processing systems • Set 2
PDE Designing data processing systems Practice Test 2: 15 questions with explanations.
A company runs a Cloud Dataflow streaming pipeline that reads from Cloud Pub/Sub, applies 10-second fixed windows, joins the windowed data with a slowly changing dimension table stored in Cloud Bigtable, and writes the results to BigQuery. The pipeline uses default settings with autoscaling enabled (min 2, max 20 workers), the Bigtable cluster has 3 nodes, and the dimensions are updated infrequently. The pipeline has been running for months, but it recently started showing occasional data loss and increasing latency, which has grown from seconds to minutes. In the Dataflow monitoring UI, the 'System Lag' metric is increasing and some windows are not being emitted. CPU utilization on the Bigtable nodes is below 50%, and there are no errors in the logs. Which action is most likely to resolve the issue?
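For readers less familiar with fixed windowing, the following is a minimal plain-Python sketch of how 10-second fixed windows bucket events by timestamp. It is not Dataflow or Apache Beam code; the function names and sample events are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # window size taken from the scenario


def window_start(event_ts: float, size: int = WINDOW_SECONDS) -> int:
    """Return the start (in whole seconds) of the fixed window containing event_ts."""
    return int(event_ts // size) * size


def group_by_window(events):
    """Group (timestamp, payload) pairs by the fixed window they fall into."""
    windows = defaultdict(list)
    for ts, payload in events:
        windows[window_start(ts)].append(payload)
    return dict(windows)


# Hypothetical sample events: (event time in seconds, payload)
sample = [(0.5, "a"), (9.9, "b"), (10.0, "c"), (27.3, "d")]
grouped = group_by_window(sample)
```

In this sketch each event belongs to exactly one window, and a window covers the half-open interval from its start up to, but not including, the start plus 10 seconds. A real streaming runner additionally has to decide when a window is complete enough to emit, which is where a lag metric such as the one in the scenario becomes relevant.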