An engineer has built a data pipeline on Cloud Dataflow that reads from Pub/Sub, applies transformations, and writes to BigQuery. The pipeline shows high latency and occasionally loses data when workers fail. The engineer wants to improve reliability and performance. Which two actions should they take?
Enable Streaming Engine to reduce latency and improve reliability, and write to BigQuery through an exactly-once sink so that records are neither lost nor duplicated.
Why this answer
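Enabling Streaming Engine moves state storage and shuffle off the worker VMs and into the Dataflow service backend. Pipeline state then no longer lives on a worker that can fail, which improves reliability and reduces latency. Writing through an exactly-once sink (such as BigQuery with exactly-once guarantees) means each record is written once, so retries after a worker failure neither drop records nor create duplicates.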
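As a rough illustration of the first action, the sketch below assembles the launch flags a Beam Python pipeline would typically use to run on Dataflow in streaming mode with Streaming Engine enabled. The helper name `dataflow_streaming_flags` and the project, region, and bucket values are hypothetical placeholders; the code only builds the flag list and does not launch a job.

```python
def dataflow_streaming_flags(project, region, temp_location):
    """Return launch flags for a streaming Dataflow job (hypothetical helper)."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Run as an unbounded (streaming) pipeline, as required for Pub/Sub input.
        "--streaming",
        # Keep state and shuffle in the Dataflow service backend, not on worker VMs.
        "--enable_streaming_engine",
    ]


# Placeholder values for illustration only.
flags = dataflow_streaming_flags("my-project", "us-central1", "gs://my-bucket/tmp")
print(" ".join(flags))
```

In a real pipeline this list would be passed to `PipelineOptions(flags)` from `apache_beam.options.pipeline_options`. The second action is configured on the sink itself, not through a launch flag; which BigQuery write method provides exactly-once behavior depends on the Beam SDK version in use, so check the BigQuery I/O documentation for that version.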