A financial services company streams trades into Pub/Sub and processes them with Dataflow. The pipeline must process each trade exactly once for regulatory compliance, but Pub/Sub guarantees only at-least-once delivery. Which combination of features should the Dataflow pipeline use to achieve exactly-once semantics?
Dataflow's exactly-once processing mode combined with idempotent sinks ensures that each trade is written exactly once.
Why this answer
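Dataflow's exactly-once processing, combined with idempotent writes, ensures that each trade appears in the output exactly once. Pub/Sub's at-least-once delivery means a message can be redelivered, but Dataflow can deduplicate redelivered messages using a unique ID (the Pub/Sub message ID or a publisher-supplied ID attribute). Idempotent writes close the remaining gap: if Dataflow retries a write to the sink, repeating it leaves the stored result unchanged, so no duplicate trade is recorded.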
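
The two mechanisms in the explanation above (deduplication on a unique ID, then an idempotent write) can be illustrated with a minimal sketch in plain Python. This is not Dataflow or Apache Beam code: the `Trade` record, the `trade_id` field, the `IdempotentTradeSink` class, and the `process` function are all hypothetical names chosen for illustration, and the in-memory dictionary only stands in for a keyed store that supports upserts.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass(frozen=True)
class Trade:
    trade_id: str  # hypothetical unique ID assigned by the publisher
    symbol: str
    quantity: int


@dataclass
class IdempotentTradeSink:
    """In-memory stand-in for a keyed store that upserts by trade_id."""
    rows: Dict[str, Trade] = field(default_factory=dict)

    def write(self, trade: Trade) -> None:
        # Upsert: writing the same trade again overwrites the same row,
        # so a retried write cannot create a second record.
        self.rows[trade.trade_id] = trade


def process(deliveries: Iterable[Trade], sink: IdempotentTradeSink) -> int:
    """Drop redelivered trades by ID, write the rest; return duplicates dropped."""
    seen = set()
    dropped = 0
    for trade in deliveries:
        if trade.trade_id in seen:
            dropped += 1  # at-least-once delivery produced a repeat
            continue
        seen.add(trade.trade_id)
        sink.write(trade)
    return dropped


# Simulated at-least-once delivery: trade t-1 arrives twice.
deliveries = [
    Trade("t-1", "ACME", 100),
    Trade("t-2", "INIT", 50),
    Trade("t-1", "ACME", 100),
]

sink = IdempotentTradeSink()
dropped = process(deliveries, sink)

# Simulate a retry that replays every write: the sink still holds two rows.
for trade in deliveries:
    sink.write(trade)

print(len(sink.rows), dropped)
```

Either mechanism alone leaves a gap: deduplication does not protect against a write that is retried after a partial failure, and an idempotent sink alone does not stop duplicate messages from being counted in upstream aggregations. Using both covers redelivery on the input side and retries on the output side.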