A financial services company uses Dataflow pipelines with late data handling. They need to ensure that all late-arriving data is processed correctly but also want to control costs. What is the best configuration?
Fixed windows with a realistic allowed lateness capture late data without excessive state cost, and a trivial watermark ensures no data is dropped.
Why this answer
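Option D is correct because fixed windows, with allowed lateness set to the maximum expected delay and a trivial watermark, balance completeness and cost: late elements are still accepted up to that delay, and window state can be discarded once it has passed. Option A (global window with long allowed lateness) can keep state far longer than needed, which drives up state cost. Option B (session windows) may merge late data incorrectly, because a late element can bridge the gap between two sessions and combine them.
Option C (sliding windows with short allowed lateness and a side input) is more complex, and the short allowed lateness may cause late data to be dropped.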
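The trade-off behind Option D can be illustrated with a small simulation. The sketch below is plain Python, not the Dataflow or Apache Beam API, and it uses a simplified model: an element is accepted as long as the watermark at arrival has not passed the end of its fixed window plus the allowed lateness. The names `Event`, `window_start`, and `process`, and the sample values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_time: int  # seconds; when the event actually happened
    value: float


def window_start(event_time: int, size: int) -> int:
    """Start of the fixed window that contains event_time."""
    return event_time - (event_time % size)


def process(arrivals, size: int, allowed_lateness: int):
    """Sum values per fixed window under a simplified lateness model.

    arrivals: list of (watermark_at_arrival, Event), in arrival order.
    Returns (sums keyed by window start, list of dropped events).
    """
    sums = {}
    dropped = []
    for watermark, ev in arrivals:
        start = window_start(ev.event_time, size)
        end = start + size
        # Assumption: window state expires once the watermark passes
        # the window end plus the allowed lateness.
        if watermark > end + allowed_lateness:
            dropped.append(ev)
        else:
            sums[start] = sums.get(start, 0.0) + ev.value
    return sums, dropped


# Hypothetical sample: three events that all belong to window [0, 60).
arrivals = [
    (10, Event(5, 100.0)),   # on time
    (70, Event(50, 20.0)),   # late, but within 120 s of allowed lateness
    (200, Event(55, 5.0)),   # arrives after the window state has expired
]

sums, dropped = process(arrivals, size=60, allowed_lateness=120)
```

With an allowed lateness of 120 seconds, the second event is still counted and only the third is dropped; with an allowed lateness of 0, both late events are dropped. A larger allowed lateness captures more late data, but each window's state has to be retained for that much longer, which is the cost side of the trade-off.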