Quick Answer

The correct architecture is an active-active cross-region setup: two separate Dataflow pipelines, one per region, reading from a Pub/Sub cross-region subscription and writing to a BigQuery cross-region dataset. This design meets the RPO of under one minute and the RTO of under five minutes because messages remain available to the surviving region during a regional outage, and because a pipeline is already running there, so recovery needs no manual intervention. On the Google Professional Data Engineer exam, this scenario tests disaster recovery design for streaming pipelines with low RPO and RTO targets. It often appears as a trap where candidates mistakenly choose a single-region pipeline with backups or a cold standby. The key insight is that targets this tight call for active-active processing, not just data durability. Memory tip: “Two pipes, two regions, zero downtime.” With an RPO under a minute, there is no time to start a second pipeline and replay data after the outage; both pipelines need to be running already.

PDE Practice Question: Building and operationalizing data processing systems

This PDE practice question tests your understanding of building and operationalizing data processing systems. This is an architecture task: choose the design that satisfies every stated requirement. Small differences, such as automatic versus manual failover or streaming versus batch processing, change whether the answer is correct. After answering, compare your reasoning against the explanation and wrong-answer breakdown below to see why each distractor is designed to mislead on exam day.

You are designing a disaster recovery strategy for a critical streaming data processing pipeline. The pipeline reads from Cloud Pub/Sub, processes with Dataflow streaming, and writes to BigQuery. The required RPO is less than 1 minute, and RTO is less than 5 minutes. Which architecture should you implement?

Correct answer & explanation

Use cross-region replication with two separate Dataflow pipelines reading from a Pub/Sub cross-region subscription and writing to a BigQuery cross-region dataset

Option A is correct because Pub/Sub keeps messages available to subscribers in a secondary region, and a separate Dataflow pipeline running in each region provides active-active processing. BigQuery cross-region dataset replication keeps a copy of the dataset in a second region, supporting durability and availability within the RPO of under 1 minute. Together these meet both RPO and RTO by eliminating single points of failure and enabling failover without manual intervention.
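To make "two separate Dataflow pipelines" more concrete, the sketch below builds launch arguments for one streaming job per region. It is only an illustration: the project ID, region names, bucket names, and job-name pattern are hypothetical placeholders. The flags themselves (`--runner`, `--project`, `--region`, `--streaming`, `--job_name`, `--temp_location`) are standard Apache Beam pipeline options for the Dataflow runner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionalJob:
    """One streaming job pinned to a single region (all values hypothetical)."""
    project: str
    region: str
    temp_bucket: str


def launch_args(job: RegionalJob, pipeline_name: str) -> list[str]:
    """Return Beam pipeline option flags for a Dataflow streaming job in one region."""
    return [
        "--runner=DataflowRunner",
        f"--project={job.project}",
        f"--region={job.region}",
        "--streaming",
        f"--job_name={pipeline_name}-{job.region}",
        f"--temp_location=gs://{job.temp_bucket}/tmp",
    ]


# Hypothetical project, regions, and buckets: one independent job per region.
JOBS = [
    RegionalJob("example-project", "us-central1", "example-tmp-us-central1"),
    RegionalJob("example-project", "us-east1", "example-tmp-us-east1"),
]

if __name__ == "__main__":
    for job in JOBS:
        print(" ".join(launch_args(job, "events-stream")))
```

Keeping the job name and temporary bucket region-specific means neither job depends on resources in the other region, which is the point of running them independently.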

Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Answer analysis

Option-by-option breakdown

For each option: why learners choose it and why it is or isn't the right answer here.

  • Use cross-region replication with two separate Dataflow pipelines reading from a Pub/Sub cross-region subscription and writing to a BigQuery cross-region dataset

    Why this is correct

    Cross-region replication keeps data available in another region with minimal lag, which meets the RPO, and the pipeline already running in that region meets the RTO.

    Related concept

    Read the scenario before looking for a memorised answer.

  • Run the pipeline using Dataflow batch mode with a 1-minute trigger and store intermediate results in Cloud Storage

    Why it's wrong here

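    Batch mode adds latency: data lost in a failure can span the 1-minute interval plus the job's run time, so RPO may exceed 1 minute. It also offers no protection against a regional outage.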

  • Deploy resources in a single region with regular backups to Cloud Storage

    Why it's wrong here

    Single region fails during a regional outage; backups have higher RPO.

  • Use a single Dataflow pipeline with a standby cluster in another region, but failover is manual

    Why it's wrong here

    Manual failover increases RTO beyond 5 minutes.
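The breakdown above amounts to checking each option against two strict limits. The sketch below does that check in code. The RPO and RTO targets come from the scenario; the per-option figures are illustrative guesses chosen only to mirror the reasoning above, not measurements of any real deployment.

```python
from dataclasses import dataclass

RPO_TARGET_S = 60    # from the scenario: RPO < 1 minute
RTO_TARGET_S = 300   # from the scenario: RTO < 5 minutes


@dataclass(frozen=True)
class Option:
    name: str
    worst_case_data_loss_s: float   # assumed, illustrative
    worst_case_recovery_s: float    # assumed, illustrative


def meets_targets(opt: Option) -> bool:
    """Both limits are strict because the scenario says 'less than'."""
    return (opt.worst_case_data_loss_s < RPO_TARGET_S
            and opt.worst_case_recovery_s < RTO_TARGET_S)


# Illustrative guesses only; real figures depend on the deployment.
OPTIONS = [
    Option("active-active, two regions", 5, 30),
    Option("batch with 1-minute trigger", 60 + 30, 600),  # interval + run time
    Option("single region + backups", 3600, 7200),
    Option("standby with manual failover", 30, 900),
]

if __name__ == "__main__":
    for opt in OPTIONS:
        verdict = "meets" if meets_targets(opt) else "misses"
        print(f"{opt.name}: {verdict} targets")
```

Note that the manual-failover option can look fine on RPO alone; it fails only once the RTO limit is checked as well, which is why both constraints have to be evaluated together.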

Common exam traps

Common exam trap: answer the scenario, not the keyword

The trap here is that candidates often assume a single pipeline with a standby cluster is sufficient, but they overlook that manual failover cannot meet the strict RTO of <5 minutes, and that cross-region replication must be active-active (not active-passive) to achieve sub-minute RPO.

Detailed technical explanation

How to think about this question

Under the hood, Pub/Sub is a global service: a topic accepts publishes in any region, stores each message in a region allowed by its message storage policy, and delivers to subscribers wherever they run, so pipelines in two regions can consume from the same subscription. Dataflow streaming pipelines provide exactly-once processing within a pipeline and persist their state durably, but the key is that each pipeline operates independently in its own region. When one region fails, its unacknowledged messages are redelivered to the surviving pipeline, which avoids data loss. BigQuery cross-region dataset replication copies data to a secondary region asynchronously, so replication lag should be checked against the RPO requirement.
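The failover behaviour can be illustrated with a toy model. The sketch below is not the Pub/Sub client library; it is a small in-memory simulation, with an assumed acknowledgement deadline, of two regional consumers sharing one subscription. Region A crashes after leasing a message but before acknowledging it, and the message is redelivered to region B once the lease expires.

```python
from dataclasses import dataclass, field
from typing import Optional

ACK_DEADLINE_S = 10  # assumed lease length, for illustration only


@dataclass
class Subscription:
    """Toy model of a shared subscription with redelivery after lease expiry."""
    pending: dict[int, int] = field(default_factory=dict)  # msg id -> lease expiry

    def publish(self, msg_id: int) -> None:
        self.pending[msg_id] = 0  # immediately deliverable

    def pull(self, now: int) -> Optional[int]:
        """Lease the lowest-id message whose previous lease has expired."""
        for msg_id in sorted(self.pending):
            if self.pending[msg_id] <= now:
                self.pending[msg_id] = now + ACK_DEADLINE_S
                return msg_id
        return None

    def ack(self, msg_id: int) -> None:
        self.pending.pop(msg_id, None)


def run(fail_region_a_at: int, total_msgs: int = 6) -> dict[str, list[int]]:
    """Simulate one-second ticks; region A crashes at the given tick."""
    sub = Subscription()
    for i in range(total_msgs):
        sub.publish(i)
    processed: dict[str, list[int]] = {"region-a": [], "region-b": []}
    now = 0
    while sub.pending:
        for region in ("region-a", "region-b"):
            if region == "region-a" and now > fail_region_a_at:
                continue  # region A is down and no longer pulls
            msg = sub.pull(now)
            if msg is None:
                continue
            if region == "region-a" and now == fail_region_a_at:
                continue  # region A crashes mid-message: leased, never acked
            sub.ack(msg)
            processed[region].append(msg)
        now += 1
    return processed


if __name__ == "__main__":
    print(run(fail_region_a_at=1))
```

In this model every message is eventually processed by exactly one region. Real Pub/Sub delivery is at-least-once by default, so a production pipeline should also tolerate duplicates at the sink.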

Key Concepts to Remember

  • Read the scenario before looking for a memorised answer.
  • Find the constraint that changes the correct option.
  • Eliminate answers that are true in general but not in this case.

Exam Day Tips

  • Watch for words such as best, first, most likely and least administrative effort.
  • Review why wrong options are wrong, not only why the correct option is correct.

Key takeaway

Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Real-world example

How this comes up in practice

A media company stores terabytes of video archives that are accessed once a year for audit purposes. Moving these objects to a cold storage tier (Azure Archive, S3 Glacier, or Google Cloud Storage Archive) costs a fraction of hot storage. Questions like this test whether you understand storage tiers, access frequency tradeoffs, and retrieval latency requirements.

What to study next

Got this wrong? Here's your next step.

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.

FAQ

Questions learners often ask

What does this PDE question test?

This question tests building and operationalizing data processing systems. The key habit is to read the scenario before looking for a memorised answer.

What is the correct answer to this question?

The correct answer is: Use cross-region replication with two separate Dataflow pipelines reading from a Pub/Sub cross-region subscription and writing to a BigQuery cross-region dataset. Option A is correct because Pub/Sub keeps messages available to subscribers in a secondary region, and a separate Dataflow pipeline running in each region provides active-active processing. BigQuery cross-region dataset replication keeps a copy of the dataset in a second region, supporting durability and availability within the RPO of under 1 minute. Together these meet both RPO and RTO by eliminating single points of failure and enabling failover without manual intervention.

What should I do if I get this PDE question wrong?

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.

What is the key concept behind this question?

Read the scenario before looking for a memorised answer.

About these practice questions

Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.

Last reviewed: Jun 24, 2026

This PDE practice question is part of Courseiva's free Google Cloud certification practice question bank. Courseiva provides original exam-style practice questions with explanations, topic-based practice, mock exams, readiness tracking, and study analytics to help learners prepare for the PDE exam.