
Quick Answer

The answer is to use Pub/Sub, a Dataflow pipeline, Cloud Bigtable for real-time queries, and Cloud Storage for periodic BigQuery loads. Cloud Bigtable delivers sub-second latency on the most recent streaming IoT data, which is essential for near real-time analytics. Dataflow handles the stream processing, filtering, and transformation, then routes data both to Bigtable for low-latency access and to Cloud Storage for batch ingestion into BigQuery for historical analysis. On the Google Professional Data Engineer exam, this scenario tests your understanding of decoupling real-time and historical storage paths, a common pattern for IoT workloads. A frequent trap is choosing a single storage solution such as BigQuery for both real-time and historical data, which cannot meet the sub-second latency requirement. A useful memory tip is “Hot Bigtable, Cold BigQuery”: Bigtable serves the hot path, BigQuery the cold path.

PDE Practice Question: Building and operationalizing data processing systems

This PDE practice question tests your understanding of building and operationalizing data processing systems. Read the scenario carefully and evaluate each option against the stated constraints before committing to an answer. Then compare your reasoning against the explanation and wrong-answer breakdown below to see why each distractor is designed to mislead on exam day.

A team wants to ingest streaming data from millions of IoT devices and store historical data in BigQuery for analysis. They need near real-time analytics on the most recent data, with sub-second latency. Which architecture should they use?

Difficulty: medium · Format: multiple choice


Correct answer & explanation

Use Pub/Sub, then a Dataflow pipeline that filters and transforms data, writing to Cloud Bigtable for real-time queries and to Cloud Storage for periodic BigQuery loads.

Option B is correct because it uses Cloud Bigtable for sub-second latency on recent data, which is ideal for near real-time analytics on streaming IoT data. Dataflow provides the necessary stream processing, filtering, and transformation before writing to Bigtable for low-latency queries and to Cloud Storage for periodic batch loads into BigQuery for historical analysis. This architecture decouples real-time and historical paths, meeting both latency and storage requirements.
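The fan-out described above can be sketched in plain Python. This is only an illustrative model, not Dataflow or Apache Beam code: `hot_store` stands in for Bigtable, `cold_batches` stands in for hourly files in Cloud Storage, and the event fields (`device_id`, `ts`, `temp_c`) and the hourly batching interval are assumptions made up for the example.

```python
import json
from collections import defaultdict


def process(events, hot_store, cold_batches):
    """Filter and transform each event, then write it to both paths."""
    for event in events:
        # Filter: drop readings that carry no temperature value.
        if event.get("temp_c") is None:
            continue
        # Transform: normalise into the record shape both sinks share.
        record = {
            "device_id": event["device_id"],
            "ts": event["ts"],
            "temp_c": round(float(event["temp_c"]), 1),
        }
        # Hot path (stand-in for Bigtable): keep the latest reading per device.
        current = hot_store.get(record["device_id"])
        if current is None or record["ts"] >= current["ts"]:
            hot_store[record["device_id"]] = record
        # Cold path (stand-in for a Cloud Storage file): append a JSON line
        # to the batch for the hour the reading belongs to.
        hour = record["ts"] // 3600
        cold_batches[hour].append(json.dumps(record, sort_keys=True))


# Hypothetical sample events; timestamps are seconds since an arbitrary epoch.
events = [
    {"device_id": "dev-1", "ts": 3600, "temp_c": "21.46"},
    {"device_id": "dev-1", "ts": 3660, "temp_c": 22.0},
    {"device_id": "dev-2", "ts": 7200, "temp_c": None},
    {"device_id": "dev-2", "ts": 7260, "temp_c": 19},
]
hot_store = {}
cold_batches = defaultdict(list)
process(events, hot_store, cold_batches)
```

After the run, `hot_store` holds one current record per device for low-latency lookups, while each entry in `cold_batches` is a list of newline-delimited JSON rows that a periodic load job could pick up for BigQuery.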

Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Answer analysis

Option-by-option breakdown

For each option: why learners choose it and why it is or isn't the right answer here.

  • Use Pub/Sub to receive data, then stream directly into BigQuery using the streaming API, and use standard SQL queries for real-time analytics.

    Why it's wrong here

    Rows streamed into BigQuery typically become queryable within a few seconds, not sub-second, and BigQuery query times vary with the amount of data scanned.

  • Use Pub/Sub, then a Dataflow pipeline that filters and transforms data, writing to Cloud Bigtable for real-time queries and to Cloud Storage for periodic BigQuery loads.

    Why this is correct

    Bigtable provides single-digit-millisecond latency for real-time queries, which comfortably meets the sub-second requirement, and BigQuery handles large-scale historical analytics.

    Related concept

    Read the scenario before looking for a memorised answer.

  • Use Pub/Sub to ingest data into a Dataproc Spark Streaming job that writes to both Bigtable and BigQuery.

    Why it's wrong here

    Spark Streaming typically has seconds of latency, not sub-second.

  • Use Cloud SQL to store the latest data and periodically move historical data to BigQuery via cron jobs.

    Why it's wrong here

    Cloud SQL cannot handle millions of writes per second because writes go to a single primary instance and do not scale horizontally.
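The elimination reasoning in this breakdown can be written out as a small check. The sketch below only encodes this page's own judgments about each option as boolean flags; the field names and shortened option labels are hypothetical, and the flags are not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    label: str
    sub_second_path: bool      # does the real-time path meet sub-second latency?
    scales_to_millions: bool   # can it ingest from millions of devices?
    history_in_bigquery: bool  # does historical data end up in BigQuery?


# Flags mirror the explanations given for each option above.
OPTIONS = [
    Option("Pub/Sub -> BigQuery streaming", False, True, True),
    Option("Pub/Sub -> Dataflow -> Bigtable + Cloud Storage -> BigQuery", True, True, True),
    Option("Pub/Sub -> Dataproc Spark Streaming -> Bigtable + BigQuery", False, True, True),
    Option("Cloud SQL + cron -> BigQuery", True, False, True),
]


def surviving(options):
    """Return the labels of options that satisfy every stated constraint."""
    return [
        o.label
        for o in options
        if o.sub_second_path and o.scales_to_millions and o.history_in_bigquery
    ]


print(surviving(OPTIONS))
```

Each distractor fails exactly one constraint from the scenario, which is why reading for the constraint matters more than recognising a familiar service name.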

Common exam traps

Common exam trap: answer the scenario, not the keyword

The exam often tests the misconception that BigQuery's streaming API can provide sub-second query latency. In reality, BigQuery is a columnar analytics engine optimized for large scans, not for low-latency point reads, which is why a separate low-latency store such as Bigtable is required for real-time access.

Detailed technical explanation

How to think about this question

Cloud Bigtable is a distributed, sorted key-value (wide-column) store that exposes an HBase-compatible API; HBase was modeled on Bigtable, not the other way around. It delivers consistent sub-10 ms read and write latencies at high throughput, which is critical for IoT time-series data. Dataflow's streaming engine provides exactly-once processing and autoscaling to handle millions of events per second. BigQuery's streaming buffer allows near-real-time ingestion, but queries on that buffer still incur higher latency (typically seconds) than Bigtable's direct row lookups. In practice, this architecture suits use cases such as real-time device monitoring dashboards, where the latest sensor readings must be displayed instantly while historical trend analysis runs on BigQuery.
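Because rows are stored in sorted key order, the row key decides how cheaply the latest readings can be fetched. The sketch below illustrates one common time-series pattern, a device ID followed by a reversed timestamp, using a toy in-memory table. The `SortedTable` class, the key format, and `MAX_TS` are assumptions for illustration only and are not the Bigtable client API.

```python
import bisect

# Assumed upper bound on timestamps (seconds); subtracting from it reverses the order.
MAX_TS = 10**10 - 1


def row_key(device_id, ts):
    """Build 'device#reversed_ts' so newer readings sort first within a device."""
    return f"{device_id}#{MAX_TS - ts:010d}"


class SortedTable:
    """Toy stand-in for a sorted key-value store: keys kept in lexicographic order."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix, limit):
        """Return up to `limit` (key, value) pairs whose key starts with `prefix`."""
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix) or len(out) >= limit:
                break
            out.append((key, self._values[key]))
        return out


table = SortedTable()
for ts, temp in [(100, 20.1), (300, 20.9), (200, 20.5)]:
    table.put(row_key("dev-1", ts), temp)
table.put(row_key("dev-2", 250), 18.0)

# The newest reading for dev-1 is the first row under its prefix.
latest = table.scan_prefix("dev-1#", limit=1)
```

With this key shape, a dashboard that needs the most recent reading for a device reads the first row of a short prefix scan instead of scanning the device's full history. Leading with the device ID rather than the timestamp also spreads writes across the key space.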

Key Concepts to Remember

  • Read the scenario before looking for a memorised answer.
  • Find the constraint that changes the correct option.
  • Eliminate answers that are true in general but not in this case.

Exam Day Tips

  • Watch for words such as best, first, most likely and least administrative effort.
  • Review why wrong options are wrong, not only why the correct option is correct.

Key takeaway

Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Real-world example

How this comes up in practice

A media company stores terabytes of video archives that are accessed once a year for audit purposes. Moving these objects to an archival storage tier (Azure Archive, S3 Glacier, or the Cloud Storage Archive class) costs a fraction of hot storage. Questions like this test whether you understand storage tiers, access frequency tradeoffs, and retrieval latency requirements.

What to study next

Got this wrong? Here's your next step.

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.


FAQ

Questions learners often ask

What does this PDE question test?

This question tests building and operationalizing data processing systems. The key skill is reading the scenario before looking for a memorised answer.

What is the correct answer to this question?

The correct answer is: Use Pub/Sub, then a Dataflow pipeline that filters and transforms data, writing to Cloud Bigtable for real-time queries and to Cloud Storage for periodic BigQuery loads. — Option B is correct because it uses Cloud Bigtable for sub-second latency on recent data, which is ideal for near real-time analytics on streaming IoT data. Dataflow provides the necessary stream processing, filtering, and transformation before writing to Bigtable for low-latency queries and to Cloud Storage for periodic batch loads into BigQuery for historical analysis. This architecture decouples real-time and historical paths, meeting both latency and storage requirements.

What should I do if I get this PDE question wrong?

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.

What is the key concept behind this question?

Read the scenario before looking for a memorised answer.

About these practice questions

Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.


Last reviewed: Jun 30, 2026

