Ingesting and Processing the Data practice questions

Practise Google Professional Data Engineer (PDE) questions on Ingesting and Processing the Data: original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations, not to memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security
20 questions · Domain: Ingesting and Processing the Data

What the exam tests

What to know about Ingesting and Processing the Data

Ingesting and Processing the Data questions test whether you can apply the concept in context, not just recognise a definition.

  • How the topic appears in realistic exam-style scenarios.
  • Which detail in the question changes the correct answer.
  • How to eliminate plausible but wrong options.
  • How to connect the question back to the wider exam objective.

Watch out for

Common Ingesting and Processing the Data exam traps

  • Answering from memory before reading the full scenario.
  • Missing a constraint such as cost, availability, security, scope or command context.
  • Choosing a broad answer when the question asks for the most specific fix.
  • Ignoring why the wrong options are tempting.

Practice set

Ingesting and Processing the Data questions

20 questions · select your answer, then reveal the explanation

A data engineer needs to load 10 TB of CSV files from Amazon S3 into Google BigQuery on a daily basis. Which service should they use to automate this transfer?

You need to stream real-time user click events from your application into BigQuery for immediate analysis. The events must be available for query within seconds. Which approach is recommended?

Your company is migrating an on-premises Hadoop cluster to Google Cloud. You need to transform large datasets using Spark SQL. Which Google Cloud service should you use?

A data engineer needs to transfer 500 TB of on-premises data to Google Cloud Storage. The data is stored on NAS devices and the network bandwidth is limited to 100 Mbps. What is the most cost-effective and timely transfer method?
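
When a scenario gives both a data volume and a link speed, it can help to work out the raw transfer time before comparing options. The following is a minimal sketch of that arithmetic, not an official sizing tool. The function name `transfer_days` and the `efficiency` parameter are hypothetical, and the calculation assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second).

```python
def transfer_days(data_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate the days needed to move data_tb terabytes over a link_mbps link.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Mbps = 1e6 bits/s).
    efficiency is a hypothetical fraction (0 < efficiency <= 1) of the
    nominal bandwidth that is actually usable for the transfer.
    """
    if data_tb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("volume must be >= 0, bandwidth > 0, efficiency in (0, 1]")
    total_bits = data_tb * 1e12 * 8                 # terabytes -> bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    seconds = total_bits / usable_bits_per_second
    return seconds / 86_400                         # seconds -> days


if __name__ == "__main__":
    # The scenario above: 500 TB over a 100 Mbps link.
    print(f"Full line rate: {transfer_days(500, 100):.0f} days")
    # Same scenario if only 80% of the link were usable (an assumed figure).
    print(f"80% usable:     {transfer_days(500, 100, 0.8):.0f} days")
```

At full line rate the estimate is well over a year, which is the detail the scenario wants you to notice before choosing a transfer method.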

Question 5 · medium · multiple choice

You are building a Dataflow pipeline in Python that reads messages from Pub/Sub, enriches them with data from a BigQuery table, and writes the results to BigQuery. The enrichment lookup table is large and changes infrequently. Which approach minimizes cost and latency?

You are designing a Dataflow pipeline to process streaming data. The pipeline may encounter malformed records. You need to handle these errors without failing the entire pipeline and store the bad records for later analysis. What is the best practice?
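
The requirement in this scenario (keep processing valid records, set bad ones aside for later analysis) can be illustrated without any pipeline framework. The sketch below is plain Python, not Dataflow or Apache Beam code. The function `route_records` and the required field `user_id` are hypothetical choices made for illustration. In a Beam pipeline the same split is commonly expressed with tagged outputs, which this example does not use.

```python
import json


def route_records(raw_messages):
    """Split raw JSON strings into parsed events and rejected records.

    A record is rejected when it is not valid JSON or lacks the
    (hypothetical) required field 'user_id'. Rejected records keep the
    original payload and the error text so they can be inspected later.
    """
    valid, rejected = [], []
    for raw in raw_messages:
        try:
            event = json.loads(raw)
            if not isinstance(event, dict) or "user_id" not in event:
                raise ValueError("missing required field 'user_id'")
            valid.append(event)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            rejected.append({"raw": raw, "error": str(exc)})
    return valid, rejected


if __name__ == "__main__":
    messages = [
        '{"user_id": 1, "page": "/home"}',  # well-formed
        '{not json',                         # malformed JSON
        '{"page": "/cart"}',                 # valid JSON, missing user_id
    ]
    good, bad = route_records(messages)
    print(len(good), "valid;", len(bad), "set aside for analysis")
```

The point of the design is that one bad message never raises out of the loop: the error is caught per record, and the original payload is preserved alongside the reason it was rejected.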

Your company uses Kafka for event streaming. You want to run Kafka on Google Cloud with the ability to auto-scale clusters and use managed infrastructure. Which service should you choose?

You need to perform a one-time migration of historical data from an on-premises Teradata data warehouse to BigQuery. The data volume is 50 TB and you have a high-speed network connection (10 Gbps). What is the most efficient way to load the data?

You have a Dataflow pipeline that processes streaming data with high throughput. You notice that the pipeline is experiencing high latency and the workers are underutilized. Which Dataflow feature can automatically optimize resource allocation?

Your organization uses dbt (data build tool) for transformations on BigQuery. You need to run dbt models on a schedule and manage versions. Which Google Cloud service can execute dbt jobs in a serverless manner?

You are migrating an on-premises PostgreSQL database to Cloud SQL. You need to continuously replicate changes to BigQuery for real-time analytics with minimal latency. Which service should you use?

You are designing a Dataflow pipeline that must process events from Pub/Sub exactly once and write them to BigQuery using the Storage Write API. The pipeline may restart and could reprocess some messages. What setting ensures exactly-once semantics for the output?

You need to process a large volume of event data from Cloud Storage, apply complex transformations using Apache Spark, and then load the results into BigQuery. The data arrives in batches every hour. You want to minimize costs by using preemptible VMs. Which service should you use?

Which TWO statements are true about BigQuery Data Transfer Service? (Choose 2)

You are building a Dataflow pipeline that reads from Pub/Sub, applies transformations, and writes to BigQuery. The pipeline must handle late-arriving data and ensure that the windowing and triggering are correct. Which THREE configurations should you consider? (Choose 3)

An organization wants to ingest on-premises Oracle database changes into BigQuery for real-time analytics with minimal latency. The Oracle database is version 19c and has a high transactional volume. Which Google Cloud service should they use?

A data engineer needs to schedule recurring nightly loads from Amazon S3 to Google Cloud Storage. The data is in CSV format and the volume is approximately 500 GB per night. Which Google Cloud service should they use?

A company runs a Dataflow pipeline that reads from Pub/Sub, transforms data, and writes to BigQuery. The pipeline uses classic templates and is deployed in batch mode. They notice that the pipeline does not scale well under high load, causing a backlog in Pub/Sub. Which improvement would BEST address the scaling issue?

A company needs to load data from a MySQL database into BigQuery daily. The data volume is 10 GB per day and the schema changes occasionally. They want to minimize costs and operational overhead. What is the MOST appropriate approach?

A media company streams real-time viewer data from Pub/Sub to BigQuery using a Dataflow pipeline. They need to handle occasional malformed messages without losing valid data. Which pattern should they implement?

Focused Ingesting and Processing the Data sessions

Start an Ingesting and Processing the Data-only practice session

Every question in these sessions is drawn from the Ingesting and Processing the Data domain — nothing else.

Related practice questions

Related PDE topic practice pages

Move into related areas when this topic feels solid.

Frequently asked questions

What does the PDE exam test about Ingesting and Processing the Data?
Ingesting and Processing the Data questions test whether you can apply the concept in context, not just recognise a definition.
How should I use these practice questions?
Select your answer before revealing the explanation. Then read why each option is right or wrong — this active recall approach builds retention far faster than re-reading notes.
Can I practise just Ingesting and Processing the Data questions in a focused session?
Yes — the session launcher on this page draws every question from the Ingesting and Processing the Data domain. Use a 10-question session first to gauge your baseline, then move to 20 or 30 once the weak spots are clear.
Where can I practise other PDE topics?
Use the topic links above to move to related areas, or go back to the PDE question bank to see all topics.
Are these real exam questions or dumps?
These are original practice questions written to test the same concepts the PDE exam covers. They are not copied from any real exam or dump site.