How should I use these Ingesting and Processing the Data practice questions?

Read each scenario carefully and choose your answer before revealing the explanation. Then check why your choice was right or wrong. Repeat until the reasoning feels automatic.

Can I practise just Ingesting and Processing the Data questions in a focused session?

Yes — use the session launcher on this page to start a 10-, 20-, 30- or 50-question session drawn entirely from the Ingesting and Processing the Data domain.

Ingesting and Processing the Data practice questions

Practise Google Professional Data Engineer Ingesting and Processing the Data practice questions — original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations — not memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security

20 questions · Domain: Ingesting and Processing the Data

What the exam tests

What to know about Ingesting and Processing the Data

Ingesting and Processing the Data questions test whether you can apply the concept in context, not just recognise a definition.

▸How the topic appears in realistic exam-style scenarios.
▸Which detail in the question changes the correct answer.
▸How to eliminate plausible but wrong options.
▸How to connect the question back to the wider exam objective.

Watch out for

Common Ingesting and Processing the Data exam traps

▸Answering from memory before reading the full scenario.
▸Missing a constraint such as cost, availability, security, scope or command context.
▸Choosing a broad answer when the question asks for the most specific fix.
▸Ignoring why the wrong options are tempting.

Practice set

Ingesting and Processing the Data questions

20 questions · select your answer, then reveal the explanation

Question 1 · easy · multiple choice

A data engineer needs to load 10 TB of CSV files from Amazon S3 into Google BigQuery on a daily basis. Which service should they use to automate this transfer?

A. Dataproc
Why wrong: Dataproc runs Spark and Hadoop jobs; it is not designed for simple file transfers.
B. Cloud Data Fusion
Why wrong: Cloud Data Fusion is a managed ETL/ELT service, which is more than a straightforward S3-to-BigQuery transfer needs, so it is not the simplest solution.
C. BigQuery Data Transfer Service
Why correct: BigQuery Data Transfer Service supports scheduled transfers from Amazon S3 directly into BigQuery.
D. Storage Transfer Service
Why wrong: Storage Transfer Service moves data into Cloud Storage, not directly into BigQuery, so an additional load step would be needed.
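
To make the correct option more concrete, the sketch below assembles the kind of settings a scheduled Amazon S3 transfer into BigQuery typically needs. It is plain Python that only builds a dictionary and does not call any Google Cloud API. The function name `build_s3_transfer_config` and the example bucket and table values are hypothetical; the parameter keys follow the ones commonly shown for the `amazon_s3` data source, and should be checked against the current documentation before use.

```python
def build_s3_transfer_config(data_path, table_template,
                             access_key_id, secret_access_key):
    """Assemble settings for a daily S3 -> BigQuery transfer (illustrative only)."""
    # The S3 source is addressed by an s3:// URI, optionally with a wildcard.
    if not data_path.startswith("s3://"):
        raise ValueError("data_path must be an s3:// URI")
    return {
        "data_source_id": "amazon_s3",   # identifies the S3 connector
        "schedule": "every 24 hours",    # daily run, matching the scenario
        "params": {
            "data_path": data_path,
            "destination_table_name_template": table_template,
            "file_format": "CSV",
            # Assumption: credentials come from a secret store, never source code.
            "access_key_id": access_key_id,
            "secret_access_key": secret_access_key,
        },
    }


config = build_s3_transfer_config(
    "s3://example-bucket/exports/*.csv",  # hypothetical bucket
    "daily_events",                       # hypothetical destination table
    "EXAMPLE_KEY_ID",
    "EXAMPLE_SECRET",
)
print(config["data_source_id"], config["schedule"])
```

The point of the sketch is that the source, the destination table and the schedule all live in one transfer definition, which is why no intermediate Cloud Storage hop or cluster is required for this scenario.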

A data engineer needs to load 10 TB of CSV files from Amazon S3 into Google BigQuery on a daily basis. Which service should they use to automate this transfer?

You need to stream real-time user click events from your application into BigQuery for immediate analysis. The events must be available for query within seconds. Which approach is recommended?

Your company is migrating an on-premises Hadoop cluster to Google Cloud. You need to transform large datasets using Spark SQL. Which Google Cloud service should you use?

A data engineer needs to transfer 500 TB of on-premises data to Google Cloud Storage. The data is stored on NAS devices and the network bandwidth is limited to 100 Mbps. What is the most cost-effective and timely transfer method?

You are building a Dataflow pipeline in Python that reads messages from Pub/Sub, enriches them with data from a BigQuery table, and writes the results to BigQuery. The enrichment lookup table is large and changes infrequently. Which approach minimizes cost and latency?

You are designing a Dataflow pipeline to process streaming data. The pipeline may encounter malformed records. You need to handle these errors without failing the entire pipeline and store the bad records for later analysis. What is the best practice?

Your company uses Kafka for event streaming. You want to run Kafka on Google Cloud with the ability to auto-scale clusters and use managed infrastructure. Which service should you choose?

You need to perform a one-time migration of historical data from an on-premises Teradata data warehouse to BigQuery. The data volume is 50 TB and you have a high-speed network connection (10 Gbps). What is the most efficient way to load the data?

You have a Dataflow pipeline that processes streaming data with high throughput. You notice that the pipeline is experiencing high latency and the workers are underutilized. Which Dataflow feature can automatically optimize resource allocation?

Your organization uses dbt (data build tool) for transformations on BigQuery. You need to run dbt models on a schedule and manage versions. Which Google Cloud service can execute dbt jobs in a serverless manner?

You are migrating an on-premises PostgreSQL database to Cloud SQL. You need to continuously replicate changes to BigQuery for real-time analytics with minimal latency. Which service should you use?

You are designing a Dataflow pipeline that must process events from Pub/Sub exactly once and write them to BigQuery using the Storage Write API. The pipeline may restart and could reprocess some messages. What setting ensures exactly-once semantics for the output?

You need to process a large volume of event data from Cloud Storage, apply complex transformations using Apache Spark, and then load the results into BigQuery. The data arrives in batches every hour. You want to minimize costs by using preemptible VMs. Which service should you use?

Which TWO statements are true about BigQuery Data Transfer Service? (Choose 2)

You are building a Dataflow pipeline that reads from Pub/Sub, applies transformations, and writes to BigQuery. The pipeline must handle late-arriving data and ensure that the windowing and triggering are correct. Which THREE configurations should you consider? (Choose 3)

An organization wants to ingest on-premises Oracle database changes into BigQuery for real-time analytics with minimal latency. The Oracle database is version 19c and has a high transactional volume. Which Google Cloud service should they use?

A data engineer needs to schedule recurring nightly loads from Amazon S3 to Google Cloud Storage. The data is in CSV format and the volume is approximately 500 GB per night. Which Google Cloud service should they use?

A company needs to load data from a MySQL database into BigQuery daily. The data volume is 10 GB per day and the schema changes occasionally. They want to minimize costs and operational overhead. What is the MOST appropriate approach?

A media company streams real-time viewer data from Pub/Sub to BigQuery using a Dataflow pipeline. They need to handle occasional malformed messages without losing valid data. Which pattern should they implement?
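
Several of the scenarios above ask how to handle malformed records without failing the pipeline or losing valid data. One common answer is a dead-letter pattern: records that fail parsing or validation are diverted to a separate output together with the error, while valid records continue. The sketch below shows the idea in plain Python only; it is not Apache Beam or Dataflow code, and the function name `route_records` and the required field `viewer_id` are hypothetical. In a Beam pipeline the same split is usually expressed with tagged outputs, with the failed branch written to its own sink.

```python
import json


def route_records(raw_messages):
    """Split raw JSON strings into valid records and dead-letter entries."""
    valid, dead_letter = [], []
    for raw in raw_messages:
        try:
            record = json.loads(raw)
            # Hypothetical validation rule: every event needs a viewer_id.
            if not isinstance(record, dict) or "viewer_id" not in record:
                raise ValueError("missing required field 'viewer_id'")
            valid.append(record)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            # Keep the original payload and the reason for later analysis.
            dead_letter.append({"raw": raw, "error": str(exc)})
    return valid, dead_letter


valid, dead_letter = route_records(
    ['{"viewer_id": 1, "show": "news"}', "not json", '{"show": "sport"}']
)
print(len(valid), "valid;", len(dead_letter), "sent to dead-letter output")
```

Catching the error per record, rather than letting it propagate, is what keeps one bad message from stopping the rest of the stream; storing the raw payload alongside the error is what makes later analysis possible.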
