How should I use these Maintaining and Automating Data Workloads practice questions?

Read each scenario carefully and choose your answer before revealing the explanation. Then check why your choice was right or wrong. Repeat until the reasoning feels automatic.


Maintaining and Automating Data Workloads practice questions

Q: Can I practise just Maintaining and Automating Data Workloads questions in a focused session?

Yes — use the session launcher on this page to start a 10-, 20-, 30- or 50-question session drawn entirely from the Maintaining and Automating Data Workloads domain.

Practise the Maintaining and Automating Data Workloads domain of the Google Professional Data Engineer exam with original exam-style scenarios, answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations — not memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security

20 questions · Domain: Maintaining and Automating Data Workloads


What the exam tests

What to know about Maintaining and Automating Data Workloads

Maintaining and Automating Data Workloads questions test whether you can apply the concept in context, not just recognise a definition.

▸How the topic appears in realistic exam-style scenarios.
▸Which detail in the question changes the correct answer.
▸How to eliminate plausible but wrong options.
▸How to connect the question back to the wider exam objective.

Watch out for

Common Maintaining and Automating Data Workloads exam traps

▸Answering from memory before reading the full scenario.
▸Missing a constraint such as cost, availability, security, scope or command context.
▸Choosing a broad answer when the question asks for the most specific fix.
▸Ignoring why the wrong options are tempting.

Practice set

Maintaining and Automating Data Workloads questions

20 questions · select your answer, then reveal the explanation

Question 1 · medium · multiple choice


A data engineer uses Cloud Composer to orchestrate a daily batch pipeline. A downstream task should only start after an upstream BigQuery load job finishes successfully and a specific file appears in Cloud Storage. Which combination of operators should the engineer use in the Airflow DAG?


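A
BigQueryInsertJobOperator with wait_for_downstream=True
Why wrong: wait_for_downstream is a real BaseOperator argument, but it only makes a task instance wait until the tasks immediately downstream of its previous run have succeeded. It never checks Cloud Storage, so a sensor and an explicit dependency are still needed.
B
BigQueryInsertJobOperator and GCSObjectExistenceSensor with upstream dependency
Correct: BigQueryInsertJobOperator runs the load job, GCSObjectExistenceSensor polls Cloud Storage until the file exists, and setting both as upstream of the downstream task means it starts only after both succeed (Airflow's default all_success trigger rule).
C
DataflowPythonOperator and GCSObjectExistenceSensor
Why wrong: DataflowPythonOperator launches Dataflow pipelines; it does not run a BigQuery load job.
D
BigQueryOperator and FileSensor with downstream dependency
Why wrong: FileSensor checks a filesystem path, not a Cloud Storage bucket. The dependency direction is also wrong: the gated task needs the load and the sensor as its upstream dependencies.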
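The pattern behind option B can be modelled in plain Python, without Airflow installed. This is only a sketch of the two ideas involved: a sensor that polls until a condition holds or a timeout expires, and a downstream task that runs only when every upstream task succeeded. The names `wait_until`, `all_success`, `fake_bucket` and the object path are hypothetical and are not Airflow APIs. In a real DAG the equivalent wiring would typically be written as `[load_task, sensor_task] >> downstream_task`.

```python
import time
from typing import Callable, Dict


def wait_until(poke: Callable[[], bool], poke_interval: float, timeout: float) -> bool:
    """Hypothetical stand-in for a sensor: call poke() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if poke():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


def all_success(upstream_states: Dict[str, str]) -> bool:
    """Mimics the idea of the default trigger rule: run only if every upstream task succeeded."""
    return all(state == "success" for state in upstream_states.values())


# Simulated Cloud Storage bucket; the file "arrives" on the third poke.
fake_bucket = set()
pokes = {"count": 0}


def file_exists() -> bool:
    pokes["count"] += 1
    if pokes["count"] == 3:
        fake_bucket.add("exports/done.csv")  # hypothetical object name
    return "exports/done.csv" in fake_bucket


# Assume the load job has already succeeded; the sensor result decides the second state.
states = {
    "load_to_bq": "success",
    "wait_for_file": "success" if wait_until(file_exists, 0.01, 1.0) else "failed",
}
can_run = all_success(states)
print("downstream task may start:", can_run)
```

If either upstream state is anything other than "success", `all_success` returns False and the downstream task is held back, which is the behaviour the question asks for.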


A company uses Dataflow streaming pipelines to process real-time events. They notice increasing system lag over time. Which two Cloud Monitoring metrics should be examined to diagnose the cause?

A data team needs to share a BigQuery dataset with another business unit. They want to provide a point-in-time snapshot of the data without incurring additional storage costs for the copy. Which BigQuery feature should they use?

An engineer needs to create a reusable Dataflow pipeline that can be executed with different parameters without modifying code. Which Dataflow feature should they use?

A company runs a Dataproc cluster for ETL jobs that process data nightly. They want to reduce costs while maintaining performance. Which strategy is MOST effective?

A data engineer needs to raise an alert when a Pub/Sub subscription has messages older than 1 hour. Which Cloud Monitoring metric and filter should they use?

A team wants to enforce data quality rules on BigQuery tables using Dataplex. They need to run column-level checks for null values and row-level checks for value ranges on a schedule. Which Dataplex feature should they use?

An organization uses BigQuery on-demand pricing. To control costs, they want to estimate the bytes processed by a query before running it. Which command or method should they use?

A company uses Cloud Composer for pipeline orchestration. They need to define task dependencies where Task B and Task C can run in parallel after Task A, and Task D must run after both B and C complete. How should they define the DAG?

A streaming Dataflow pipeline needs to be updated without draining the existing pipeline. Which update strategy should be used?

A company wants to use Cloud DLP to inspect data in BigQuery for sensitive information and de-identify it by masking credit card numbers. They want to perform this on a schedule. Which approach should they take?

A data engineer notices that BigQuery queries are slower than expected. They want to identify the most expensive stages in the query execution. Which tool or command should they use?

A data engineer needs to change a BigQuery table schema so that a column that is currently REQUIRED becomes NULLABLE. Which TWO statements are correct? (Choose 2)

A company runs BigQuery workloads with varying demand. They want to use flat-rate pricing with baseline slots and the ability to burst during peak times. Which TWO actions should they take? (Choose 2)

A company uses Cloud Composer (Airflow) to orchestrate pipelines. They want to implement a pattern where a task polls for a file arrival in Cloud Storage and then triggers subsequent tasks. Which THREE Airflow concepts are essential? (Choose 3)

A company wants to share a large BigQuery dataset with a partner for analysis. The partner needs read-only access to a specific snapshot of the data as of a certain point in time, and the company wants to avoid additional storage costs for the partner. What is the most cost-effective approach?
