PDE · topic practice

Maintaining and Automating Data Workloads practice questions

Practise Google Professional Data Engineer questions on Maintaining and Automating Data Workloads: original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations — not memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security
20 questions · Domain: Maintaining and Automating Data Workloads

What the exam tests

What to know about Maintaining and Automating Data Workloads

Maintaining and Automating Data Workloads questions test whether you can apply the concept in context, not just recognise a definition.

  • How the topic appears in realistic exam-style scenarios.
  • Which detail in the question changes the correct answer.
  • How to eliminate plausible but wrong options.
  • How to connect the question back to the wider exam objective.

Watch out for

Common Maintaining and Automating Data Workloads exam traps

  • Answering from memory before reading the full scenario.
  • Missing a constraint such as cost, availability, security, scope or command context.
  • Choosing a broad answer when the question asks for the most specific fix.
  • Ignoring why the wrong options are tempting.

Practice set

Maintaining and Automating Data Workloads questions

20 questions · select your answer, then reveal the explanation

A data engineer uses Cloud Composer to orchestrate a daily batch pipeline. A downstream task should only start after an upstream BigQuery load job finishes successfully and a specific file appears in Cloud Storage. Which combination of operators should the engineer use in the Airflow DAG?

A company uses Dataflow streaming pipelines to process real-time events. They notice increasing system lag over time. Which two Cloud Monitoring metrics should be examined to diagnose the cause?

A data team needs to share a BigQuery dataset with another business unit. They want to provide a point-in-time snapshot of the data without incurring additional storage costs for the copy. Which BigQuery feature should they use?

An engineer needs to create a reusable Dataflow pipeline that can be executed with different parameters without modifying code. Which Dataflow feature should they use?

A company runs a Dataproc cluster for ETL jobs that process data nightly. They want to reduce costs while maintaining performance. Which strategy is MOST effective?

A data engineer needs to alert when a Pub/Sub subscription has messages older than 1 hour. Which Cloud Monitoring metric and filter should they use?
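
As a rough illustration of this scenario, the sketch below assembles a Cloud Monitoring filter string and a threshold for the age of the oldest unacknowledged message. It assumes the metric type pubsub.googleapis.com/subscription/oldest_unacked_message_age, which is reported in seconds. The subscription ID orders-sub, the helper name, and the shape of the returned dict are hypothetical, and nothing here calls the Cloud Monitoring API.

```python
# Sketch only: builds the filter text and threshold for an alerting condition.
# It does not call the Cloud Monitoring API.

OLDEST_UNACKED_AGE = "pubsub.googleapis.com/subscription/oldest_unacked_message_age"


def build_backlog_age_condition(subscription_id: str, max_age_seconds: int) -> dict:
    """Return a plain dict describing a threshold condition (hypothetical shape)."""
    metric_filter = (
        f'metric.type = "{OLDEST_UNACKED_AGE}" '
        f'AND resource.type = "pubsub_subscription" '
        f'AND resource.labels.subscription_id = "{subscription_id}"'
    )
    return {
        "filter": metric_filter,
        "comparison": "COMPARISON_GT",  # fire when the age exceeds the threshold
        "threshold_value": max_age_seconds,
    }


# One hour expressed in seconds, matching the unit of the metric.
condition = build_backlog_age_condition("orders-sub", max_age_seconds=60 * 60)
print(condition["filter"])
```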

A team wants to enforce data quality rules on BigQuery tables using Dataplex. They need to run column-level checks for null values and row-level checks for value ranges on a schedule. Which Dataplex feature should they use?

An organization uses BigQuery on-demand pricing. To control costs, they want to estimate the bytes processed by a query before running it. Which command or method should they use?
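
To make the arithmetic behind this scenario concrete, here is a minimal sketch that turns a byte estimate into an approximate on-demand cost. A dry run (for example the --dry_run flag of bq query, or dry_run=True on a QueryJobConfig in the Python client) reports the bytes a query would process without running it. The price per TiB and the byte count below are assumed inputs for illustration, not quoted prices.

```python
# Sketch only: converts a dry-run byte count into an approximate on-demand cost.
# With the google-cloud-bigquery client, the estimate would be read from
# query_job.total_bytes_processed after a dry run; that call is omitted here
# so the example runs offline.

TIB = 2 ** 40  # bytes in one tebibyte


def estimate_on_demand_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Approximate cost of a query; price_per_tib is an assumed input."""
    return bytes_processed / TIB * price_per_tib


# Hypothetical numbers: 512 GiB scanned at an assumed 6.25 currency units per TiB.
cost = estimate_on_demand_cost(512 * 2 ** 30, price_per_tib=6.25)
print(f"Estimated cost: {cost:.4f}")
```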

A company uses Cloud Composer for pipeline orchestration. They need to define task dependencies where Task B and Task C can run in parallel after Task A, and Task D must run after both B and C complete. How should they define the DAG?
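
The fan-out and fan-in shape described here can be modelled without Airflow. The sketch below uses Python's standard graphlib module to group tasks into waves that may run in parallel. The task names are hypothetical. In an Airflow DAG the same shape is commonly written with the bitshift operators, for example task_a >> [task_b, task_c] >> task_d.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
dependencies = {
    "task_a": set(),
    "task_b": {"task_a"},
    "task_c": {"task_a"},
    "task_d": {"task_b", "task_c"},
}


def execution_waves(deps: dict) -> list:
    """Group tasks into waves; tasks in the same wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose upstreams are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
print(waves)
```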

A streaming Dataflow pipeline needs to be updated without draining the existing pipeline. Which update strategy should be used?

A company wants to use Cloud DLP to inspect data in BigQuery for sensitive information and de-identify it by masking credit card numbers. They want to perform this on a schedule. Which approach should they take?

A data engineer notices that BigQuery queries are slower than expected. They want to identify the most expensive stages in the query execution. Which tool or command should they use?

A data engineer needs to change a BigQuery table schema so that a column that is currently REQUIRED becomes NULLABLE. Which TWO statements are correct? (Choose 2)

A company runs BigQuery workloads with varying demand. They want to use flat-rate pricing with baseline slots and the ability to burst during peak times. Which TWO actions should they take? (Choose 2)

A company uses Cloud Composer (Airflow) to orchestrate pipelines. They want to implement a pattern where a task polls for a file arrival in Cloud Storage and then triggers subsequent tasks. Which THREE Airflow concepts are essential? (Choose 3)
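
The polling pattern in this scenario can be sketched in plain Python. This is not Airflow code: it only mimics how a sensor repeatedly checks a condition at a fixed interval until the condition holds or a timeout expires. The parameter names poke_interval and timeout mirror the ones Airflow sensors accept; the wait_for helper and the fake file check are hypothetical.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], poke_interval: float, timeout: float) -> bool:
    """Poll `condition` every `poke_interval` seconds until it is true or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


# Hypothetical stand-in for "does the object exist in the bucket?":
# it reports the file as present on the third check.
checks = {"count": 0}


def file_has_arrived() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


arrived = wait_for(file_has_arrived, poke_interval=0.01, timeout=1.0)
print("downstream tasks may run" if arrived else "sensor timed out")
```

Only when the check succeeds would the downstream tasks be released; a timeout would instead fail the waiting task.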

A data engineer is building a batch pipeline that runs daily using Cloud Composer. The pipeline has three tasks: extract data from Cloud Storage, transform data using Dataflow, and load the transformed data into BigQuery. The engineer wants to ensure that the Dataflow job only starts after the extraction task completes successfully, and the load task only starts after the Dataflow job finishes. How should the engineer define the task dependencies in the Airflow DAG?

You need to schedule a simple workflow that fetches data from an API every hour, transforms it using Cloud Functions, and writes the result to Cloud Storage. The workflow has no complex branching or retry logic beyond basic retries. Which orchestration service is the MOST cost-effective and simplest to implement?

A company runs a streaming Dataflow pipeline that reads from Pub/Sub, enriches data with a side input from BigQuery, and writes to BigQuery. After updating the pipeline code (adding a new field to the output), the engineer notices that the new pipeline version is not picking up the updated code because the job was started from a template. The engineer wants to update the streaming pipeline without draining it. What should the engineer do?

You are monitoring a streaming Dataflow pipeline that reads from Pub/Sub and writes to BigQuery. In Cloud Monitoring, you notice that the 'system_lag' metric is increasing over time and now exceeds 10 minutes. The 'data_watermark' metric shows a steady lag. What is the most likely cause of the increasing system lag?

A company wants to share a large BigQuery dataset with a partner for analysis. The partner needs read-only access to a specific snapshot of the data as of a certain point in time, and the company wants to avoid additional storage costs for the partner. What is the most cost-effective approach?

Focused Maintaining and Automating Data Workloads sessions

Start a Maintaining and Automating Data Workloads only practice session

Every question in these sessions is drawn from the Maintaining and Automating Data Workloads domain — nothing else.

Related practice questions

Related PDE topic practice pages

Move into related areas when this topic feels solid.

Frequently asked questions

What does the PDE exam test about Maintaining and Automating Data Workloads?
Maintaining and Automating Data Workloads questions test whether you can apply the concept in context, not just recognise a definition.

How should I use these practice questions?
Select your answer before revealing the explanation. Then read why each option is right or wrong. This active recall approach builds retention far faster than re-reading notes.

Can I practise just Maintaining and Automating Data Workloads questions in a focused session?
Yes. The session launcher on this page draws every question from the Maintaining and Automating Data Workloads domain. Use a 10-question session first to gauge your baseline, then move to 20 or 30 once the weak spots are clear.

Where can I practise other PDE topics?
Use the topic links above to move to related areas, or go back to the PDE question bank to see all topics.

Are these real exam questions or dumps?
These are original practice questions written to test the same concepts the PDE exam covers. They are not copied from any real exam or dump site.