
Question 119 of 506

Automating and orchestrating ML pipelines · medium · Multiple Choice · Objective-mapped

Quick Answer

The answer is to orchestrate the pipeline using Cloud Composer with retry policies on the Dataflow operator. Cloud Composer is built on Apache Airflow, so you can set task-level parameters such as `retries` and `retry_delay` directly on the Dataflow operator; when the task fails with a sporadic system error like the one described, Airflow automatically re-runs it and launches the job again. On the Google Professional Machine Learning Engineer exam, this scenario tests your understanding of production ML pipeline reliability and the distinction between transient and permanent failures. A common trap is to over-engineer a solution with custom error handling or manual triggers when a simple retry policy suffices. Remember that for transient quota or system errors in Dataflow, the fix is often a retry, not a redesign. Memory tip: think "Composer retries compose the solution for transient Dataflow sighs."

PMLE Automating and orchestrating ML pipelines Practice Question

This PMLE practice question tests your understanding of automating and orchestrating ML pipelines. The scenario asks you to isolate a root cause, so eliminate options that address a different problem before choosing. After answering, compare your reasoning against the explanation and wrong-answer breakdown below to reinforce the concept and see why each distractor is designed to mislead on exam day.

Your team manages a production ML pipeline on Google Cloud that trains a fraud detection model every 6 hours using new transaction data. The pipeline steps are: (1) Cloud Function triggered by new files in Cloud Storage to validate data, (2) Dataflow job for feature engineering, (3) Vertex AI CustomJob for training, (4) Cloud Function to deploy the model to a Vertex AI endpoint after evaluation. You notice that the pipeline sometimes fails during the Dataflow job step with an error: 'Workflow failed. Causes: The job encountered a system error. Please try again later.' The error occurs sporadically, and retrying the pipeline manually usually succeeds. The team needs a reliable automated solution. What should you do?

A
Schedule the pipeline to run less frequently to reduce load on the Dataflow service.
Why wrong: Reducing frequency does not fix the sporadic errors and reduces model freshness.
B
Use Cloud Tasks to queue the Dataflow job and retry on failure.
Why wrong: Cloud Tasks is for asynchronous task execution, not orchestrating a multi-step pipeline with dependencies.
C
Increase the number of Dataflow workers and use FlexRS to handle transient errors.
Why wrong: Transient system errors are not resolved by scaling resources; the job may still fail.
D
Orchestrate the pipeline using Cloud Composer with retry policies on the Dataflow operator.
Cloud Composer (Airflow) can manage the pipeline DAG with automatic retries and dependencies.

Correct answer & explanation

✓

Orchestrate the pipeline using Cloud Composer with retry policies on the Dataflow operator.

Option D is correct because Cloud Composer (Apache Airflow) provides native retry policies on its Dataflow operators, enabling automatic retries of the Dataflow job when it fails due to transient system errors. This addresses the sporadic failure pattern without manual intervention, ensuring the pipeline runs reliably every 6 hours.
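The behaviour described above can be sketched without any Airflow dependency. The following is a minimal simulation, not Cloud Composer code: the step names, the `TransientError` class, and the helper functions are all hypothetical and exist only for illustration. It shows the two things an orchestrator adds here: steps run in dependency order, and only the failed step is re-run rather than the whole pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical step names mirroring the scenario; each maps to the steps it depends on.
PIPELINE = {
    "validate_data": set(),
    "dataflow_features": {"validate_data"},
    "vertex_training": {"dataflow_features"},
    "deploy_model": {"vertex_training"},
}


class TransientError(Exception):
    """Stand-in for a sporadic 'system error, please try again later' failure."""


def make_flaky_step(failures_before_success):
    """Return a callable that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def step():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("The job encountered a system error.")
        return state["calls"]

    return step


def run_pipeline(graph, steps, retries):
    """Run steps in dependency order, re-running a failed step up to `retries` times."""
    attempts = {}
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, retries + 2):  # first try plus `retries` retries
            attempts[name] = attempt
            try:
                steps[name]()
                break
            except TransientError:
                if attempt == retries + 1:
                    raise  # retries exhausted: fail the run, skip downstream steps
    return attempts


steps = {name: make_flaky_step(0) for name in PIPELINE}
steps["dataflow_features"] = make_flaky_step(2)  # fails twice, succeeds on the third try

attempts = run_pipeline(PIPELINE, steps, retries=3)
print(attempts)
```

In this sketch only the simulated Dataflow step needs more than one attempt; the upstream validation is not repeated, and the downstream steps start only after the retried step succeeds.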

Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Common exam traps

Common exam trap: answer the scenario, not the keyword

The trap here is that candidates confuse scaling solutions (Option C) with fault-tolerance mechanisms, or they choose a generic queuing service (Option B) instead of a dedicated orchestrator with built-in retry policies for pipeline steps.

Detailed technical explanation

How to think about this question

Dataflow operators available in Cloud Composer, such as DataflowCreateJavaJobOperator and DataflowCreatePythonJobOperator, accept Airflow's standard task-level parameters `retries` and `retry_delay`; exponential backoff is opt-in through `retry_exponential_backoff`. When an attempt fails, Airflow waits for the configured delay and then re-runs the task, which launches the Dataflow job again, until an attempt succeeds or the retries are exhausted. This is essential for handling sporadic system errors that scaling resources does not resolve. In real-world scenarios, such errors often stem from temporary zonal resource exhaustion or quota limits, which a later attempt may avoid once capacity frees up.
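As a rough illustration of how these retry settings interact, the sketch below builds a plain dictionary of retry arguments and estimates the wait before each retry. The argument names follow Airflow 2's `BaseOperator` (`retries`, `retry_delay`, `retry_exponential_backoff`, `max_retry_delay`); the values and the `approximate_backoff` helper are assumptions made for this example. Airflow's own calculation also varies the wait within a range, so real delays will differ from this simplified doubling.

```python
from datetime import timedelta

# Airflow 2 BaseOperator argument names; the values are illustrative assumptions,
# not recommendations from the source.
retry_args = {
    "retries": 4,
    "retry_delay": timedelta(minutes=2),
    "retry_exponential_backoff": True,
    "max_retry_delay": timedelta(minutes=10),
}


def approximate_backoff(args):
    """Return a simplified wait time before each retry.

    Doubles the base delay per retry and caps it at max_retry_delay.
    This is a hypothetical helper, not Airflow's exact algorithm.
    """
    delays = []
    for attempt in range(args["retries"]):
        delay = args["retry_delay"]
        if args["retry_exponential_backoff"]:
            delay = delay * (2 ** attempt)
        delays.append(min(delay, args["max_retry_delay"]))
    return delays


for number, delay in enumerate(approximate_backoff(retry_args), start=1):
    print(f"retry {number}: wait about {delay}")
```

In a DAG, a dictionary like this would typically be passed as `default_args` or unpacked into a single operator, so that the retry policy applies to the Dataflow task without custom error-handling code. Check the argument types against the Airflow version in your Composer environment.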

Key Concepts to Remember

Read the scenario before looking for a memorised answer.
Find the constraint that changes the correct option.
Eliminate answers that are true in general but not in this case.

Exam Day Tips

Watch for words such as best, first, most likely and least administrative effort.
Review why wrong options are wrong, not only why the correct option is correct.

Key takeaway

Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Real-world example

How this comes up in practice

A media company stores terabytes of video archives that are accessed once a year for audit purposes. Moving these objects to a cold storage tier (Azure Archive, S3 Glacier, or Google Cloud Storage Archive) costs a fraction of hot storage. Questions like this test whether you understand storage tiers, access frequency tradeoffs, and retrieval latency requirements.

What to study next

Got this wrong? Here's your next step.

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.

FAQ

Questions learners often ask

What does this PMLE question test?

This question tests Automating and orchestrating ML pipelines. The related concept: read the scenario before looking for a memorised answer.

What is the correct answer to this question?

The correct answer is: Orchestrate the pipeline using Cloud Composer with retry policies on the Dataflow operator. Option D is correct because Cloud Composer (Apache Airflow) provides native retry policies on its Dataflow operators, enabling automatic retries of the Dataflow job when it fails due to transient system errors. This addresses the sporadic failure pattern without manual intervention, ensuring the pipeline runs reliably every 6 hours.

What should I do if I get this PMLE question wrong?

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.

What is the key concept behind this question?

Read the scenario before looking for a memorised answer.

About these practice questions

Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.

Same concept, more angles

1 more way this is tested on PMLE

These questions test the same concept from different angles. Work through them to make sure you can recognise it however the exam phrases it.

Variation 1. A team is using Cloud Composer to orchestrate ML workflows. They have a DAG that triggers a Vertex AI Training job, then a prediction deployment. The deployment step occasionally fails due to quota limits. What is the best way to handle this?

medium

A. Increase the quota manually
B. Use Vertex AI Pipelines instead of Cloud Composer
C. Create a custom sensor to wait for quota to be available
D. Catch the exception in the DAG and send an alert
✓ E. Implement exponential backoff retry in the DAG task

Why E: Option E is correct because Cloud Composer (Apache Airflow) provides built-in retry mechanisms via task parameters like `retries` and `retry_delay`. Implementing exponential backoff in the DAG task is the best practice for handling transient quota errors, as it automatically retries the deployment step with increasing delays, reducing load on the quota system and increasing the chance of success without manual intervention. This approach aligns with Airflow's native error-handling capabilities and avoids unnecessary complexity or resource waste.
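The retry-with-increasing-delay pattern from this explanation can be sketched in plain Python. This is a generic illustration, not Airflow's implementation: `QuotaExceeded`, `call_with_backoff`, and `deploy_model` are hypothetical names, and the delay values are assumptions. In a real DAG the equivalent behaviour would come from the task's retry parameters rather than hand-written loops.

```python
import time


class QuotaExceeded(Exception):
    """Stand-in for a transient quota error from a deployment call."""


def call_with_backoff(func, retries, base_delay, max_delay, sleep=time.sleep):
    """Call func; on QuotaExceeded wait base_delay * 2**n (capped), then retry."""
    for attempt in range(retries + 1):
        try:
            return func()
        except QuotaExceeded:
            if attempt == retries:
                raise  # out of retries: surface the error
            sleep(min(base_delay * 2 ** attempt, max_delay))


# Hypothetical deployment step that hits the quota twice before succeeding.
calls = {"count": 0}


def deploy_model():
    calls["count"] += 1
    if calls["count"] < 3:
        raise QuotaExceeded("quota limit reached")
    return "deployed"


waits = []  # record the waits instead of sleeping, to keep the demo fast
result = call_with_backoff(
    deploy_model, retries=4, base_delay=30, max_delay=300, sleep=waits.append
)
print(result, waits)
```

Each wait is longer than the previous one, which gives the quota time to recover instead of hammering the service with immediate retries.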

Last reviewed: Jun 11, 2026

This PMLE practice question is part of Courseiva's free Google Cloud certification practice question bank. Courseiva provides original exam-style practice questions with explanations, topic-based practice, mock exams, readiness tracking, and study analytics to help learners prepare for the PMLE exam.