20+ practice questions focused on Automating and Orchestrating ML Pipelines — one of the most tested topics on the Google Professional Machine Learning Engineer exam. Each question includes a detailed explanation so you learn why the right answer is correct.
Start Automating and Orchestrating ML Pipelines Practice
A data scientist creates a custom Python function component for a Vertex AI pipeline using the Kubeflow Pipelines SDK v2. The component takes a string parameter 'input_text' and outputs a Metrics artifact. The scientist wants to include a lightweight Python function without building a container. Which code snippet correctly defines this component?
Explanation: Option D is correct because it applies the `@dsl.component` decorator, which is what turns a plain Python function into a lightweight component in Kubeflow Pipelines SDK v2. The decorator's `base_image` argument (here, `python:3.9`) selects an existing image for the function to run in, so no custom container has to be built; if it is omitted, the SDK falls back to a default Python image. The function logs a metric and emits it as a `Metrics` artifact. Without the decorator, or with an incorrect one, the function would not be recognized as a pipeline component.
A machine learning engineer is building a Vertex AI pipeline that uses a pre-built Google Cloud Pipeline Components (GCPC) component to train a custom model. Which component should the engineer use to submit a custom training job to Vertex AI?
Explanation: The CustomJob component is the correct choice because it is the pre-built GCPC component specifically designed to submit a custom training job to Vertex AI. It lets the engineer specify a custom container image or a Python training package, along with the machine configuration and arguments such as hyperparameters, directly within a Vertex AI pipeline. Other components serve different purposes, such as hyperparameter tuning, batch prediction, or model deployment.
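To make the "container image plus machine configuration" idea concrete, here is a minimal sketch of the worker pool specification that a custom training job is given. In GCPC this list is passed as the `worker_pool_specs` argument of the custom job component (`CustomTrainingJobOp` in `google_cloud_pipeline_components.v1.custom_job`). The helper name `build_worker_pool_specs`, the image URI, and the argument values are hypothetical, and the snake_case field names follow the Python SDK convention as an assumption.

```python
from typing import Any, Dict, List, Optional


def build_worker_pool_specs(
    image_uri: str,
    machine_type: str = "n1-standard-4",
    replica_count: int = 1,
    args: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Return a one-pool spec list: machine config plus the training container."""
    return [
        {
            "machine_spec": {"machine_type": machine_type},
            "replica_count": replica_count,
            "container_spec": {"image_uri": image_uri, "args": list(args or [])},
        }
    ]


specs = build_worker_pool_specs(
    # Hypothetical image; hyperparameters travel as container arguments.
    image_uri="us-docker.pkg.dev/example-project/example-repo/trainer:latest",
    args=["--learning-rate=0.01", "--epochs=10"],
)
```

The resulting `specs` list is plain data, which is why the same structure can be reused whether the job is submitted from a pipeline component or from the SDK directly.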
A team has a Vertex AI pipeline that includes a container component for data preprocessing. The team notices that the component is re-executed every time the pipeline runs, even when the inputs and code haven't changed. They want to leverage pipeline caching to avoid redundant executions. What should they do to enable caching for this component?
Explanation: Option D is correct because Vertex AI Pipelines enables execution caching by default for every component; a step is re-executed on each run only if caching has been turned off, for example by calling `set_caching_options(False)` on the task or by submitting the run with `enable_caching=False`. A component that re-executes even though its inputs and code are unchanged therefore indicates that caching was likely disabled for it. Removing that setting lets the pipeline reuse cached outputs when the inputs and the component definition have not changed.
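The caching behaviour described here can be pictured with a small standard-library sketch. This is an illustration of the general idea (fingerprint the step definition and its inputs, reuse the stored output on a match), not Vertex AI's actual cache-key algorithm; the names `cache_key` and `StepCache` are invented for the example.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def cache_key(component_spec: Dict[str, Any], inputs: Dict[str, Any]) -> str:
    """Fingerprint a step from its definition and inputs (illustrative only)."""
    payload = json.dumps({"spec": component_spec, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class StepCache:
    """Toy executor that skips a step when an identical one already ran."""

    def __init__(self) -> None:
        self._outputs: Dict[str, Any] = {}
        self.executions = 0

    def run(
        self,
        component_spec: Dict[str, Any],
        inputs: Dict[str, Any],
        step_fn: Callable[..., Any],
        enable_caching: bool = True,
    ) -> Any:
        key = cache_key(component_spec, inputs)
        if enable_caching and key in self._outputs:
            return self._outputs[key]  # cache hit: step is not re-executed
        self.executions += 1
        result = step_fn(**inputs)
        self._outputs[key] = result
        return result
```

With caching left on, a second call with the same spec and inputs returns the stored result without running `step_fn` again; passing `enable_caching=False` forces a re-run, which mirrors the symptom in the question.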
A machine learning team uses Vertex AI Pipelines to orchestrate their training pipeline. They want to trigger the pipeline automatically in response to new data arriving in a Cloud Storage bucket, and also support a scheduled run every day at 6 AM. Which combination of services should they use to achieve both event-driven and schedule-based triggers?
Explanation: Option C is correct because Cloud Scheduler can trigger the pipeline at 6 AM daily via a cron job, while Cloud Functions, triggered by Cloud Storage events (e.g., object finalize), can call the Vertex AI API to start the pipeline when new data arrives. This combination provides both schedule-based and event-driven triggers without requiring custom infrastructure.
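A rough sketch of the two entry points follows. It assumes the Cloud Storage event payload carries `bucket` and `name` fields and that the daily schedule is written in unix-cron form; the function names, the parameter keys `input_uri` and `trigger`, and the injected `submit_pipeline` callable are all hypothetical stand-ins for the real call that would start the Vertex AI pipeline run.

```python
from typing import Any, Callable, Dict

# Cron expression for "every day at 06:00", as used for the scheduled trigger.
DAILY_6AM_CRON = "0 6 * * *"

SubmitFn = Callable[[Dict[str, str]], Any]


def parameters_from_gcs_event(event: Dict[str, Any]) -> Dict[str, str]:
    """Turn an object-finalize event into pipeline parameters."""
    return {
        "input_uri": f"gs://{event['bucket']}/{event['name']}",
        "trigger": "gcs-event",
    }


def on_object_finalized(event: Dict[str, Any], submit_pipeline: SubmitFn) -> Any:
    """Event-driven path: called by the function bound to the bucket."""
    return submit_pipeline(parameters_from_gcs_event(event))


def on_schedule(default_input_uri: str, submit_pipeline: SubmitFn) -> Any:
    """Schedule-based path: called when the daily cron job fires."""
    return submit_pipeline({"input_uri": default_input_uri, "trigger": "schedule"})
```

Keeping the submission call behind a single injected function means both triggers start the same pipeline definition and differ only in the parameters they pass.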
A company is using Vertex AI Pipelines to automate model retraining. They have a component that creates a BigQuery table with training data. To ensure idempotency, the component should check if the table already exists and recreate it if necessary. What is the best practice for passing data between pipeline components?
Explanation: Option D is correct because Vertex AI Pipelines is designed to pass data between components via Cloud Storage artifacts. By storing the BigQuery table metadata or training data as a file in Cloud Storage and passing the GCS URI as an artifact, the pipeline ensures idempotency and decouples components. This approach aligns with Kubeflow Pipelines' artifact-based I/O model, where each component's outputs are materialized as URIs rather than in-memory objects.
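The artifact-based hand-off can be sketched without any cloud dependencies. In the toy version below a local temporary file stands in for the Cloud Storage object, and the `FileArtifact` class, the function names, and the table ID are all invented for illustration; the point is only that the producer writes a file and the consumer receives its URI rather than an in-memory object.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict


@dataclass
class FileArtifact:
    """Stand-in for a pipeline artifact: a URI plus free-form metadata."""
    uri: str
    metadata: Dict[str, str] = field(default_factory=dict)


def create_training_table(output: FileArtifact, table_id: str) -> None:
    # Producer: record which table holds the training data. Rewriting the
    # same file on a re-run gives the same result, so the step is idempotent.
    Path(output.uri).write_text(json.dumps({"table_id": table_id}))
    output.metadata["format"] = "json"


def train(training_data: FileArtifact) -> str:
    # Consumer: knows only the URI it was handed, not how it was produced.
    ref = json.loads(Path(training_data.uri).read_text())
    return f"training on {ref['table_id']}"


with tempfile.TemporaryDirectory() as workdir:
    artifact = FileArtifact(uri=str(Path(workdir) / "table_ref.json"))
    create_training_table(artifact, "example_project.example_dataset.training_data")
    message = train(artifact)
```

Because the consumer depends only on the URI, either step can be replaced or re-run independently, which is the decoupling the explanation refers to.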
+15 more Automating and Orchestrating ML Pipelines questions available
Practice all Automating and Orchestrating ML Pipelines questions
1. Baseline your knowledge
Start with 10 questions to gauge your current understanding of Automating and Orchestrating ML Pipelines. This tells you whether you need a concept refresher or just practice.
2. Review every explanation
For each question — right or wrong — read the full explanation. Understanding why an answer is correct is more valuable than knowing the answer itself.
3. Focus on exam traps
Automating and Orchestrating ML Pipelines questions on the PMLE frequently use trap wording. Look for subtle differences in answers that test your precision, not just general knowledge.
4. Reach 80% consistently
Do repeated sessions until you score 80%+ three times in a row. Then move to mixed-mode practice to test cross-topic recall under realistic conditions.
The exact number varies per candidate. Automating and Orchestrating ML Pipelines is tested as part of the Google Professional Machine Learning Engineer blueprint. Practicing with targeted Automating and Orchestrating ML Pipelines questions ensures you can handle any format or difficulty that appears.
Yes. Courseiva provides free PMLE practice questions across all exam topics and domains. The platform includes topic-based practice, mock exams, missed-question review, bookmarked questions, and readiness tracking — no account required.
Difficulty is subjective, but Automating and Orchestrating ML Pipelines is a high-priority exam concept tested in multiple ways — direct recall, scenario analysis, and command-output interpretation. Consistent practice is the best way to build confidence.
Launch a full Automating and Orchestrating ML Pipelines practice session with instant scoring and detailed explanations.