Practice PMLE Automating and Orchestrating ML Pipelines questions with full explanations on every answer.
Click any question to see the full explanation and answer options, or start a focused practice session above.
1. A data scientist creates a custom Python function component for a Vertex AI pipeline using the Kubeflow Pipelines SDK v2. The component takes a string parameter 'input_text' and outputs a Metrics artifact. The scientist wants to include a lightweight Python function without building a container. Which code snippet correctly defines this component?
2. A machine learning engineer is building a Vertex AI pipeline that uses a pre-built Google Cloud Pipeline Components (GCPC) component to train a custom model. Which component should the engineer use to submit a custom training job to Vertex AI?
3. A team has a Vertex AI pipeline that includes a container component for data preprocessing. The team notices that the component is re-executed every time the pipeline runs, even when the inputs and code haven't changed. They want to leverage pipeline caching to avoid redundant executions. What should they do to enable caching for this component?
4. A machine learning team uses Vertex AI Pipelines to orchestrate their training pipeline. They want to trigger the pipeline automatically in response to new data arriving in a Cloud Storage bucket, and also support a scheduled run every day at 6 AM. Which combination of services should they use to achieve both event-driven and schedule-based triggers?
5. A company is using Vertex AI Pipelines to automate model retraining. They have a component that creates a BigQuery table with training data. To ensure idempotency, the component should check if the table already exists and recreate it if necessary. What is the best practice for passing data between pipeline components?
6. A data engineer wants to orchestrate a complex workflow that includes running a Vertex AI pipeline, then a BigQuery job, and finally a Dataflow pipeline. The workflow must handle dependencies, retries, and monitoring. Which Google Cloud service is most suitable for this orchestration?
7. A machine learning engineer is building a Vertex AI pipeline that uses a pre-built AutoML Tables component to train a classification model. The pipeline also includes a conditional step that deploys the model to an endpoint only if the evaluation metrics exceed a threshold. Which KFP feature should be used to implement the conditional deployment?
8. A team uses Cloud Build to automatically trigger a Vertex AI pipeline when changes are pushed to the model code repository. They have a cloudbuild.yaml file that builds a container image and submits the pipeline. However, they want to run the pipeline only if the commit includes changes to the 'training/' directory. Which Cloud Build configuration option should be used to filter the trigger?
9. A data scientist wants to create a Vertex AI pipeline component that uses a custom container image stored in Artifact Registry. The component should accept a dataset artifact as input and output a model artifact. Which component type should they use?
10. A machine learning engineer needs to pass a large dataset between two components in a Vertex AI pipeline. What is the recommended way to pass this data?
11. A team is using Vertex AI Pipelines to deploy a model. They have a component that evaluates the model and produces a ClassificationMetrics artifact. The pipeline should deploy the model only if the precision is greater than 0.9. They use dsl.If to check the metric. However, the condition always evaluates to False. What is the most likely cause?
12. A company wants to implement continuous training for their ML model. The pipeline should be triggered when new training data arrives in Cloud Storage, and after training, the model should be automatically deployed to a staging endpoint if evaluation metrics pass a threshold. They also need to detect skew between training data and serving data. Which two services should they use for skew detection?
13. A machine learning team uses Vertex AI Pipelines to run a multi-step training pipeline. They want to implement a continuous delivery (CD) process where a model is automatically promoted from staging to production only if it passes an evaluation gate. Which TWO actions should they include in their CI/CD pipeline? (Choose two.)
14. A company runs a Vertex AI pipeline that uses a container component to preprocess data. The component downloads a large file from a public URL and saves the output to Cloud Storage. The pipeline fails intermittently with a 'timeout' error. Which THREE steps should the team take to improve reliability? (Choose three.)
15. A data scientist is creating a Vertex AI pipeline using the Kubeflow Pipelines SDK v2. Which TWO statements about pipeline parameters are correct? (Choose two.)
16. A data science team wants to build a Vertex AI pipeline that trains a model, evaluates it, and conditionally deploys it if the accuracy exceeds 0.9. They want to use the Kubeflow Pipelines SDK v2. Which construct allows them to conditionally execute the deployment step based on the evaluation metric?
17. A machine learning engineer needs to schedule a Vertex AI pipeline to run daily at midnight. Which approach should they use?
18. You are building a Vertex AI pipeline using the KFP SDK v2. One component processes a large dataset and outputs a metrics artifact. You notice that the component is being cached even when the dataset changes, because the component code and image remain the same. How can you force the component to always re-execute when the dataset changes?
19. A team wants to implement continuous training for their ML model. The pipeline should be triggered when new training data arrives in a Cloud Storage bucket. Which combination of services should they use?
20. You are using KFP SDK v2 to define a pipeline. You need to pass a large dataset between components. What is the best practice for passing data?
21. A machine learning engineer wants to use a pre-built Google Cloud Pipeline Components (GCPC) component to train a model using Vertex AI. Which component should they use?
22. You are developing a Vertex AI pipeline that runs multiple parallel training jobs with different hyperparameters, then collects their results and selects the best model. Which KFP SDK v2 construct should you use to run the parallel training tasks?
23. A company uses Cloud Composer to orchestrate their ML workflows. They have an Airflow DAG that runs a Vertex AI pipeline, then a BigQuery query, then a Dataflow job. The DAG is failing because the Vertex AI pipeline takes longer than the Airflow task timeout. What is the best way to handle this?
24. You are defining a Python function component in KFP SDK v2. Which decorator should you use?
25. A team wants to implement continuous delivery for their ML models. They have a pipeline that trains a model and evaluates it. If the evaluation metrics exceed a threshold, the model should be deployed to a staging endpoint, and after manual approval, to production. Which approach should they use?
26. You are designing a Vertex AI pipeline that includes a container component. The component needs to use a custom container image that is stored in Artifact Registry. How should you specify the container image in the component definition?
27. You have a Vertex AI pipeline that trains a model and outputs a Model artifact. You want to register this model in the Vertex AI Model Registry. Which pre-built Google Cloud Pipeline Components component should you use?
28. You are building a CI/CD pipeline for an ML model using Cloud Build. When code is pushed to the main branch, you want to automatically build a training image, run a Vertex AI pipeline, and if the model evaluation passes, deploy it to a staging endpoint. Which two components are essential for this CI/CD pipeline?
29. You are tasked with building a robust ML pipeline that must be idempotent and handle data skew between training and serving. Which three practices should you implement?
30. You need to orchestrate a complex ML workflow that involves multiple Vertex AI pipelines, BigQuery jobs, and Dataflow pipelines. The workflow must handle dependencies, retries, and monitoring. Which two services are best suited for this orchestration?
31. A data science team wants to deploy an ML pipeline on Vertex AI Pipelines that includes a component to train a model using a custom container. The component should be reusable across different pipelines and accept hyperparameters as inputs. Which approach should they take?
32. An ML engineer is building a pipeline on Vertex AI Pipelines and wants to pass a dataset artifact from one component to another without incurring additional cost for intermediate storage. How should they define the input and output types?
33. A team runs a Vertex AI pipeline daily. They notice that a component that downloads a file from a public URL always executes even when the URL and parameters haven't changed. They want to avoid unnecessary re-execution and reduce costs. What should they do?
34. An organization wants to trigger a Vertex AI pipeline whenever new data arrives in a Cloud Storage bucket. Which approach should they use?
35. An ML engineer is using Cloud Composer (Airflow) to orchestrate an ML workflow. They need to run a Vertex AI pipeline as one of the tasks in the DAG. Which Airflow operator should they use?
36. A team has a pipeline that trains a model and then evaluates it. They want to conditionally deploy the model to a staging endpoint only if evaluation metrics exceed a threshold. Which KFP feature should they use?
37. What is the primary benefit of using pipeline caching in Vertex AI Pipelines?
38. An engineer needs to compile a Kubeflow Pipeline defined in Python to a JSON format that can be run on Vertex AI Pipelines. Which command should they use?
39. A company has a CI/CD pipeline that retrains a model every time new training data is available. They want to automatically deploy the new model to production only if it passes a set of evaluation tests on a staging environment. Which approach best implements this?
40. Which of the following is a best practice when designing idempotent pipeline components in Vertex AI?
41. An ML team wants to run a hyperparameter tuning job on Vertex AI using a pre-built pipeline component. Which component should they use?
42. What is the purpose of the 'importer' component in Vertex AI Pipelines?
43. Your organization wants to automate the retraining of a model when new data is available and also on a weekly schedule. Which TWO services would you use together to achieve this? (Choose two.)
44. A team is designing an ML pipeline that includes training, evaluation, and conditional deployment. They want to use Vertex AI Pipelines. Which THREE concepts should they use? (Choose three.)
45. An ML engineer is building a continuous training pipeline that retrains a model when new data arrives. The pipeline should also detect skew between training and serving data. Which TWO Google Cloud services should they use? (Choose two.)
46. A data scientist wants to define a lightweight Python function component in Vertex AI Pipelines using Kubeflow Pipelines SDK v2. Which decorator should be applied to the function to make it a pipeline component?
47. An ML engineer wants to containerize a custom training script and use it as a component in a Vertex AI Pipeline. The component should accept a dataset URI and a learning rate parameter, and output a trained model artifact. Which approach should the engineer use to define the component?
48. A team notices that a Vertex AI Pipeline step re-executes every time the pipeline runs, even though its inputs and code have not changed. They want to enable caching for this component to avoid redundant computation. However, caching is currently disabled globally. Which configuration change will enable caching for that specific component?
49. An ML engineer needs to trigger a Vertex AI Pipeline on a recurring schedule, every 24 hours, to retrain a model with the latest data. Which approach should they use to set up this schedule?
50. A company wants to implement continuous delivery (CD) for ML models, where a model is automatically deployed to a staging environment and only promoted to production after passing an evaluation gate. Which combination of GCP services is BEST suited for orchestrating this CD pipeline?
51. An ML engineer is building a pipeline that includes a step to run a BigQuery query and pass the results to the next step. They want to use a pre-built Google Cloud Pipeline Component for BigQuery. Which component should they use to execute a query and output the results to a destination table?
52. A Vertex AI Pipeline contains a task that may produce outputs that are not always needed. The engineer wants to conditionally execute downstream tasks only if a specific artifact is produced. Which KFP SDK v2 construct allows the engineer to implement this conditional execution?
53. An organization wants to use Cloud Composer (Airflow) to orchestrate a machine learning workflow that includes running a Vertex AI Pipeline, followed by a BigQuery job, and then a Dataflow pipeline. What is the primary advantage of using Cloud Composer for this orchestration?
54. An ML engineer is designing a pipeline that should run only when new training data arrives in a Cloud Storage bucket. Which event-driven approach should they use to trigger the Vertex AI Pipeline?
55. A team is implementing CI/CD for ML using Cloud Build. They want to trigger a training pipeline in Vertex AI whenever new model code is pushed to the main branch of the repository. Which Cloud Build configuration should they use to achieve this?
56. In a Vertex AI Pipeline, a component produces a Metrics artifact that includes an evaluation metric. The engineer wants to use this metric value as a condition to decide whether to deploy the model. However, the metric value is stored in the artifact's metadata and not directly as a pipeline parameter. How can the engineer pass the metric value to a downstream conditional task?
57. An ML engineer is building a pipeline component that takes a dataset URI and a model URI as inputs, and outputs a classification metrics artifact. Which KFP SDK v2 type should the output artifact be annotated with?
58. An organization is building a continuous training (CT) pipeline that retrains a model whenever one of the following conditions is met: (1) new training data is available in Cloud Storage, (2) it's the first day of the month, or (3) the model's performance degrades below a threshold. Which TWO mechanisms should they combine to trigger the pipeline?
59. A company is using Vertex AI Pipelines for ML workflows. They want to implement best practices for idempotent components and data passing. Which THREE practices should they adopt?
60. An ML engineer is creating a Vertex AI Pipeline that includes a loop to train multiple models in parallel on different hyperparameter sets. Which TWO KFP SDK v2 constructs can be used to implement this parallel execution?
61. A data science team wants to build a machine learning pipeline on Vertex AI Pipelines that preprocesses data, trains a model, and evaluates it. They need to ensure that components can be reused across multiple pipelines and that outputs from one component can be passed as inputs to another. Which approach should they take?
62. A machine learning engineer has a Vertex AI pipeline that trains a model. The pipeline uses caching to avoid re-running components that have not changed. After updating the training code, the engineer notices that the pipeline still uses cached outputs from the previous run. What could be the reason?
63. A team is building a CI/CD pipeline for an ML model. They want to automatically trigger a Vertex AI pipeline for retraining whenever new training data arrives in a Cloud Storage bucket, but only if a specific Pub/Sub notification is published by a data ingestion process. Which approach meets these requirements with minimal operational overhead?
64. A machine learning engineer is using Vertex AI Pipelines and wants to run a custom Python function as a component. They need to pass a dataset artifact from a previous component and output a model artifact. Which decorator should they use to define the component in the Kubeflow Pipelines SDK v2?
65. A company uses Cloud Composer to orchestrate a nightly ML workflow that includes running a Vertex AI pipeline, querying BigQuery, and running a Dataflow job. The Airflow DAG must run only if the previous day's Dataflow job succeeded. Which Airflow concept should they use to implement this dependency?
66. A machine learning team wants to implement a continuous delivery pipeline for their ML models using Vertex AI Pipelines. The pipeline should automatically deploy a model to a staging endpoint after evaluation passes, and then after manual approval, promote it to production. Which strategy should they use to manage model versions in the Vertex AI Model Registry?
67. A data scientist is defining a Vertex AI pipeline and needs to include a step that imports a pre-existing model from Cloud Storage into the pipeline as an artifact. Which Kubeflow Pipelines SDK v2 component should they use?
68. A company uses Vertex AI Pipelines to train models on a daily schedule. The pipeline includes a component that runs a BigQuery query to extract features. The team wants to ensure that if the BigQuery component fails due to transient network errors, the pipeline automatically retries it. How can they configure retries in Vertex AI Pipelines?
69. A machine learning engineer needs to create a pipeline that runs a custom container component on Vertex AI. The container expects a Cloud Storage path as input and outputs a model artifact. Which component type should they define using the Kubeflow Pipelines SDK v2?
70. A team is building a continuous training pipeline that retrains a model when new data arrives. They want to detect data drift between the training dataset and the serving data. Which approach should they integrate into the pipeline to compare the distributions of the two datasets?
71. An organization wants to trigger a Vertex AI pipeline whenever a new commit is pushed to the main branch of their Cloud Source Repository. The pipeline should retrain and evaluate the model. Which service should they use to detect the push event and start the pipeline?
72. A machine learning engineer is building a pipeline with Vertex AI Pipelines and wants to pass a large dataset between components without copying it to the container's memory. What is the best practice for passing data between pipeline components?
73. A data science team is designing a Vertex AI pipeline that includes a loop over a list of hyperparameter sets. They want to run training jobs in parallel for each hyperparameter set and then collect the results for comparison. Which two Kubeflow Pipelines SDK v2 features should they use? (Choose two.)
74. A company wants to implement a CI/CD pipeline for their ML models using Vertex AI. They need to automatically retrain the model when new data arrives, but only if the model performance on a validation set has degraded by more than 5% compared to the current production model. Which three services or components should they incorporate into the automated pipeline? (Choose three.)
75. A machine learning team uses Vertex AI Pipelines for model training. They want to implement a conditional step that runs additional evaluation if the model accuracy exceeds 0.9, otherwise it runs a data augmentation component. Which two Kubeflow Pipelines SDK v2 constructs can they use to achieve this? (Choose two.)
76. A machine learning engineer wants to define a lightweight pipeline component that runs custom Python code without building a container image. Which KFP SDK feature should they use?
77. An organization runs a Vertex AI pipeline that includes a model evaluation step. Team members want to reuse previously computed evaluation metrics when re-running the pipeline with unchanged code and hyperparameters. Which feature should they enable?
78. A data science team uses Cloud Composer to orchestrate ML workflows. They need to trigger a Vertex AI pipeline after a BigQuery data load completes, and then run a Dataflow job. Which Airflow operator should they use to launch the Vertex AI pipeline?
79. A machine learning pipeline includes a conditional branch: if model accuracy exceeds 0.95, deploy to production; otherwise, send a notification. Which KFP SDK feature allows implementing this logic within the pipeline definition?
80. A company wants to automatically retrain their model every night at 2 AM using Vertex AI Pipelines. Which approach should they use to trigger the pipeline on a schedule?
81. A team develops a pipeline that trains a model and evaluates it. They want to pass the test accuracy (a float) from the evaluation component to a subsequent deployment component. Which KFP SDK type should the evaluation component output be annotated with?
82. An ML pipeline runs on Vertex AI and includes a component that uses a third-party library not available in the default Python environment. The team wants to avoid building a custom container image. Which approach should they use?
83. A company uses Vertex AI Pipelines for ML training. They want to implement continuous training triggered by new data arrival. Which two Google Cloud services should they use to achieve this? (Choose two.)
84. A pipeline uses the Google Cloud Pipeline Components to perform AutoML training and batch prediction. Which two components from the GCPC library should be used? (Choose two.)
85. An ML pipeline must run a set of preprocessing tasks for each data shard in parallel. Which KFP SDK features should be used to implement this? (Choose two.)
86. A team wants to implement CI/CD for their ML pipeline using Cloud Build. They want to automatically compile and deploy the pipeline when code is pushed to the main branch. Which three steps should they include in the Cloud Build configuration? (Choose three.)
87. A data science team uses Cloud Composer to orchestrate a complex ML workflow. They need to run a Vertex AI pipeline and then a BigQuery query conditionally based on the pipeline's output. Which Airflow features should they use? (Choose two.)
88. A pipeline includes a component that produces a model artifact. The team wants to automatically detect skew between the training data distribution and the serving data distribution. Which three best practices should they implement? (Choose three.)
89. An organization wants to implement continuous delivery for their ML model. After a new model is trained and evaluated, they want to automatically deploy it to a staging endpoint, run validation tests, and if passed, promote to production. Which two components should they include in their delivery pipeline? (Choose two.)
The Automating and Orchestrating ML Pipelines domain covers the key concepts tested in this area of the PMLE exam blueprint published by Google Cloud. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all PMLE domains — no account required.
The Courseiva PMLE question bank contains 89 questions in the Automating and Orchestrating ML Pipelines domain. Click any question to see the full explanation and answer breakdown.
Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.
The session launcher on this page draws questions exclusively from the Automating and Orchestrating ML Pipelines domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.