Practice PMLE Automating and orchestrating ML pipelines questions with full explanations on every answer.
Click any question to see the full explanation and answer options, or start a focused practice session above.
1. An MLOps team is implementing a CI/CD pipeline for a TensorFlow model on Vertex AI. The model training job takes 2 hours and produces a SavedModel. The team wants to automatically trigger a new pipeline run whenever a change is pushed to the 'main' branch of their source repository. The pipeline should include training, evaluation, and if metrics exceed a threshold, deploy the model to a Vertex AI endpoint. Which trigger configuration should they use?
2. A data science team is deploying a PyTorch model for real-time inference using Vertex AI Endpoints. The model requires a custom container with specific CUDA drivers and Python packages. They have created a Docker image and pushed it to Artifact Registry. The pipeline should automatically retrain the model every week and deploy the new version if it passes validation. However, the deployment step fails intermittently with the error 'The container image is not compatible with the machine type.' What is the most likely cause?
3. An ML engineer is using Vertex AI Pipelines with Kubeflow Pipelines SDK (KFP) to orchestrate a training and deployment workflow. They want to reuse a custom component across multiple pipelines. The component is defined in a Python file 'preprocess.py' that includes a function decorated with @kfp.components.create_component_from_func. How should they package this component for reuse?
4. A company has a Vertex AI pipeline that trains a model on streaming data from Pub/Sub. The pipeline is triggered by a Cloud Function when new data arrives. Recently, jobs have been failing with 'ResourceExhausted: Quota limit exceeded for regional CPUs in us-central1.' The team needs to ensure successful job execution while minimizing changes. Which approach should they take?
5. An ML team is designing an automated pipeline to retrain a recommendation model every day using new user interaction data stored in BigQuery. The pipeline must be cost-efficient, scalable, and require minimal manual intervention. Which two approaches should they consider?
6. You are an ML engineer at a large e-commerce company. Your team has developed a product recommendation model using TensorFlow and deployed it on Vertex AI Endpoints for real-time inference. The model is retrained weekly using a Vertex AI Pipeline that reads new user interaction data from BigQuery, trains the model, evaluates it, and deploys the new version to the endpoint with a traffic split: 10% to the new model and 90% to the previous champion model. Recently, the team noticed that the new model's online prediction latency has increased significantly (from 50ms to 200ms) after deployment, causing timeouts for some requests. The training code has not changed, and the model size is similar. The pipeline uses a custom container with the same TensorFlow Serving image as before. The deployment step uses the same machine type (n1-standard-4) for the endpoint. What is the most likely cause of the latency increase?
7. You are designing an ML pipeline for a large-scale recommendation system that runs weekly retraining on historical user interaction data. The pipeline uses TensorFlow and is deployed on Google Cloud. The pipeline must be orchestrated and automated with minimal manual intervention. Which THREE options should you include in your design? (Choose three.)
8. A developer creates a Cloud Build trigger that runs a training pipeline whenever code is pushed to the main branch of the repository. The trigger is configured to use a source archive stored in Cloud Storage. After pushing code to main, the build fails with the error shown. What is the most likely cause of this failure?
9. Your team manages a production ML pipeline on Google Cloud that trains a fraud detection model every 6 hours using new transaction data. The pipeline steps are: (1) Cloud Function triggered by new files in Cloud Storage to validate data, (2) Dataflow job for feature engineering, (3) Vertex AI CustomJob for training, (4) Cloud Function to deploy the model to a Vertex AI endpoint after evaluation. You notice that the pipeline sometimes fails during the Dataflow job step with an error: 'Workflow failed. Causes: The job encountered a system error. Please try again later.' The error occurs sporadically, and retrying the pipeline manually usually succeeds. The team needs a reliable automated solution. What should you do?
10. Drag and drop the steps to set up a BigQuery ML linear regression model for forecasting in the correct order.
11. Drag and drop the steps to set up a batch prediction job using Vertex AI in the correct order.
12. Match each Google Cloud AI/ML service to its primary purpose.
13. Match each MLOps practice to its description.
14. An MLOps team wants to automate the retraining of a model each time new data arrives in a BigQuery table. What is the most efficient Google Cloud service to orchestrate this pipeline?
15. A data scientist has trained a model using Vertex AI Training and wants to deploy it to a Vertex AI Endpoint for online predictions. Which orchestration service should be used to automate the deployment step after training completes?
16. A company uses Cloud Composer to orchestrate their ML pipelines. They notice that tasks are being queued but not executed, causing delays. What is the most likely cause?
17. An ML engineer is using Vertex AI Pipelines and wants to reuse a trained model across multiple pipeline runs without retraining each time. Which artifact management strategy should be used?
18. A team wants to implement CI/CD for their ML models using Cloud Build. They have a pipeline that trains a model and deploys it. What is the best practice for triggering the pipeline when a new commit is pushed to the source repository?
19. A data-processing pipeline using Dataflow needs to incorporate a custom ML prediction step. The team wants to maintain fast processing and minimize latency. What is the optimal approach?
20. A company is using Vertex AI Pipelines with reusable components. They observe that a component that performs hyperparameter tuning is failing intermittently with a 'ResourceExhausted' error. The component is configured with a small custom service account. What is the most likely cause?
21. An organization has multiple ML pipelines running on Vertex AI. They want to centralize monitoring and alerting for pipeline failures, including root cause analysis. Which combination of services should they use?
22. A team uses Cloud Composer to orchestrate a complex ML pipeline with many tasks. They notice that the DAG parsing time is very high, causing delays in task scheduling. Which action would most effectively reduce DAG parsing time?
23. Which TWO options are best practices for building ML pipelines on Vertex AI?
24. Which THREE actions should be taken to automate a machine learning pipeline using Cloud Build and Vertex AI?
25. Which TWO strategies can help reduce the cost of running ML pipelines on Vertex AI?
26. The exhibit shows a Cloud Build configuration. An ML engineer wants to automate the deployment of a model to Vertex AI after training. What is missing in this config to successfully deploy the model?
27. The exhibit shows a Cloud Composer environment variable configuration. An ML pipeline DAG fails with an authentication error when trying to access Vertex AI. What is the most likely cause?
28. The exhibit shows a Vertex AI PipelineJob submission command. The pipeline fails because the component cannot find the input data. What is the most likely cause?
29. A data scientist wants to automate the retraining of a model when new data arrives in Cloud Storage. Which Google Cloud service is most appropriate for orchestrating this workflow?
30. An ML team is using Vertex AI Pipelines to automate model training and deployment. They want to reuse components across multiple pipelines. What is the best practice for managing component code?
31. A company uses Vertex AI Pipelines to train and deploy models. The pipeline has a step that runs a custom container. The step fails intermittently with a timeout error. Which approach should be taken to robustly handle this?
32. An ML engineer is designing a CI/CD pipeline for ML models using Cloud Build and Cloud Deploy. They want to automatically test model performance on a validation set before promoting to production. Which step should be included in the CI/CD pipeline?
33. A team is using Cloud Composer to orchestrate ML workflows. They have a DAG that triggers a Vertex AI Training job, then a prediction deployment. The deployment step occasionally fails due to quota limits. What is the best way to handle this?
34. A company uses Vertex AI Pipelines with Kubeflow DSL for hyperparameter tuning. They notice that some trials fail due to OOM errors. How should they configure the pipeline to automatically handle this?
35. An organization wants to implement continuous training for a model that serves predictions via Vertex AI Endpoints. Which approach best automates the retrain-deploy cycle?
36. An ML engineer is using Cloud Build to trigger a Vertex AI Pipeline on every commit to a repository. The pipeline takes 2 hours. The engineer wants to only run the pipeline when changes are made to specific directories. How can this be achieved?
37. A company uses Vertex AI Pipelines with prebuilt components for data processing, training, and deployment. They need to integrate a custom validation step written in Python. What is the correct way to include this as a component?
38. Which TWO are benefits of using Vertex AI Pipelines for ML workflow orchestration over deploying custom Airflow DAGs in Cloud Composer? (Choose TWO.)
39. Which THREE are best practices for implementing CI/CD for ML pipelines on Google Cloud? (Choose THREE.)
40. Which THREE should be considered when setting up an automated retraining pipeline using Vertex AI Pipelines and Cloud Composer? (Choose THREE.)
41. The exhibit shows part of a Vertex AI Pipeline definition. The pipeline fails at the training step with an error: 'Missing required input: train_data'. What is the most likely cause?
42. A large financial company uses a complex ML pipeline to detect fraudulent transactions. The pipeline consists of multiple steps: data ingestion from Pub/Sub, feature engineering using Dataflow, model training with Vertex AI, and deployment to an endpoint. They currently use Cloud Composer to orchestrate the pipeline with separate DAGs for each step. Recently, they have been experiencing failures in the Dataflow job due to schema changes in the incoming transactions, causing the pipeline to stall. The team manually fixes the schema and re-runs the pipeline, which is time-consuming. They want to improve the robustness of the pipeline. The pipeline is run on a schedule but also triggered by the arrival of new data. The team is considering moving to Vertex AI Pipelines to unify the workflow. They also want to automatically detect schema changes and handle them without manual intervention. Which approach should they take?
43. A data science team uses Vertex AI Pipelines to build a training pipeline. They notice that when the pipeline fails due to a transient error in a component, the entire pipeline restarts from the beginning, taking a long time. What is the best practice to handle transient errors efficiently?
44. A company deploys a training pipeline on Vertex AI using custom containers. The pipeline includes a hyperparameter tuning job that uses Bayesian optimization. After several runs, they observe that the tuning job is not converging and the search space is large. They want to reduce the number of trials while still finding good hyperparameters. Which strategy should they use?
45. A machine learning engineer is designing an ML pipeline on Vertex AI. The pipeline includes multiple steps: data validation, preprocessing, training, evaluation, and deployment. The engineer wants to ensure that if the data validation step fails due to schema mismatch, the pipeline stops immediately and does not proceed. Additionally, they want to reuse the preprocessed data from a previous successful run if the source data hasn't changed. Which two configurations should they use? (Choose two.)
46. You are responsible for maintaining an ML pipeline that runs daily on Vertex AI Pipelines. The pipeline preprocesses data, trains a model, and deploys it to an endpoint. Recently, the pipeline has been failing at the deployment step because the endpoint already exists and the deploy step tries to create a new endpoint instead of updating the existing one. The pipeline code is written using the Kubeflow Pipelines SDK. You need to modify the pipeline to resolve this issue with minimal changes. What should you do?
47. Your team is developing a machine learning model for real-time fraud detection. The training pipeline runs on Vertex AI and uses BigQuery for feature engineering. Recently, the pipeline has been taking significantly longer to execute. Upon investigation, you find that the BigQuery query for feature extraction is being rerun every time the pipeline runs, even though the underlying data hasn't changed. The pipeline is scheduled to run every hour. You want to reduce cost and execution time without losing the ability to detect data drift. Which approach should you take?
48. A large e-commerce company uses Vertex AI Pipelines to orchestrate its recommendation model training. The pipeline has several parallel components: feature engineering, model training, and model evaluation. Recently, they noticed that the pipeline often fails due to resource exhaustion in the Vertex AI custom training job for the model training component. The training job consumes significant memory and occasionally exceeds the allocated memory limit, causing the pod to be OOMKilled. The team has already increased the memory to the maximum allowed for the chosen machine type. They need to prevent the pipeline from failing while still using the same machine type. Which approach should they take?
49. A company uses Cloud Scheduler to trigger Cloud Functions that submit Vertex AI training jobs. They want to ensure fault tolerance and minimize manual intervention. Which TWO practices should they implement?
50. Refer to the exhibit. An ML engineer runs this Vertex AI pipeline. After execution, the "train" task fails with a resource exhaustion error. The task consumes more memory than allocated. Which step should the engineer take to fix this issue without increasing the overall quota cost?
51. A pharmaceutical company uses Vertex AI Pipelines with custom training containers. Recently, the pipeline has been failing with 'Container failed with exit code 137' (out of memory). The container runs with the default memory limit. The team needs to fix this without changing the code. The project quota for CPU and memory is sufficient. What should the team do?
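Several questions in this list (for example 9, 31, 33, 43, and 49) turn on handling transient failures without manual reruns. As a generic illustration of one common pattern, retrying with exponential backoff and jitter, here is a minimal Python sketch. It is not tied to any Vertex AI, Dataflow, or Cloud Composer API, and it is not presented as the answer to any question. The names `TransientError`, `retry_with_backoff`, and `flaky_step` are hypothetical and exist only for this example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a sporadic system error)."""


def retry_with_backoff(task, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is reached.

    The wait doubles after each failed attempt, with random jitter added
    so that parallel callers do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_step():
        # Stand-in for a pipeline step that fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise TransientError("system error, please try again later")
        return "succeeded"

    # The sleep function is replaced with a no-op so the demo runs instantly.
    result = retry_with_backoff(flaky_step, sleep=lambda seconds: None)
    print(result, "after", calls["count"], "attempts")
```

Only errors explicitly marked as transient are retried in this sketch; anything else propagates immediately, which keeps genuine failures such as a schema mismatch from being masked by retries. Managed orchestrators generally expose their own retry settings, and those are usually preferable to hand-rolled loops.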
The Automating and orchestrating ML pipelines domain covers the key concepts tested in this area of the PMLE exam blueprint published by Google Cloud. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all PMLE domains — no account required.
The Courseiva PMLE question bank contains 51 questions in the Automating and orchestrating ML pipelines domain. Click any question to see the full explanation and answer breakdown.
Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.
The session launcher on this page draws questions exclusively from the Automating and orchestrating ML pipelines domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.