PMLE Automating and orchestrating ML pipelines — All Questions With Answers

Question 1 (medium, multiple choice)

An MLOps team is implementing a CI/CD pipeline for a TensorFlow model on Vertex AI. The model training job takes 2 hours and produces a SavedModel. The team wants to automatically trigger a new pipeline run whenever a change is pushed to the 'main' branch of their source repository. The pipeline should include training, evaluation, and if metrics exceed a threshold, deploy the model to a Vertex AI endpoint. Which trigger configuration should they use?
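The "deploy only if metrics exceed a threshold" step in this scenario can be sketched in plain Python. This is a minimal illustration rather than Vertex AI API code: the function name `should_deploy`, the metric names, and the threshold values are all hypothetical.

```python
from typing import Mapping


def should_deploy(metrics: Mapping[str, float],
                  thresholds: Mapping[str, float]) -> bool:
    """Return True only if every gated metric is present and exceeds its minimum."""
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        # A missing metric, or one that does not strictly exceed the threshold,
        # blocks the deployment.
        if value is None or value <= minimum:
            return False
    return True


# Hypothetical evaluation output and thresholds, for illustration only.
evaluation = {"auc": 0.93, "precision": 0.88}
gate = {"auc": 0.90, "precision": 0.85}
print(should_deploy(evaluation, gate))
```

In a real pipeline this check would typically sit between the evaluation step and the deployment step, so that a failed gate skips deployment without failing the run.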

Question 2 (hard, multiple choice)

A data science team is deploying a PyTorch model for real-time inference using Vertex AI Endpoints. The model requires a custom container with specific CUDA drivers and Python packages. They have created a Docker image and pushed it to Artifact Registry. The pipeline should automatically retrain the model every week and deploy the new version if it passes validation. However, the deployment step fails intermittently with the error 'The container image is not compatible with the machine type.' What is the most likely cause?

Question 3 (easy, multiple choice)

An ML engineer is using Vertex AI Pipelines with Kubeflow Pipelines SDK (KFP) to orchestrate a training and deployment workflow. They want to reuse a custom component across multiple pipelines. The component is defined in a Python file 'preprocess.py' that includes a function decorated with @kfp.components.create_component_from_func. How should they package this component for reuse?

Question 4 (hard, multiple choice)

A company has a Vertex AI pipeline that trains a model on streaming data from Pub/Sub. The pipeline is triggered by a Cloud Function when new data arrives. Recently, jobs have been failing with 'ResourceExhausted: Quota limit exceeded for regional CPUs in us-central1.' The team needs to ensure successful job execution while minimizing changes. Which approach should they take?

Question 5 (medium, multi select)

An ML team is designing an automated pipeline to retrain a recommendation model every day using new user interaction data stored in BigQuery. The pipeline must be cost-efficient, scalable, and require minimal manual intervention. Which two approaches should they consider?

Question 6 (hard, multiple choice)

You are an ML engineer at a large e-commerce company. Your team has developed a product recommendation model using TensorFlow and deployed it on Vertex AI Endpoints for real-time inference. The model is retrained weekly using a Vertex AI Pipeline that reads new user interaction data from BigQuery, trains the model, evaluates it, and deploys the new version to the endpoint with a traffic split: 10% to the new model and 90% to the previous champion model. Recently, the team noticed that the new model's online prediction latency has increased significantly (from 50ms to 200ms) after deployment, causing timeouts for some requests. The training code has not changed, and the model size is similar. The pipeline uses a custom container with the same TensorFlow Serving image as before. The deployment step uses the same machine type (n1-standard-4) for the endpoint. What is the most likely cause of the latency increase?
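For reference, the 10%/90% traffic split in this scenario can be expressed as a mapping from deployed model ID to an integer percentage, with the percentages summing to 100. The helper below is a sketch under that assumption; the function name and the model IDs are hypothetical and it does not call any Vertex AI API.

```python
from typing import Dict


def build_traffic_split(champion_id: str, challenger_id: str,
                        challenger_percent: int) -> Dict[str, int]:
    """Build a champion/challenger split whose percentages sum to 100."""
    if not 0 <= challenger_percent <= 100:
        raise ValueError("challenger_percent must be between 0 and 100")
    if champion_id == challenger_id:
        raise ValueError("champion and challenger must be different models")
    return {
        challenger_id: challenger_percent,
        champion_id: 100 - challenger_percent,
    }


# Hypothetical deployed model IDs, for illustration only.
split = build_traffic_split("champion-model", "new-model", 10)
print(split)
```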

Question 7 (hard, multi select)

You are designing an ML pipeline for a large-scale recommendation system that runs weekly retraining on historical user interaction data. The pipeline uses TensorFlow and is deployed on Google Cloud. The pipeline must be orchestrated and automated with minimal manual intervention. Which THREE options should you include in your design? (Choose three.)

Question 8 (easy, multiple choice)

A developer creates a Cloud Build trigger that runs a training pipeline whenever code is pushed to the main branch of the repository. The trigger is configured to use a source archive stored in Cloud Storage. After pushing code to main, the build fails with the error shown. What is the most likely cause of this failure?

Exhibit

Refer to the exhibit.

```
symptom: Cloud Build trigger fails with: Build failed: could not resolve source: fetching source: fetching storage object: object not found

trigger configuration:
  event: push to branch main
  repository: my-repo
  included files: 'train/**'
  excluded files: 'test/**'
  source: gs://my-bucket/source.tar.gz
```

Question 9 (medium, multiple choice)

Your team manages a production ML pipeline on Google Cloud that trains a fraud detection model every 6 hours using new transaction data. The pipeline steps are: (1) Cloud Function triggered by new files in Cloud Storage to validate data, (2) Dataflow job for feature engineering, (3) Vertex AI CustomJob for training, (4) Cloud Function to deploy the model to a Vertex AI endpoint after evaluation. You notice that the pipeline sometimes fails during the Dataflow job step with an error: 'Workflow failed. Causes: The job encountered a system error. Please try again later.' The error occurs sporadically, and retrying the pipeline manually usually succeeds. The team needs a reliable automated solution. What should you do?
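The manual "retry and it usually succeeds" behavior described here is the pattern that automated retries with backoff are meant to replace. The sketch below shows the general idea in plain Python; `TransientError`, `run_with_retries`, and `flaky_step` are hypothetical names, and in practice the retry policy would normally be configured in the orchestrator rather than hand-written.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stands in for a sporadic system error that is safe to retry."""


def run_with_retries(step: Callable[[], T],
                     max_attempts: int = 3,
                     base_delay_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `step`, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the failure.
            # Wait 1x, 2x, 4x ... the base delay before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Simulated step that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_step() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("system error, please try again later")
    return "done"


delays: List[float] = []  # Record delays instead of actually sleeping.
result = run_with_retries(flaky_step, max_attempts=4, sleep=delays.append)
print(result, delays)
```

Only errors classified as transient are retried here; a non-transient failure such as bad input data propagates immediately, which keeps retries from masking real bugs.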

Question 10 (medium, drag order)

Drag and drop the steps to set up a BigQuery ML linear regression model for forecasting in the correct order.

Question 11 (medium, drag order)

Drag and drop the steps to set up a batch prediction job using Vertex AI in the correct order.

Question 12 (medium, matching)

Match each Google Cloud AI/ML service to its primary purpose.

Matches

End-to-end ML platform for building, deploying, and managing models

Train high-quality custom ML models with minimal effort

Managed service for distributed training of ML models

Custom ASIC for accelerating ML training workloads

Create and execute ML models using SQL queries

Question 13 (medium, matching)

Match each MLOps practice to its description.

Matches

Continuous integration and deployment for ML pipelines

Track and manage different model iterations

Monitor for changes in data or model performance over time

Schedule or trigger model retraining based on conditions

Compare model versions in production with traffic splitting

Question 14 (easy, multiple choice)

An MLOps team wants to automate the retraining of a model each time new data arrives in a BigQuery table. What is the most efficient Google Cloud service to orchestrate this pipeline?

Question 15 (easy, multiple choice)

A data scientist has trained a model using Vertex AI Training and wants to deploy it to a Vertex AI Endpoint for online predictions. Which orchestration service should be used to automate the deployment step after training completes?

Question 16 (easy, multiple choice)

A company uses Cloud Composer to orchestrate their ML pipelines. They notice that tasks are being queued but not executed, causing delays. What is the most likely cause?

Question 17 (medium, multiple choice)

An ML engineer is using Vertex AI Pipelines and wants to reuse a trained model across multiple pipeline runs without retraining each time. Which artifact management strategy should be used?

Question 18 (medium, multiple choice)

A team wants to implement CI/CD for their ML models using Cloud Build. They have a pipeline that trains a model and deploys it. What is the best practice for triggering the pipeline when a new commit is pushed to the source repository?

Question 19 (medium, multiple choice)

A data-processing pipeline using Dataflow needs to incorporate a custom ML prediction step. The team wants to maintain fast processing and minimize latency. What is the optimal approach?

Question 20 (hard, multiple choice)

A company is using Vertex AI Pipelines with reusable components. They observe that a component that performs hyperparameter tuning is failing intermittently with a 'ResourceExhausted' error. The component is configured with a small custom service account. What is the most likely cause?

Question 21 (hard, multiple choice)

An organization has multiple ML pipelines running on Vertex AI. They want to centralize monitoring and alerting for pipeline failures, including root cause analysis. Which combination of services should they use?

Question 22 (hard, multiple choice)

A team uses Cloud Composer to orchestrate a complex ML pipeline with many tasks. They notice that the DAG parsing time is very high, causing delays in task scheduling. Which action would most effectively reduce DAG parsing time?

Question 23 (easy, multi select)

Which TWO options are best practices for building ML pipelines on Vertex AI?

Question 24 (medium, multi select)

Which THREE actions should be taken to automate a machine learning pipeline using Cloud Build and Vertex AI?

Question 25 (hard, multi select)

Which TWO strategies can help reduce the cost of running ML pipelines on Vertex AI?

Question 26 (medium, multiple choice)

The exhibit shows a Cloud Build configuration. An ML engineer wants to automate the deployment of a model to Vertex AI after training. What is missing in this config to successfully deploy the model?

Question 27 (hard, multiple choice)

The exhibit shows a Cloud Composer environment variable configuration. An ML pipeline DAG fails with an authentication error when trying to access Vertex AI. What is the most likely cause?

Exhibit

environment_variables:
  GOOGLE_APPLICATION_CREDENTIALS: /home/airflow/gcs/data/vertex-ai-key.json
  AIRFLOW_VAR_PROJECT_ID: my-project
  AIRFLOW_VAR_LOCATION: us-central1

Question 28 (easy, multiple choice)

The exhibit shows a Vertex AI PipelineJob submission command. The pipeline fails because the component cannot find the input data. What is the most likely cause?

Exhibit

gcloud ai pipelines submit \
  --project=my-project \
  --region=us-central1 \
  --pipeline-job-name=training-pipeline-20231001 \
  --pipeline-template=template.yaml \
  --parameter='input_data_path=gs://my-bucket/data/input.csv' \
  --parameter='training_epochs=50'

Question 29 (easy, multiple choice)

A data scientist wants to automate the retraining of a model when new data arrives in Cloud Storage. Which Google Cloud service is most appropriate for orchestrating this workflow?

Question 30 (medium, multiple choice)

An ML team is using Vertex AI Pipelines to automate model training and deployment. They want to reuse components across multiple pipelines. What is the best practice for managing component code?

Question 31 (hard, multiple choice)

A company uses Vertex AI Pipelines to train and deploy models. The pipeline has a step that runs a custom container. The step fails intermittently with a timeout error. Which approach should be taken to robustly handle this?

Question 32 (easy, multiple choice)

An ML engineer is designing a CI/CD pipeline for ML models using Cloud Build and Cloud Deploy. They want to automatically test model performance on a validation set before promoting to production. Which step should be included in the CI/CD pipeline?

Question 33 (medium, multiple choice)

A team is using Cloud Composer to orchestrate ML workflows. They have a DAG that triggers a Vertex AI Training job, then a prediction deployment. The deployment step occasionally fails due to quota limits. What is the best way to handle this?

Question 34 (hard, multiple choice)

A company uses Vertex AI Pipelines with Kubeflow DSL for hyperparameter tuning. They notice that some trials fail due to OOM errors. How should they configure the pipeline to automatically handle this?

Question 35 (easy, multiple choice)

An organization wants to implement continuous training for a model that serves predictions via Vertex AI Endpoints. Which approach best automates the retrain-deploy cycle?

Question 36 (medium, multiple choice)

An ML engineer is using Cloud Build to trigger a Vertex AI Pipeline on every commit to a repository. The pipeline takes 2 hours. The engineer wants to only run the pipeline when changes are made to specific directories. How can this be achieved?
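Path-based filtering of this kind can be illustrated with a small, self-contained sketch. It is only a rough approximation of how included and ignored file patterns on a trigger decide whether a push starts a run; the function name `should_trigger` is hypothetical, and Python's `fnmatchcase` lets `*` cross directory separators, so the real matching rules of a build service may differ.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Sequence


def should_trigger(changed_files: Iterable[str],
                   included: Sequence[str] = (),
                   ignored: Sequence[str] = ()) -> bool:
    """Decide whether a push should start the pipeline, based on changed paths."""
    # Drop files that match an ignore pattern.
    remaining = [
        path for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in ignored)
    ]
    if not remaining:
        return False
    # With no include patterns, any non-ignored change counts.
    if not included:
        return True
    # Otherwise at least one remaining file must match an include pattern.
    return any(fnmatchcase(path, pattern)
               for path in remaining for pattern in included)


print(should_trigger(["train/model.py", "README.md"], included=["train/**"]))
print(should_trigger(["docs/guide.md"], included=["train/**"]))
```

With a 2-hour pipeline, filtering on paths such as `train/**` avoids spending a full run on commits that only touch documentation or tests.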

Question 37 (hard, multiple choice)

A company uses Vertex AI Pipelines with prebuilt components for data processing, training, and deployment. They need to integrate a custom validation step written in Python. What is the correct way to include this as a component?

Question 38 (easy, multi select)

Which TWO are benefits of using Vertex AI Pipelines for ML workflow orchestration over deploying custom Airflow DAGs in Cloud Composer? (Choose TWO.)

Question 39 (medium, multi select)

Which THREE are best practices for implementing CI/CD for ML pipelines on Google Cloud? (Choose THREE.)

Question 40 (hard, multi select)

Which THREE should be considered when setting up an automated retraining pipeline using Vertex AI Pipelines and Cloud Composer? (Choose THREE.)

Question 41 (medium, multiple choice)

The exhibit shows part of a Vertex AI Pipeline definition. The pipeline fails at the training step with an error: 'Missing required input: train_data'. What is the most likely cause?

Exhibit

Refer to the exhibit.

```
# vertex_ai_pipeline.yaml
components:
  - name: data_processing
    container:
      image: us-central1-docker.pkg.dev/my-project/my-repo/data_processor:v1
      command: ["python", "process.py"]
  - name: training
    container:
      image: us-central1-docker.pkg.dev/my-project/my-repo/trainer:v2
      command: ["python", "train.py"]
    inputs:
      - name: train_data
        type: Dataset
    outputs:
      - name: model
        type: Model
  - name: evaluation
    container:
      image: us-central1-docker.pkg.dev/my-project/my-repo/evaluator:v1
    inputs:
      - name: model
        type: Model
      - name: test_data
        type: Dataset
    outputs:
      - name: metrics
        type: Metrics
```
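One way to reason about an exhibit like this is to check, component by component, whether each declared input is produced by an earlier step. The sketch below mirrors the exhibit's component names in a plain Python list and flags inputs with no upstream producer. It is a simplification: it matches inputs to outputs by name only, whereas a real pipeline wires them explicitly and may also supply inputs as pipeline-level parameters. The function name `unsatisfied_inputs` is hypothetical.

```python
from typing import Dict, List, Tuple

# Inputs and outputs as declared in the exhibit (data_processing declares none).
components: List[Dict] = [
    {"name": "data_processing", "inputs": [], "outputs": []},
    {"name": "training", "inputs": ["train_data"], "outputs": ["model"]},
    {"name": "evaluation", "inputs": ["model", "test_data"], "outputs": ["metrics"]},
]


def unsatisfied_inputs(steps: List[Dict]) -> List[Tuple[str, str]]:
    """Return (component, input) pairs that no earlier component produces."""
    available = set()
    missing = []
    for step in steps:
        for input_name in step["inputs"]:
            if input_name not in available:
                missing.append((step["name"], input_name))
        # Outputs become available to every later step.
        available.update(step["outputs"])
    return missing


for component, input_name in unsatisfied_inputs(components):
    print(f"{component}: no upstream output named {input_name!r}")
```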

Question 42 (hard, multiple choice)

A large financial company uses a complex ML pipeline to detect fraudulent transactions. The pipeline consists of multiple steps: data ingestion from Pub/Sub, feature engineering using Dataflow, model training with Vertex AI, and deployment to an endpoint. They currently use Cloud Composer to orchestrate the pipeline with separate DAGs for each step. Recently, they have been experiencing failures in the Dataflow job due to schema changes in the incoming transactions, causing the pipeline to stall. The team manually fixes the schema and re-runs the pipeline, which is time-consuming. They want to improve the robustness of the pipeline. The pipeline is run on a schedule but also triggered by the arrival of new data. The team is considering moving to Vertex AI Pipelines to unify the workflow. They also want to automatically detect schema changes and handle them without manual intervention. Which approach should they take?

Question 43 (easy, multiple choice)

A data science team uses Vertex AI Pipelines to build a training pipeline. They notice that when the pipeline fails due to a transient error in a component, the entire pipeline restarts from the beginning, taking a long time. What is the best practice to handle transient errors efficiently?

Question 44 (hard, multiple choice)

A company deploys a training pipeline on Vertex AI using custom containers. The pipeline includes a hyperparameter tuning job that uses Bayesian optimization. After several runs, they observe that the tuning job is not converging and the search space is large. They want to reduce the number of trials while still finding good hyperparameters. Which strategy should they use?

Question 45 (medium, multi select)

A machine learning engineer is designing an ML pipeline on Vertex AI. The pipeline includes multiple steps: data validation, preprocessing, training, evaluation, and deployment. The engineer wants to ensure that if the data validation step fails due to schema mismatch, the pipeline stops immediately and does not proceed. Additionally, they want to reuse the preprocessed data from a previous successful run if the source data hasn't changed. Which two configurations should they use? (Choose two.)
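The "reuse preprocessed data if the source data hasn't changed" requirement is the idea behind step-level caching: a step's output is reused when the step definition and its inputs are identical to a previous run. The sketch below illustrates that idea in plain Python and is not how Vertex AI Pipelines implements execution caching. The names `cache_key` and `run_cached`, the image tag, and the checksum values are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple


def cache_key(component: str, image: str, inputs: Dict[str, Any]) -> str:
    """Fingerprint a step from its name, container image, and inputs."""
    payload = json.dumps(
        {"component": component, "image": image, "inputs": inputs},
        sort_keys=True,  # Stable ordering so equal inputs give equal keys.
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_cached(cache: Dict[str, Any], key: str,
               step: Callable[[], Any]) -> Tuple[Any, bool]:
    """Return (result, cache_hit), running `step` only on a cache miss."""
    if key in cache:
        return cache[key], True
    cache[key] = step()
    return cache[key], False


cache: Dict[str, Any] = {}
calls: List[str] = []


def preprocess() -> str:
    calls.append("ran")
    return "preprocessed-output"


# Same source checksum twice, then a changed checksum (all values hypothetical).
key_a = cache_key("preprocess", "preprocessor:v1", {"source_checksum": "abc123"})
key_b = cache_key("preprocess", "preprocessor:v1", {"source_checksum": "def456"})
_, hit1 = run_cached(cache, key_a, preprocess)
_, hit2 = run_cached(cache, key_a, preprocess)
_, hit3 = run_cached(cache, key_b, preprocess)
print(hit1, hit2, hit3, len(calls))
```

Because the key includes the inputs, a change in the source data produces a new key and forces the step to run again, while an unchanged source reuses the earlier output.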

Question 46 (easy, multiple choice)

You are responsible for maintaining an ML pipeline that runs daily on Vertex AI Pipelines. The pipeline preprocesses data, trains a model, and deploys it to an endpoint. Recently, the pipeline has been failing at the deployment step because the endpoint already exists and the deploy step tries to create a new endpoint instead of updating the existing one. The pipeline code is written using the Kubeflow Pipelines SDK. You need to modify the pipeline to resolve this issue with minimal changes. What should you do?

Question 47 (medium, multiple choice)

Your team is developing a machine learning model for real-time fraud detection. The training pipeline runs on Vertex AI and uses BigQuery for feature engineering. Recently, the pipeline has been taking significantly longer to execute. Upon investigation, you find that the BigQuery query for feature extraction is being rerun every time the pipeline runs, even though the underlying data hasn't changed. The pipeline is scheduled to run every hour. You want to reduce cost and execution time without losing the ability to detect data drift. Which approach should you take?

Question 48 (hard, multiple choice)

A large e-commerce company uses Vertex AI Pipelines to orchestrate its recommendation model training. The pipeline has several parallel components: feature engineering, model training, and model evaluation. Recently, they noticed that the pipeline often fails due to resource exhaustion in the Vertex AI custom training job for the model training component. The training job consumes significant memory and occasionally exceeds the allocated memory limit, causing the pod to be OOMKilled. The team has already increased the memory to the maximum allowed for the chosen machine type. They need to prevent the pipeline from failing while still using the same machine type. Which approach should they take?

Question 49 (medium, multi select)

A company uses Cloud Scheduler to trigger Cloud Functions that submit Vertex AI training jobs. They want to ensure fault tolerance and minimize manual intervention. Which TWO practices should they implement?

Question 50 (hard, multiple choice)

Refer to the exhibit. An ML engineer runs this Vertex AI pipeline. After execution, the "train" task fails with a resource exhaustion error. The task consumes more memory than allocated. Which step should the engineer take to fix this issue without increasing the overall quota cost?

Exhibit

{
  "pipelineJob": {
    "pipelineSpec": {
      "pipelineInfo": {"name": "training-pipeline"},
      "root": {
        "dag": {
          "tasks": {
            "preprocess": {
              "taskInfo": {"name": "preprocess"},
              "componentRef": {"name": "data-processing"},
              "inputs": {"data": {"artifacts": [{"name": "raw_data", "type": "Dataset"}]}}
            },
            "train": {
              "taskInfo": {"name": "train"},
              "componentRef": {"name": "trainer"},
              "inputs": {"dataset": {"taskOutputArtifact": {"taskName": "preprocess", "outputKey": "processed_data"}}},
              "dependentTasks": ["preprocess"],
              "executorLabel": "train-exec"
            }
          }
        }
      }
    },
    "runtimeConfig": {
      "gcsOutputDirectory": "gs://my-bucket/pipeline-output",
      "parameterValues": {
        "learning_rate": 0.01,
        "epochs": 10
      },
      "inputArtifacts": {
        "raw_data": {
          "gcsSourceArtifact": {
            "artifacts": [{"uri": "gs://my-bucket/data/raw.csv"}]
          }
        }
      }
    }
  }
}

Question 51 (easy, multiple choice)

A pharmaceutical company uses Vertex AI Pipelines with custom training containers. Recently, the pipeline has been failing with 'Container failed with exit code 137' (out of memory). The container runs with the default memory limit. The team needs to fix this without changing the code. The project quota for CPU and memory is sufficient. What should the team do?
