Google PMLE ML Pipelines Questions

75 of 89 questions · Page 1/2 · PMLE ML Pipelines topic · Answers revealed

1

Multi-Select · hard

You are tasked with building a robust ML pipeline that must be idempotent and handle data skew between training and serving. Which three practices should you implement?

Select 3 answers

A.Store intermediate data in Cloud Storage with unique run IDs.

B.Pass large datasets between components as serialized in-memory objects.

C.Monitor feature distributions in training data vs. serving data to detect skew.

D.Use the same random seed for every run to ensure reproducibility.

E.Ensure each component produces deterministic outputs given the same inputs.

Answers: A, C, E

Unique paths prevent collisions and support idempotency.

Why this answer

Idempotent components ensure the same inputs produce the same outputs. Passing data via GCS URIs is a best practice. Skew detection should compare training data distribution with serving data.

Using unique run IDs for outputs ensures idempotency. Avoiding in-memory data passing is important for large datasets.
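The skew check in option C can be sketched in plain Python. This is a minimal illustration, assuming a single categorical feature and using the L-infinity distance (the largest absolute difference in category frequency) as the skew score. The function names, the sample values, and the 0.3 threshold are hypothetical and are not taken from any Google Cloud API.

```python
from collections import Counter


def category_frequencies(values):
    """Return the relative frequency of each category in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(training_values, serving_values):
    """Largest absolute gap between training and serving category frequencies."""
    train = category_frequencies(training_values)
    serve = category_frequencies(serving_values)
    categories = set(train) | set(serve)
    return max(abs(train.get(c, 0.0) - serve.get(c, 0.0)) for c in categories)


# Hypothetical feature values and threshold, for illustration only.
training = ["web"] * 8 + ["mobile"] * 2
serving = ["web"] * 4 + ["mobile"] * 6
SKEW_THRESHOLD = 0.3

skew = l_infinity_distance(training, serving)
skew_detected = skew > SKEW_THRESHOLD
```

A managed service would compute such statistics per feature and raise an alert when the score crosses a configured threshold; the sketch only shows the comparison itself.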

2

Multi-Select · medium

A team's pipeline uses Google Cloud Pipeline Components (GCPC) to perform AutoML tabular training and batch prediction. Which two components from the GCPC library should the team use? (Choose two.)

Select 2 answers

A.CustomJobRunOp

B.DataflowPythonOp

C.AutoMLTabularTrainingJobRunOp

D.BatchPredictOp

E.EndpointPredictOp

Answers: C, D

This component runs an AutoML training job for tabular data.

Why this answer

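AutoMLTabularTrainingJobRunOp runs AutoML training on tabular data, and BatchPredictOp runs batch prediction (in the current GCPC library the batch prediction component is named ModelBatchPredictOp). The other options cover custom training, Dataflow processing, or online prediction.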

3

Multi-Select · easy

A data scientist is creating a Vertex AI pipeline using the Kubeflow Pipelines SDK v2. Which TWO statements about pipeline parameters are correct? (Choose two.)

Select 2 answers

A.Pipeline parameters are defined as inputs to the pipeline function decorated with @dsl.pipeline.

B.Pipeline parameters must be serialized to JSON before use.

C.Pipeline parameters can only be of type str.

D.Pipeline parameters can be overridden at pipeline run time.

E.Pipeline parameters can be used to pass large datasets between components.

Answers: A, D

Correct: Parameters are function arguments of the pipeline function.

Why this answer

Option A is correct because in the Kubeflow Pipelines SDK v2, pipeline parameters are defined as input arguments to the function decorated with @dsl.pipeline. They are the primary mechanism for passing configuration values (e.g., model name, learning rate, number of epochs) into the pipeline, and downstream components can consume them. Option D is correct because the values supplied when a run is submitted override any defaults declared in the function signature.

Exam trap

The trap is that candidates often assume pipeline parameters must be JSON-serialized or limited to strings due to older Kubeflow v1 conventions, but Vertex AI's Kubeflow Pipelines SDK v2 natively supports multiple Python types and automatic serialization.

4

Multi-Select · medium

Your organization wants to automate the retraining of a model when new data is available and also on a weekly schedule. Which TWO services would you use together to achieve this? (Choose two.)

Select 2 answers

A.Cloud Functions

B.Cloud Composer

C.Cloud Tasks

D.Dataflow

E.Cloud Scheduler

Answers: A, E

Cloud Functions provides the event-driven trigger when new data arrives.

Why this answer

Cloud Scheduler (E) is used to trigger the retraining on a weekly schedule by sending a message to a Pub/Sub topic or making an HTTP request. Cloud Functions (A) is the serverless compute service that executes the retraining code in response to that trigger, and it can also be triggered directly when new data arrives (e.g., via Cloud Storage or Pub/Sub). Together, they provide both event-driven and scheduled automation without managing infrastructure.

Exam trap

Google often tests the distinction between orchestration (Cloud Composer) and simple scheduling/event-driven triggers (Cloud Scheduler + Cloud Functions), leading candidates to over-engineer the solution by choosing Cloud Composer when a lightweight combination suffices.

5

MCQ · easy

A company wants to automatically retrain their model every night at 2 AM using Vertex AI Pipelines. Which approach should they use to trigger the pipeline on a schedule?

A.Use Cloud Scheduler to call the Vertex AI pipeline creation API

B.Deploy the pipeline as a Cloud Run job with a cron trigger

C.Use Vertex AI Experiments to schedule runs

D.Configure a cron job inside the pipeline definition

Answer: A

Cloud Scheduler can call the Vertex AI API at the specified time, either directly over HTTP or through a Cloud Function, to create a pipeline run.

Why this answer

Cloud Scheduler is the correct approach because it can directly invoke the Vertex AI Pipeline creation API via an HTTP trigger at a specified cron schedule (e.g., 2 AM daily). This integrates natively with Vertex AI's pipeline orchestration, allowing the scheduler to submit a pipeline run without additional infrastructure. The other options either lack native Vertex AI pipeline support or introduce unnecessary complexity.
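The request that a scheduler job would send can be sketched as follows. This is a hedged illustration, not a complete setup: the project, region, and template URI are hypothetical, authentication is omitted, and the body field names follow my reading of the Vertex AI `PipelineJob` REST resource, so they should be checked against the current API reference before use.

```python
import json

# Hypothetical values used only for illustration.
PROJECT = "my-project"
REGION = "us-central1"
CRON_2AM_DAILY = "0 2 * * *"  # minute hour day-of-month month day-of-week


def pipeline_jobs_url(project: str, region: str) -> str:
    """URL of the Vertex AI REST collection that accepts new pipeline jobs."""
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/pipelineJobs"
    )


def pipeline_job_body(template_uri: str, pipeline_root: str, parameters: dict) -> str:
    """JSON body that a scheduler job could POST to start one pipeline run."""
    body = {
        "displayName": "nightly-retraining",
        "templateUri": template_uri,
        "runtimeConfig": {
            "gcsOutputDirectory": pipeline_root,
            "parameterValues": parameters,
        },
    }
    return json.dumps(body)


url = pipeline_jobs_url(PROJECT, REGION)
payload = pipeline_job_body(
    template_uri="https://us-central1-kfp.pkg.dev/my-project/my-repo/my-pipeline/latest",
    pipeline_root="gs://my-bucket/pipeline-root",
    parameters={"epochs": 10},
)
```

The cron expression `0 2 * * *` fires at 02:00 every day in the time zone configured on the scheduler job.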

Exam trap

A common mistake is to look for scheduling inside the pipeline definition. A KFP pipeline definition describes the steps of a single run and has no cron construct, so the schedule must come from outside the pipeline (for example, Cloud Scheduler calling the Vertex AI API). Neither Vertex AI Experiments nor Cloud Run jobs are designed for scheduled pipeline orchestration.

How to eliminate wrong answers

Option B is wrong because a Cloud Run job runs a container to completion; it is not a way to deploy a Vertex AI pipeline, and using it as a trigger would mean writing and maintaining custom code that calls the Vertex AI API. Option C is wrong because Vertex AI Experiments is used for tracking and comparing model training runs, not for scheduling or triggering pipeline executions. Option D is wrong because a pipeline definition has no cron setting; it describes the tasks of one run and cannot trigger itself on a recurring schedule.

6

Multi-Select · hard

A company is using Vertex AI Pipelines for ML workflows. They want to implement best practices for idempotent components and data passing. Which THREE practices should they adopt?

Select 3 answers

A.Pass large datasets between components using GCS URIs instead of in-memory values.

B.Avoid hard-coding file paths; use pipeline parameters to pass URIs.

C.Read data into memory in the first component and pass the in-memory object to subsequent components.

D.Use global variables in the pipeline code to store intermediate results.

E.Design components to be idempotent so that the same input always produces the same output.

Answers: A, B, E

GCS URIs allow for scalable, cacheable data passing.

Why this answer

Option A is correct because Vertex AI Pipelines components run in isolated containers, so an in-memory object in one component is not visible to the next, and values passed as parameters are serialized and intended for small data. Writing data to Cloud Storage and passing the GCS URI lets each component read and write the data directly, which is the recommended pattern for large artifacts in Kubeflow Pipelines (the SDK used to define Vertex AI pipelines). This approach also supports caching and parallel execution, since a component depends only on the URI, not on the state of a previous container.

Exam trap

Google often tests the misconception that in-memory data passing is acceptable in containerized pipelines, but the correct pattern is to use persistent storage (GCS) and artifact URIs to ensure idempotency and scalability.

7

MCQ · medium

An ML engineer is building a pipeline that includes a step to run a BigQuery query and pass the results to the next step. They want to use a pre-built Google Cloud Pipeline Component for BigQuery. Which component should they use to execute a query and output the results to a destination table?

A.BigQueryExecuteQuery

B.BigQueryExportData

C.BigQueryQueryJobOp

D.BigqueryRunQuery

Answer: C

This component runs a SQL query as a BigQuery job and writes the results to a destination table.

Why this answer

The pre-built component for running a query is the query job component in the `google-cloud-pipeline-components` package. The library spells it `BigqueryQueryJobOp` (lowercase "q" in "Bigquery"), which corresponds to option C. It submits the SQL as a BigQuery query job, can write the results to a destination table, and exposes that table as an output artifact that downstream pipeline steps can consume.

Exam trap

The trap here is that the distractors are plausible-sounding names such as `BigQueryExecuteQuery` or `BigqueryRunQuery`. GCPC BigQuery components follow a `Bigquery...JobOp` naming pattern, so look for the name that ends in `JobOp` rather than the one that reads most like a description of the task.

How to eliminate wrong answers

Option A (BigQueryExecuteQuery) is wrong because it is not a component name in the Google Cloud Pipeline Components library. Option B (BigQueryExportData) is wrong because exporting moves data out of an existing BigQuery table to external storage (e.g., Cloud Storage) rather than executing a query and writing results to a destination table. Option D (BigqueryRunQuery) is wrong because it is also not a valid component name in the library.

8

MCQ · medium

A company wants to implement continuous training for their ML model. The pipeline should be triggered when new training data arrives in Cloud Storage, and after training, the model should be automatically deployed to a staging endpoint if evaluation metrics pass a threshold. They also need to detect skew between training data and serving data. Which two services should they use for skew detection?

A.Cloud Monitoring and Cloud Logging

B.Cloud DLP and Cloud KMS

C.Vertex AI Model Monitoring and Vertex AI Pipelines

D.BigQuery and Dataflow

Answer: C

Correct: Vertex AI Model Monitoring detects skew, and pipelines orchestrate the process.

Why this answer

Vertex AI Model Monitoring provides built-in skew detection by comparing training data statistics with serving data statistics, alerting when distribution shifts exceed thresholds. Vertex AI Pipelines orchestrates the continuous training workflow, including triggering on new data arrival, model evaluation, and conditional deployment to a staging endpoint, making C the correct pair for both skew detection and pipeline automation.

Exam trap

Google often tests the distinction between general-purpose monitoring/logging services (Cloud Monitoring/Logging) and ML-specific monitoring (Vertex AI Model Monitoring), leading candidates to pick A because they think 'monitoring' means the same thing.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring and Cloud Logging are used for infrastructure and application monitoring (metrics, logs, alerts), not for statistical skew detection between training and serving data distributions. Option B is wrong because Cloud DLP (Data Loss Prevention) and Cloud KMS (Key Management Service) handle data security, masking, and encryption, not model monitoring or data skew analysis. Option D is wrong because BigQuery and Dataflow are data processing and analytics services; while they can compute statistics, they lack the specialized skew detection, alerting, and integration with Vertex AI endpoints that Vertex AI Model Monitoring provides natively.

9

Multi-Select · medium

A data science team is designing a Vertex AI pipeline that includes a loop over a list of hyperparameter sets. They want to run training jobs in parallel for each hyperparameter set and then collect the results for comparison. Which two Kubeflow Pipelines SDK v2 features should they use? (Choose two.)

Select 2 answers

A.dsl.ParallelFor

B.dsl.Collected

C.dsl.If

D.dsl.ExitHandler

E.dsl.importer

Answers: A, B

dsl.ParallelFor iterates over items and runs tasks in parallel.

Why this answer

Option A is correct because `dsl.ParallelFor` is the Kubeflow Pipelines SDK v2 feature that iterates over a list of hyperparameter sets and executes the training tasks in parallel, which directly supports running one training job per configuration concurrently. Option B is correct because `dsl.Collected` gathers the outputs of the tasks inside the loop into a single list that a downstream task can consume for comparison.

Exam trap

The trap here is that candidates may confuse `dsl.ParallelFor` with `dsl.If` for conditional logic, or mistakenly think `dsl.importer` can handle result collection, when in fact only `dsl.ParallelFor` and `dsl.Collected` together provide the parallel iteration and result aggregation needed for this use case.

10

MCQ · easy

A machine learning engineer is using Vertex AI Pipelines and wants to run a custom Python function as a component. They need to pass a dataset artifact from a previous component and output a model artifact. Which decorator should they use to define the component in the Kubeflow Pipelines SDK v2?

A.@dsl.task

B.@dsl.pipeline

C.@dsl.component

D.@dsl.container

Answer: C

Correct decorator for defining a Python function component with typed inputs/outputs.

Why this answer

The correct decorator is @dsl.component because in Kubeflow Pipelines SDK v2, this decorator is used to define a custom Python function as a reusable pipeline component. It automatically handles input and output artifact serialization, such as passing a dataset artifact from a previous component and outputting a model artifact, by leveraging the component's type annotations and the KFP artifact system.

Exam trap

Google often tests the distinction between @dsl.component (for custom Python functions with artifact I/O) and container-based components (for pre-built container images; the KFP SDK v2 decorator for these is @dsl.container_component), leading candidates to confuse the two when the question emphasizes running a custom Python function.

How to eliminate wrong answers

Option A is wrong because @dsl.task is not a decorator in the Kubeflow Pipelines SDK v2; tasks are created by calling a component inside a pipeline function. Option B is wrong because @dsl.pipeline is used to define the entire pipeline graph, not an individual component function. Option D is wrong because @dsl.container is not the decorator for a Python function component; container-based components, which run a container image directly, are defined with @dsl.container_component.

11

Multi-Select · hard

A company wants to implement a CI/CD pipeline for their ML models using Vertex AI. They need to automatically retrain the model when new data arrives, but only if the model performance on a validation set has degraded by more than 5% compared to the current production model. Which three services or components should they incorporate into the automated pipeline? (Choose three.)

Select 3 answers

A.Dataflow pipeline to clean the new data before training

B.Vertex AI Evaluation component to compute model performance metrics on the validation set

C.Cloud Functions to trigger the pipeline when new data arrives in Cloud Storage

D.Vertex AI Model Registry alias update to promote the model if performance passes the threshold

E.Cloud Scheduler to run the pipeline on a fixed schedule

Answers: B, C, D

Evaluation is needed to compare against the production model.

Why this answer

Option B is correct because Vertex AI Evaluation component can be used within a pipeline to compute model performance metrics (e.g., accuracy, precision, recall) on a validation set. This allows the pipeline to compare the newly trained model's performance against the current production model's performance, enabling the conditional logic to check if degradation exceeds 5%.
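The 5% check itself is simple arithmetic. The sketch below assumes a higher-is-better metric and reads "degraded by more than 5%" as a relative drop from the production baseline; both are assumptions, since the question does not say whether the 5% is relative or absolute. The function name is hypothetical.

```python
def should_retrain(
    baseline_score: float,
    current_score: float,
    max_relative_drop: float = 0.05,
) -> bool:
    """Return True when the score fell by more than the allowed fraction."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    relative_drop = (baseline_score - current_score) / baseline_score
    return relative_drop > max_relative_drop


# Hypothetical accuracy values: 0.90 -> 0.84 is a drop of about 6.7%.
retrain_needed = should_retrain(baseline_score=0.90, current_score=0.84)
# 0.90 -> 0.88 is a drop of about 2.2%, which stays under the threshold.
retrain_skipped = should_retrain(baseline_score=0.90, current_score=0.88)
```

In a pipeline, the boolean (or the two scores) would come from the evaluation step and feed a conditional block that guards the retraining and promotion steps.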

Exam trap

In Google Cloud, the distinction between event-driven triggers (Cloud Functions/Eventarc) and scheduled triggers (Cloud Scheduler) is commonly tested. Candidates often mistakenly choose Cloud Scheduler when the requirement is for an event-driven retraining pipeline triggered by new data arrival.

12

MCQ · medium

An ML engineer is building a pipeline component that takes a dataset URI and a model URI as inputs, and outputs a classification metrics artifact. Which KFP SDK v2 type should the output artifact be annotated with?

A.Dataset

B.Metrics

C.ClassificationMetrics

D.Model

Answer: C

This is the correct artifact type for classification evaluation metrics.

Why this answer

In KFP SDK v2, the `ClassificationMetrics` type is specifically designed to output classification metrics such as confusion matrix, ROC curve, and AUC. The question asks for a component that outputs classification metrics, so `ClassificationMetrics` is the correct artifact type. Using `Metrics` would be too generic and not provide the structured schema needed for classification-specific visualizations in the KFP UI.

Exam trap

The trap here is that candidates confuse the generic `Metrics` type (which handles scalar values) with the specialized `ClassificationMetrics` type, not realizing that KFP SDK v2 requires the specific artifact type to enable proper UI rendering and schema validation for classification outputs.

How to eliminate wrong answers

Option A is wrong because `Dataset` is used for input or output of tabular data, not for metrics artifacts. Option B is wrong because `Metrics` is a generic artifact for scalar metrics (e.g., accuracy, loss) but lacks the structured fields (e.g., confusion matrix, ROC) required for classification metrics; it would not render classification-specific visualizations in the KFP UI. Option D is wrong because `Model` is used for serialized model artifacts, not for evaluation metrics.

13

MCQ · medium

An organization runs a Vertex AI pipeline that includes a model evaluation step. Team members want to reuse previously computed evaluation metrics when re-running the pipeline with unchanged code and hyperparameters. Which feature should they enable?

A.Manually store outputs in Cloud Storage and check for existence

B.Enable pipeline caching (default behavior)

C.Use the importer component to fetch previous results

D.Disable caching for the evaluation component

Answer: B

Caching is enabled by default; unchanged components automatically reuse cached outputs.

Why this answer

Vertex AI Pipelines automatically caches component outputs based on a cache key derived from the component image, code, and input parameters. If the cache key matches a previous run, the cached output is reused, saving time and cost.
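The idea of a cache key can be shown with a small plain-Python sketch. It is only an illustration of the concept and is not the key format Vertex AI actually uses; the function names, the in-memory dictionary, and the sample inputs are hypothetical.

```python
import hashlib
import json


def cache_key(image: str, command: list, inputs: dict) -> str:
    """Hash everything that determines a step's output into one stable key."""
    payload = json.dumps(
        {"image": image, "command": command, "inputs": inputs},
        sort_keys=True,  # stable ordering so equal inputs give equal keys
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


_cache = {}


def run_step(image: str, command: list, inputs: dict, compute):
    """Run `compute` unless an identical step already ran; return (result, hit)."""
    key = cache_key(image, command, inputs)
    if key in _cache:
        return _cache[key], True
    result = compute(inputs)
    _cache[key] = result
    return result, False
```

Calling `run_step` twice with the same image, command, and inputs returns the stored result the second time, while changing any of the three produces a new key and a fresh execution.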

14

MCQ · hard

A machine learning pipeline includes a conditional branch: if model accuracy exceeds 0.95, deploy to production; otherwise, send a notification. Which KFP SDK feature allows implementing this logic within the pipeline definition?

A.dsl.ParallelFor

B.Using a Python if statement inside the component

C.dsl.If

D.dsl.Condition

Answer: C

dsl.If creates a conditional branch in the pipeline based on a condition.

Why this answer

C is correct because KFP SDK's `dsl.If` is the native construct for implementing conditional branching within a pipeline definition, allowing you to conditionally execute a task (like deploying to production) based on the output of a previous component (e.g., model accuracy > 0.95). Tasks placed inside the `with dsl.If(...)` block are part of the compiled DAG but run only when the condition evaluates to true at run time.

Exam trap

Google often tests `dsl.If` against the older `dsl.Condition`, which newer KFP SDK v2 releases deprecate in favor of `dsl.If`, and against ordinary Python control flow, to catch candidates who confuse pipeline-level branching with generic programming constructs or other frameworks like Apache Airflow.

How to eliminate wrong answers

Option A is wrong because `dsl.ParallelFor` is used for iterating over a collection of items to execute tasks in parallel, not for conditional branching. Option B is wrong because a Python `if` statement inside a component only changes what that one component does at run time; it cannot decide which downstream tasks of the pipeline run. (A plain Python `if` in the pipeline function itself is evaluated once at compile time, before any metric exists.) Option D is wrong because `dsl.Condition` is the older construct that newer KFP SDK v2 releases deprecate in favor of `dsl.If`, so `dsl.If` is the preferred answer.

15

MCQ · hard

A team has a pipeline that trains a model and then evaluates it. They want to conditionally deploy the model to a staging endpoint only if evaluation metrics exceed a threshold. Which KFP feature should they use?

A.Use dsl.Condition (deprecated) or dsl.If to check metrics and conditionally run deployment.

B.Use dsl.ParallelFor to evaluate and deploy in parallel.

C.Use an exit handler to deploy regardless of metrics.

D.Split the pipeline into two separate pipelines and run the second only if metrics are good.

Answer: A

dsl.If is the correct way to add conditional logic in KFP v2.

Why this answer

Option A is correct because KFP provides `dsl.Condition` (deprecated) and `dsl.If` as first-class pipeline constructs to conditionally execute pipeline components based on runtime metrics or other pipeline outputs. By wrapping the deployment step inside a `dsl.If` block that checks whether evaluation metrics exceed a threshold, the pipeline can deploy the model to a staging endpoint only when the condition is met, avoiding unnecessary deployments for underperforming models.

Exam trap

Google often tests the distinction between conditional execution (`dsl.If`) and unconditional execution patterns (exit handlers, parallel loops), tempting candidates to choose a pattern that always runs the deployment step or runs it in parallel without any gate.

How to eliminate wrong answers

Option B is wrong because `dsl.ParallelFor` is designed for iterating over a collection of items to execute the same component in parallel, not for conditionally executing a component based on a runtime evaluation result. Option C is wrong because an exit handler (e.g., `dsl.ExitHandler`) always runs a specified component when the pipeline exits, regardless of success or failure, so it would deploy the model even if metrics are poor, which contradicts the requirement. Option D is wrong because splitting the pipeline into two separate pipelines loses the benefit of a single orchestrated workflow; it introduces manual coordination, external state management, and additional operational complexity, whereas KFP’s conditional constructs handle this natively within one pipeline.

16

MCQ · medium

An ML engineer is using Cloud Composer (Airflow) to orchestrate an ML workflow. They need to run a Vertex AI pipeline as one of the tasks in the DAG. Which Airflow operator should they use?

A.BigQueryOperator to run the pipeline as a query.

B.VertexAIPipelineRunOperator or the Google Cloud Pipeline operator.

C.PythonOperator with a custom script using the google-cloud-aiplatform library.

D.VertexAIPipelineRunOperator (or Airflow's GCSToGCSOperator) for pipeline orchestration.

Answer: B

These operators are designed to run Vertex AI pipelines from Airflow.

Why this answer

Option B is correct because the Airflow Google provider package used by Cloud Composer includes a purpose-built operator for submitting a Vertex AI pipeline run as a task within a DAG. The option calls it `VertexAIPipelineRunOperator`; in the provider package the corresponding operator is `RunPipelineJobOperator`. Such an operator handles authentication and pipeline job submission without requiring custom code, making it the idiomatic choice for orchestrating Vertex AI pipelines from Airflow.

Exam trap

The trap here is that candidates may confuse a general-purpose operator (like PythonOperator or GCSToGCSOperator) with a purpose-built operator, or incorrectly assume that BigQueryOperator can be repurposed for pipeline execution, when the exam expects knowledge of the specific Airflow operator designed for Vertex AI pipeline orchestration.

How to eliminate wrong answers

Option A is wrong because BigQueryOperator is used to execute BigQuery SQL queries or jobs, not to run Vertex AI pipelines; it has no capability to submit or manage a Vertex AI pipeline run. Option C is wrong because while a PythonOperator with a custom script using the google-cloud-aiplatform library could technically work, it is not the recommended or native Airflow operator—it requires manual handling of authentication, polling, and error handling, and it bypasses Airflow's built-in integration and retry mechanisms. Option D is wrong because GCSToGCSOperator is a data transfer operator for copying files between GCS buckets, not for pipeline orchestration; the mention of 'VertexAIPipelineRunOperator' is correct, but pairing it with GCSToGCSOperator as an alternative for pipeline orchestration is incorrect and misleading.

17

Multi-Select · hard

A team wants to implement CI/CD for their ML pipeline using Cloud Build. They want to automatically compile and deploy the pipeline when code is pushed to the main branch. Which three steps should they include in the Cloud Build configuration? (Choose three.)

Select 3 answers

A.Create or update the pipeline in Vertex AI using the compiled file

B.Upload the compiled pipeline to Cloud Storage

C.Run the pipeline immediately after deployment

D.Install KFP SDK and compile the pipeline

E.Configure Cloud Scheduler to trigger on push

Answers: A, B, D

Use gcloud or Python client to register the pipeline in Vertex AI.

Why this answer

Option A is correct because the Cloud Build configuration should include a step to create or update the pipeline in Vertex AI using the compiled file. This step registers the pipeline definition with Vertex AI Pipelines, making it available for execution and versioning. Without this, the compiled pipeline would not be accessible for deployment or scheduling within the Vertex AI environment.

Exam trap

A common trap is confusing build-time actions (compilation, upload, registration) with runtime actions (execution, scheduling). Candidates often mistakenly include immediate pipeline execution as a CI/CD step instead of focusing on deploying and registering the pipeline artifact.

18

Multi-Select · hard

A team is designing an ML pipeline that includes training, evaluation, and conditional deployment. They want to use Vertex AI Pipelines. Which THREE concepts should they use? (Choose three.)

Select 3 answers

A.Artifact types (e.g., Model, Metrics) for passing outputs

B.Manual approval via Cloud Console

C.Cloud SQL for storing intermediate results

D.Pre-built Google Cloud Pipeline Components for training and evaluation

E.dsl.If for conditional execution

Answers: A, D, E

Artifacts enable proper tracking and lineage.

Why this answer

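Use dsl.If for the conditional deployment step, artifact types such as Model and Metrics to pass outputs between steps, and pre-built Google Cloud Pipeline Components for training and evaluation. Manual approval in the Cloud Console and Cloud SQL for intermediate results are not Vertex AI Pipelines concepts.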

19

Multi-Select · medium

A machine learning team uses Vertex AI Pipelines to run a multi-step training pipeline. They want to implement a continuous delivery (CD) process where a model is automatically promoted from staging to production only if it passes an evaluation gate. Which TWO actions should they include in their CI/CD pipeline? (Choose two.)

Select 2 answers

A.Use Cloud Functions to periodically check for new model versions and deploy.

B.Store model versions in Vertex AI Model Registry with version aliases (e.g., 'staging', 'production').

C.Manually approve each model version before deployment.

D.Use Cloud Build to trigger the pipeline on new model code commits.

E.Deploy every model version directly to production without evaluation.

Answers: B, D

Correct: Model Registry with aliases enables controlled promotion across environments.

Why this answer

Option B is correct because Vertex AI Model Registry supports version aliases like 'staging' and 'production', which allow the CI/CD pipeline to automatically promote a model from staging to production only after it passes the evaluation gate. This enables a controlled, automated CD process without manual intervention.

Exam trap

In Google Cloud, the distinction between automated promotion using model registry aliases (e.g., 'staging', 'production') versus manual approval or polling-based triggers is key. Vertex AI Model Registry aliases enable seamless, event-driven CD without external polling or human intervention.

20

MCQ · medium

An ML team wants to run a hyperparameter tuning job on Vertex AI using a pre-built pipeline component. Which component should they use?

A.AutoMLTabularTrainingJobRunOp

B.CustomTrainingJobRunOp with hyperparameter arguments.

C.ModelTrainComponent

D.HyperparameterTuningJobRunOp

Answer: D

This is the correct pre-built component for tuning.

Why this answer

The HyperparameterTuningJobRunOp is the correct pre-built Vertex AI pipeline component specifically designed to launch a hyperparameter tuning job. It wraps the Vertex AI HyperparameterTuningJob API, allowing you to specify the worker pool spec, metric target, and parameter specifications directly within a Kubeflow Pipelines (KFP) or Vertex AI Pipelines orchestration context.

Exam trap

A common mistake on the Google PMLE exam is confusing the pre-built HyperparameterTuningJobRunOp with CustomTrainingJobRunOp that accepts hyperparameter arguments, but the latter requires manual tuning logic rather than leveraging the built-in hyperparameter tuning service.

How to eliminate wrong answers

Option A is wrong because AutoMLTabularTrainingJobRunOp is used to launch an AutoML training job for tabular data, which does not support custom hyperparameter tuning; it uses AutoML's own search. Option B is wrong because CustomTrainingJobRunOp with hyperparameter arguments is not a pre-built component for tuning; it launches a single custom training job and would require manual orchestration to implement a tuning loop, whereas the question asks for a pre-built component. Option C is wrong because ModelTrainComponent is not a standard pre-built Vertex AI pipeline component; it is a generic name that does not correspond to any official Vertex AI component, and using it would require custom implementation.

21

Multi-Select · easy

An organization wants to implement continuous delivery for their ML model. After a new model is trained and evaluated, they want to automatically deploy it to a staging endpoint, run validation tests, and if passed, promote to production. Which two components should they include in their delivery pipeline? (Choose two.)

Select 2 answers

A.Conditional component that checks evaluation metrics and promotes if successful

B.Dataflow job to preprocess data

C.ModelDeploymentOp to deploy to staging

D.Importer component to bring in the model

E.A/B testing component to split traffic

Answers: A, C

A conditional gate based on metrics decides whether to promote to production.

Why this answer

Option A is correct because a conditional component that checks evaluation metrics (e.g., accuracy, precision, recall) against a predefined threshold is essential for automated promotion. This component acts as a gate, ensuring only models that meet quality criteria are promoted to production, which is a core requirement for continuous delivery in ML pipelines.

Exam trap

In Google Cloud ML pipelines, the key distinction is between deployment components and data processing components (like Dataflow jobs). The option names the deployment step ModelDeploymentOp; in the GCPC library the endpoint deployment component is ModelDeployOp. Candidates often mistakenly include preprocessing steps in the delivery pipeline instead of focusing on the deployment and validation logic.

22

Multi-Select · medium

An organization is building a continuous training (CT) pipeline that retrains a model whenever one of the following conditions is met: (1) new training data is available in Cloud Storage, (2) it's the first day of the month, or (3) the model's performance degrades below a threshold. Which TWO mechanisms should they combine to trigger the pipeline?

Select 2 answers

A.Cloud Scheduler to trigger the pipeline on a cron schedule (e.g., first day of month)

B.Monitoring alerts from Cloud Monitoring to trigger a Cloud Function that starts the pipeline

C.Cloud Build trigger on code push to repository

D.BigQuery scheduled query to trigger the pipeline after data transformation

E.Cloud Storage Pub/Sub notifications to trigger a Cloud Function that starts the pipeline

AnswersA, E

Correct. Cloud Scheduler can trigger the pipeline on a cron schedule, satisfying condition (2) (first day of month).

Why this answer

Option A is correct because Cloud Scheduler can trigger the pipeline on a cron schedule, such as the first day of every month, directly satisfying condition (2). Option E is correct because Cloud Storage Pub/Sub notifications can be configured to fire a Cloud Function whenever new training data is uploaded, satisfying condition (1). Condition (3), performance degradation below a threshold, is not directly covered by any of the provided options; however, it could be handled separately via a custom solution like Vertex AI Model Monitoring that triggers the pipeline.

Among the given choices, A and E are the best combination for the conditions that have direct native support.

Exam trap

Candidates often confuse Cloud Monitoring (infrastructure metrics) with model performance monitoring. Cloud Monitoring does not natively evaluate model metrics like accuracy or F1 score, so using it to trigger retraining on performance degradation is incorrect.

Practice this question →

23

MCQeasy

A machine learning engineer is building a Vertex AI pipeline that uses a pre-built Google Cloud Pipeline Components (GCPC) component to train a custom model. Which component should the engineer use to submit a custom training job to Vertex AI?

A.HyperparameterTuningJob

B.CustomJob

C.BatchPredictionJob

D.ModelDeploy

AnswerB

Correct: CustomJob (published in the GCPC library as CustomTrainingJobOp) is the component that runs a custom training job on Vertex AI.

Why this answer

The CustomJob component is the correct choice because it is the pre-built GCPC component specifically designed to submit a custom training job to Vertex AI. It allows the engineer to specify a custom container image or a Python training script, along with machine configuration and hyperparameters, directly within a Vertex AI pipeline. Other components serve different purposes, such as hyperparameter tuning, batch predictions, or model deployment.

Exam trap

The trap here is that candidates may confuse HyperparameterTuningJob with CustomJob because both involve training, but HyperparameterTuningJob is for multi-trial optimization, not a single training run, and ModelDeploy is a distractor that does not exist as a GCPC component.

How to eliminate wrong answers

Option A is wrong because HyperparameterTuningJob is used for optimizing hyperparameters across multiple trials, not for submitting a single custom training job. Option C is wrong because BatchPredictionJob is for running batch predictions on a trained model, not for training. Option D is wrong because ModelDeploy is a deployment step, not a training step; the GCPC component for deploying a model to an endpoint is ModelDeployOp, and no component named ModelDeploy exists in the library.

Practice this question →

24

MCQhard

A Vertex AI Pipeline contains a task that may produce outputs that are not always needed. The engineer wants to conditionally execute downstream tasks only if a specific artifact is produced. Which KFP SDK v2 construct allows the engineer to implement this conditional execution?

A.dsl.If

B.dsl.ExitHandler

C.dsl.Collected

D.dsl.Condition

AnswerA

dsl.If allows conditional branching based on task outputs or pipeline parameters.

Why this answer

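The dsl.If construct in KFP SDK v2 allows conditional execution of tasks based on the value of a parameter output from a previous task or a pipeline parameter. dsl.Collected gathers the outputs of a dsl.ParallelFor loop (fan-in), dsl.ExitHandler is for cleanup tasks that run when the pipeline exits, and dsl.Condition is the older conditional construct, deprecated in favor of dsl.If in recent KFP SDK v2 releases.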

Practice this question →

25

MCQmedium

You are using KFP SDK v2 to define a pipeline. You need to pass a large dataset between components. What is the best practice for passing data?

A.Use the component's temporary directory to share data between containers.

B.Pass the data as a serialized Python object in memory.

C.Write the data to Cloud Storage and pass the GCS URI as an artifact.

D.Store the data in a BigQuery table and pass the table reference.

AnswerC

This is the recommended best practice for large data in KFP pipelines.

Why this answer

In KFP SDK v2, passing large datasets between components is best done by writing the data to Cloud Storage and passing the GCS URI as an artifact. This approach leverages KFP's built-in artifact tracking, ensures the data outlives the container that produced it, and avoids the memory and disk limits of ephemeral containers. Only the artifact's URI and metadata travel between components; the data itself stays in Cloud Storage, which keeps the exchange efficient and scalable.

Exam trap

Google often tests the misconception that temporary directories are shared between containers in a pod, but in KFP each component runs in its own container with isolated storage, making Cloud Storage the correct choice for durable, cross-component data sharing.

How to eliminate wrong answers

Option A is wrong because a component's temporary directory is ephemeral and not shared between containers; each container runs in its own isolated filesystem, so data written there is lost after the component finishes. Option B is wrong because passing a serialized Python object in memory is limited by the container's memory capacity and cannot handle large datasets; KFP does not support in-memory object passing between components. Option D is wrong because storing data in a BigQuery table and passing the table reference is overkill for intermediate pipeline data; it introduces unnecessary latency, cost, and complexity compared to using Cloud Storage artifacts, which are the standard for KFP artifact passing.

Practice this question →

26

Multi-Selecthard

A company runs a Vertex AI pipeline that uses a container component to preprocess data. The component downloads a large file from a public URL and saves the output to Cloud Storage. The pipeline fails intermittently with a 'timeout' error. Which THREE steps should the team take to improve reliability? (Choose three.)

Select 3 answers

A.Make the component idempotent by checking for existing output before processing.

B.Increase the component's timeout setting.

C.Reduce the size of the file being downloaded.

D.Implement retries with exponential backoff in the component.

E.Increase the machine type for the component.

AnswersA, B, D

Correct: Idempotency ensures that if a retry occurs, it doesn't cause duplicate work.

Why this answer

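Option A is correct because making the component idempotent by checking for existing output before processing prevents redundant work and avoids timeout failures when the file has already been downloaded. In Vertex AI pipelines, idempotent components can safely skip processing if the output already exists in Cloud Storage, reducing the risk of hitting timeout limits on subsequent pipeline runs. Option B is correct because a large download can legitimately exceed a short timeout, so the limit should match the expected duration. Option D is correct because intermittent failures against a public URL are usually transient, and retrying with exponential backoff lets the component recover without failing the whole run.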

Exam trap

Google exams often test the distinction between fixing the symptom (increasing timeout) and addressing the root cause (idempotency and retries), leading candidates to overlook that idempotency and retries together with a reasonable timeout form the most robust solution.
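The combination described above (skip work that is already done, retry transient failures with growing delays) can be sketched in plain Python. This is a minimal illustration, not a Vertex AI API: `retry_with_backoff` and `run_step` are hypothetical helper names, and the existence check and download step are passed in as callables so the sketch stays independent of any storage client.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on any exception with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles on each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% random jitter spreads out retries from parallel workers.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


def run_step(
    output_exists: Callable[[], bool],
    download_and_process: Callable[[], None],
) -> str:
    """Idempotent wrapper: skip the work when the output is already present."""
    if output_exists():
        return "skipped"
    retry_with_backoff(download_and_process)
    return "processed"
```

In a real component, `output_exists` would check the target Cloud Storage path and `download_and_process` would fetch the file and write the result; both are left abstract here on purpose.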

Practice this question →

27

MCQhard

A machine learning team uses Vertex AI Pipelines to orchestrate their training pipeline. They want to trigger the pipeline automatically in response to new data arriving in a Cloud Storage bucket, and also support a scheduled run every day at 6 AM. Which combination of services should they use to achieve both event-driven and schedule-based triggers?

A.Cloud Scheduler for the schedule, and Cloud Pub/Sub with Push subscription to Vertex AI for event-driven.

B.Cloud Functions for both schedule and event-driven, using cron trigger.

C.Cloud Scheduler for the schedule, and Cloud Functions triggered by Cloud Storage events to call the Vertex AI API for event-driven.

D.Vertex AI Pipelines built-in scheduler for schedule, and Cloud Pub/Sub for event-driven.

AnswerC

Correct: Cloud Scheduler for cron schedule, Cloud Functions for event-driven from Cloud Storage.

Why this answer

Option C is correct because Cloud Scheduler can trigger the pipeline at 6 AM daily via a cron job, while Cloud Functions, triggered by Cloud Storage events (e.g., object finalize), can call the Vertex AI API to start the pipeline when new data arrives. This combination provides both schedule-based and event-driven triggers without requiring custom infrastructure.

Exam trap

The trap here is that candidates often assume Pub/Sub can directly trigger pipelines, but Cloud Functions (or Cloud Run) is needed as an intermediary to translate events into Vertex AI API calls. Scheduling is the easier half: either Cloud Scheduler or the Vertex AI Pipelines scheduler API can handle a daily 6 AM run.

How to eliminate wrong answers

Option A is wrong because Cloud Pub/Sub with a Push subscription cannot directly trigger Vertex AI Pipelines; Vertex AI does not accept Pub/Sub push messages as a trigger source. Option B is wrong because Cloud Functions does not support a native cron trigger; it requires Cloud Scheduler to invoke it on a schedule, so it cannot satisfy the schedule requirement on its own. Option D is wrong because Cloud Pub/Sub by itself cannot start a pipeline run; something must still consume the message and call the Vertex AI API. Vertex AI Pipelines does offer a scheduler API for recurring runs, so the schedule half of option D is workable, but the event-driven half is incomplete.
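The event-driven half of option C can be sketched as a background Cloud Function bound to a bucket's object-finalize event. This is a hedged illustration: the project, region, bucket paths, and the `training_data_uri` parameter name are all assumptions, and `build_job_args` is a hypothetical helper. The only SDK calls used are `aiplatform.init`, `aiplatform.PipelineJob`, and `PipelineJob.submit` from the `google-cloud-aiplatform` package.

```python
from typing import Any, Dict

# Hypothetical values; replace with your own project settings.
PROJECT_ID = "my-project"
REGION = "us-central1"
TEMPLATE_PATH = "gs://my-bucket/pipelines/training_pipeline.json"
PIPELINE_ROOT = "gs://my-bucket/pipeline-root"


def build_job_args(event: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a Cloud Storage finalize event into PipelineJob arguments."""
    data_uri = f"gs://{event['bucket']}/{event['name']}"
    return {
        "display_name": "retrain-on-new-data",
        "template_path": TEMPLATE_PATH,
        "pipeline_root": PIPELINE_ROOT,
        # 'training_data_uri' is an assumed pipeline parameter name.
        "parameter_values": {"training_data_uri": data_uri},
    }


def trigger_pipeline(event: Dict[str, Any], context: Any = None) -> None:
    """Entry point for a function triggered by new objects in a bucket."""
    # Imported lazily so build_job_args can be exercised without the SDK.
    from google.cloud import aiplatform

    aiplatform.init(project=PROJECT_ID, location=REGION)
    job = aiplatform.PipelineJob(**build_job_args(event))
    job.submit()  # returns without waiting for the run to finish
```

The same `trigger_pipeline` body could also sit behind an HTTP endpoint that Cloud Scheduler calls at 6 AM, which would let one function serve both triggers.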

Practice this question →

28

MCQhard

A machine learning team wants to implement a continuous delivery pipeline for their ML models using Vertex AI Pipelines. The pipeline should automatically deploy a model to a staging endpoint after evaluation passes, and then after manual approval, promote it to production. Which strategy should they use to manage model versions in the Vertex AI Model Registry?

A.Use Vertex AI Model Registry aliases: assign the model version an alias 'staging' initially, and after approval, change the alias to 'production' via the API.

B.Store model artifacts in Cloud Storage with versioned paths and deploy directly from Storage without using the Model Registry.

C.Create two separate models in the registry: one for staging and one for production, and copy artifacts between them after approval.

D.Upload each model version with the same model ID and use labels to differentiate staging vs production.

AnswerA

Aliases like 'staging' and 'production' allow controlled promotion of model versions.

Why this answer

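Vertex AI Model Registry supports versioning and alias management. By uploading a model version with an alias like 'staging' and later moving the 'production' alias to it after approval, the team controls promotion without copying artifacts. Option B bypasses the registry and loses its version tracking, option C duplicates artifacts across two models, and option D relies on labels, which are descriptive metadata rather than a mechanism for resolving which version to deploy.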

Practice this question →

29

MCQeasy

What is the purpose of the 'importer' component in Vertex AI Pipelines?

A.To import data from external sources into the pipeline.

B.To import existing ML artifacts (e.g., models, datasets) into a pipeline as inputs.

C.To import Python libraries into the pipeline environment.

D.To import pipeline definitions from other projects.

AnswerB

This is the correct use of the importer component.

Why this answer

The importer component allows you to bring existing artifacts (e.g., a model in Vertex AI Model Registry) into a pipeline for downstream use.

Practice this question →

30

MCQhard

In a Vertex AI Pipeline, a component produces a Metrics artifact that includes an evaluation metric. The engineer wants to use this metric value as a condition to decide whether to deploy the model. However, the metric value is stored in the artifact's metadata and not directly as a pipeline parameter. How can the engineer pass the metric value to a downstream conditional task?

A.Configure the component that produces the Metrics artifact to also output the metric as a pipeline parameter.

B.Use the importer component to convert the artifact into a parameter.

C.Add a component that reads the artifact's metadata and outputs the metric as a parameter, then use that parameter in the condition.

D.Use the artifact directly in the dsl.If condition, as artifacts are comparable.

AnswerC

A small Python function component can extract the metric value from the artifact's metadata and output it as a string or float parameter, which can then be used in dsl.If.

Why this answer

Option C is correct because Vertex AI Pipeline conditions require pipeline parameters (typed values) to evaluate expressions like `dsl.If`. A Metrics artifact's metadata is stored as an artifact property, not a pipeline parameter, so a custom component must read that metadata and output the metric as a parameter. This parameter can then be used in the `dsl.If` condition to control downstream deployment.

Exam trap

A common trap in this Google exam is the misconception that artifact metadata can be directly used in pipeline conditions, but conditions require typed parameters, not artifact objects or their metadata fields.

How to eliminate wrong answers

Option A is wrong because modifying the upstream component to output the metric as a pipeline parameter would require changing the component's implementation, which may not be feasible if the component is from a shared library or third-party. Option B is wrong because the importer component is designed to bring external artifacts into the pipeline, not to extract metadata from an existing artifact and convert it into a parameter. Option D is wrong because artifacts are not directly comparable in `dsl.If` conditions; conditions only work with pipeline parameters (e.g., integers, strings), not with artifact objects or their metadata.

Practice this question →

31

MCQeasy

Which of the following is a best practice when designing idempotent pipeline components in Vertex AI?

A.Use global variables to share state between components.

B.Pass data through Cloud Storage URIs rather than in-memory.

C.Write component outputs to a database with timestamps.

D.Use the same output name for all runs to avoid duplication.

AnswerB

GCS URIs make components idempotent and enable caching.

Why this answer

Passing data through Cloud Storage URIs ensures that component outputs are stored persistently and can be retrieved by downstream components, even if the original component instance is terminated or scaled down. This aligns with the principle of idempotency because the same input will always produce the same output stored at the same URI, and re-running the component will not cause side effects or data loss. In contrast, in-memory data is ephemeral and tied to a specific runtime instance, breaking idempotency across retries or parallel executions.

Exam trap

The Google PMLE exam often tests the misconception that idempotency is about avoiding duplication of output names or using timestamps for uniqueness, when in fact idempotency requires that repeated executions produce the same result without side effects, which is achieved by using immutable, deterministic storage like Cloud Storage URIs rather than mutable state or time-dependent writes.

How to eliminate wrong answers

Option A is wrong because using global variables to share state between components introduces mutable shared state that can cause non-deterministic behavior across retries or parallel runs, violating idempotency. Option C is wrong because writing component outputs to a database with timestamps introduces a side effect that changes with each run (different timestamps), making the component non-idempotent; idempotent components should produce the same output regardless of how many times they are executed. Option D is wrong because using the same output name for all runs does not guarantee idempotency; it can lead to overwriting or collision of outputs, and idempotency requires that repeated executions produce the same result without unintended side effects, not just the same output name.
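One way to make "same input, same output location" concrete is to derive the output URI from a hash of the component's inputs. The sketch below is an illustration under stated assumptions: `output_uri` is a hypothetical helper, the path layout is arbitrary, and nothing here is part of the Vertex AI or KFP SDKs.

```python
import hashlib
import json
from typing import Any, Dict


def output_uri(pipeline_root: str, component: str, inputs: Dict[str, Any]) -> str:
    """Build a deterministic Cloud Storage path from a component's inputs."""
    # Canonical JSON (sorted keys) so that dict ordering does not change the hash.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{pipeline_root.rstrip('/')}/{component}/{digest}/output.parquet"
```

Re-running the component with identical inputs resolves to the same path, so a retry overwrites (or can detect and reuse) its own earlier output instead of scattering timestamped copies.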

Practice this question →

32

MCQmedium

A company is using Vertex AI Pipelines to automate model retraining. They have a component that creates a BigQuery table with training data. To ensure idempotency, the component should check if the table already exists and recreate it if necessary. What is the best practice for passing data between pipeline components?

A.Pass data in-memory as Python objects between components.

B.Use BigQuery table names as component outputs and inputs.

C.Use Cloud SQL to store intermediate results and pass connection strings.

D.Store data as artifacts in Cloud Storage and pass the GCS URI between components.

AnswerD

Correct: Passing GCS URIs allows components to be idempotent and data to be versioned.

Why this answer

Option D is correct because Vertex AI Pipelines is designed to pass data between components via Cloud Storage artifacts. By storing the BigQuery table metadata or training data as a file in Cloud Storage and passing the GCS URI as an artifact, the pipeline ensures idempotency and decouples components. This approach aligns with Kubeflow Pipelines' artifact-based I/O model, where each component's outputs are materialized as URIs rather than in-memory objects.

Exam trap

The trap here is that candidates confuse 'passing data' with 'passing references to external services' (like BigQuery table names or Cloud SQL connection strings), but Vertex AI Pipelines expects artifact URIs (typically GCS paths) to maintain pipeline lineage, caching, and reproducibility.

How to eliminate wrong answers

Option A is wrong because Vertex AI Pipelines components run in isolated containers; passing Python objects in-memory is not supported across distributed steps and would break pipeline reproducibility. Option B is wrong because BigQuery table names are not first-class pipeline artifacts; passing them directly couples components to a specific table state and does not leverage Vertex AI's artifact tracking or lineage. Option C is wrong because Cloud SQL introduces unnecessary latency and complexity for intermediate data; Vertex AI Pipelines natively uses Cloud Storage for artifact passing, and connection strings are not a standard pipeline I/O type.

Practice this question →

33

MCQeasy

A data scientist is defining a Vertex AI pipeline and needs to include a step that imports a pre-existing model from Cloud Storage into the pipeline as an artifact. Which Kubeflow Pipelines SDK v2 component should they use?

A.dsl.Collected

B.dsl.importer

C.dsl.Importer

D.dsl.Artifact

AnswerB

dsl.importer is used to import existing artifacts into a pipeline.

Why this answer

The `dsl.importer` component in Kubeflow Pipelines SDK v2 is specifically designed to import existing artifacts (such as models, datasets, or metrics) from external storage (e.g., Cloud Storage) into a pipeline as a pipeline artifact. It allows you to reference a pre-existing model without retraining or re-uploading, making it the correct choice for this use case.

Exam trap

The trap here is that candidates may confuse the Python class naming convention (capitalized `Importer`) with the actual SDK v2 function name (lowercase `importer`), or mistakenly think `dsl.Artifact` can import artifacts when it only defines the artifact schema.

How to eliminate wrong answers

Option A is wrong because `dsl.Collected` gathers the outputs of a `dsl.ParallelFor` loop (fan-in); it does not import external artifacts. Option C is wrong because `dsl.Importer` (capital 'I') is not a valid class or function in the SDK v2; the correct name is all lowercase `dsl.importer`. Option D is wrong because `dsl.Artifact` is a base class for defining custom artifact types, not a component for importing artifacts into a pipeline.

Practice this question →

34

Multi-Selectmedium

A data science team uses Cloud Composer to orchestrate a complex ML workflow. They need to run a Vertex AI pipeline and then a BigQuery query conditionally based on the pipeline's output. Which Airflow features should they use? (Choose two.)

Select 2 answers

A.BigQueryExecuteQueryOperator

B.Task dependencies using >>

C.dsl.If

D.VertexAIPipelineJobOperator

E.PythonOperator with if-else

AnswersB, D

The >> operator defines task order and can be used with branching operators.

Why this answer

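In Airflow, tasks are defined using operators, and dependencies between tasks are set using >> or set_downstream. The Vertex AI pipeline operator (option D) launches the pipeline run from the DAG, and >> (option B) orders the BigQuery task after it. Conditional branching can be implemented using BranchPythonOperator or short-circuiting, but the basic dependency is set with >>. dsl.If (option C) belongs to the Kubeflow Pipelines SDK, not to Airflow.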

Practice this question →

35

MCQmedium

A machine learning engineer has a Vertex AI pipeline that trains a model. The pipeline uses caching to avoid re-running components that have not changed. After updating the training code, the engineer notices that the pipeline still uses cached outputs from the previous run. What could be the reason?

A.The pipeline parameter values have changed, causing a cache hit.

B.The pipeline is using a pre-built component that ignores caching.

C.The base image used for the component has not changed, so the cache key matches despite code changes inside the container.

D.The component has caching disabled via the @dsl.component decorator.

AnswerC

The cache key includes the container image reference; if the code inside the container changes but the image is pushed under the same tag, the cache key still matches.

Why this answer

In Vertex AI Pipelines, the cache key for a step is derived from the component specification (container image reference, command, and arguments), the step's inputs, and its output definitions. If the training code is rebuilt into a container but pushed under the same image reference (for example, a reused tag such as latest), the component specification is unchanged, the cache key still matches, and the step is served from the previous run's cache. Pushing the image under a new tag, or disabling caching for that step, forces re-execution.

Option A is incorrect because changing parameter values would change the cache key, causing a miss. Option B is incorrect because pre-built components support caching unless explicitly disabled. Option D is incorrect because disabling caching would force the component to re-run, which is the opposite of the observed behavior.
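A simplified model shows why a reused image tag produces a cache hit. The function below is a hypothetical stand-in that hashes the image reference, command, and inputs; it is not the algorithm Vertex AI actually uses, only an illustration of the behavior described above.

```python
import hashlib
import json
from typing import Any, Dict, List


def cache_key(image: str, command: List[str], inputs: Dict[str, Any]) -> str:
    """Simplified stand-in for a step cache key: hash of spec plus inputs."""
    payload = json.dumps(
        {"image": image, "command": command, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The training code was rebuilt, but the image was pushed under the same tag:
before = cache_key("gcr.io/my-project/train:latest", ["python", "train.py"], {"epochs": 10})
after = cache_key("gcr.io/my-project/train:latest", ["python", "train.py"], {"epochs": 10})

# Pushing the rebuilt image under a new tag changes the component spec:
retagged = cache_key("gcr.io/my-project/train:v2", ["python", "train.py"], {"epochs": 10})
```

Here `before` and `after` are identical, so the step would be treated as unchanged, while `retagged` differs and would force a fresh run. The image names are placeholders.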

Practice this question →

36

Multi-Selectmedium

An ML engineer is building a continuous training pipeline that retrains a model when new data arrives. The pipeline should also detect skew between training and serving data. Which TWO Google Cloud services should they use? (Choose two.)

Select 2 answers

A.Cloud Logging

B.Vertex AI Model Monitoring

C.Cloud Functions

D.Vertex AI Pipelines

E.Cloud Monitoring

AnswersB, D

Vertex AI Model Monitoring handles training-serving skew detection.

Why this answer

Vertex AI Model Monitoring (B) is correct because it is purpose-built to detect skew between training and serving data by comparing feature distributions and alerting when the distance between them exceeds a configured threshold. Vertex AI Pipelines (D) is correct because it provides a serverless orchestration service for the continuous training pipeline itself; the run that retrains the model when new data arrives is started by an external trigger that calls the Vertex AI API.

Exam trap

The Google PMLE exam often tests the distinction between monitoring for infrastructure health (Cloud Monitoring) versus monitoring for ML-specific data skew (Vertex AI Model Monitoring), leading candidates to confuse general observability with ML-specific drift detection.
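To make "comparing feature distributions" concrete, the sketch below computes the L-infinity distance between the training and serving distributions of one categorical feature. It is a minimal stdlib illustration with hypothetical function names and an arbitrary threshold, not necessarily the exact statistic the managed service computes.

```python
from collections import Counter
from typing import Dict, Iterable


def distribution(values: Iterable[str]) -> Dict[str, float]:
    """Return the relative frequency of each category."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a distribution from no values")
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(train: Iterable[str], serving: Iterable[str]) -> float:
    """Largest absolute difference in category frequency between two samples."""
    p, q = distribution(train), distribution(serving)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def has_skew(train: Iterable[str], serving: Iterable[str], threshold: float = 0.2) -> bool:
    """Flag skew when the distance exceeds an (assumed) alerting threshold."""
    return l_infinity_distance(train, serving) > threshold
```

For example, a feature that is 80% "a" in training but only 50% "a" in serving has a distance of about 0.3 and would be flagged at the 0.2 threshold used here.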

Practice this question →

37

MCQmedium

A data scientist creates a custom Python function component for a Vertex AI pipeline using the Kubeflow Pipelines SDK v2. The component takes a string parameter 'input_text' and outputs a Metrics artifact. The scientist wants to include a lightweight Python function without building a container. Which code snippet correctly defines this component?

A.@dsl.component\ndef my_component(input_text: str) -> Metric:\n metrics = Metric()\n metrics.log_metric('length', len(input_text))

B.@dsl.pipeline\ndef my_pipeline(input_text: str):\n metrics = Metrics()\n metrics.log_metric('length', len(input_text))

C.def my_component(input_text: str) -> Metrics:\n from kfp.dsl import Metrics\n metrics = Metrics()\n metrics.log_metric('length', len(input_text))\n return metrics

D.@dsl.component(base_image='python:3.9')\ndef my_component(input_text: str) -> Metrics:\n from kfp.dsl import Metrics\n metrics = Metrics()\n metrics.log_metric('length', len(input_text))\n return metrics

AnswerD

Correct: Uses @dsl.component with base_image, imports Metrics inside the function, and returns a Metrics artifact.

Why this answer

Option D is correct because it applies the `@dsl.component` decorator, which turns a plain Python function into a lightweight component without a custom container build, imports `Metrics` inside the function body (lightweight components must be self-contained, so imports belong inside the function), and declares a `Metrics` output. The `base_image` argument is optional: when omitted, the SDK falls back to a default Python image, so `python:3.9` here simply pins the runtime. Without the decorator, or with `@dsl.pipeline` instead, the function would not be recognized as a pipeline component.

Exam trap

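Google often tests whether candidates can tell `@dsl.component` (defines a component) from `@dsl.pipeline` (defines a pipeline), and whether they notice small but fatal details such as a misspelled artifact type (`Metric` instead of `Metrics`) or an undecorated function. Setting `base_image` is good practice for reproducibility, but it is not what makes the function a component.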

How to eliminate wrong answers

Option A is wrong because it annotates the output with `Metric`, which is not a KFP artifact type (the class is `Metrics`), never imports the type inside the function, and never returns the artifact. Option B is wrong because `@dsl.pipeline` defines a pipeline, not a component; a pipeline body only wires tasks together and cannot execute arbitrary Python such as `log_metric` at run time. Option C is wrong because it lacks the `@dsl.component` decorator entirely, so the function is not registered as a pipeline component and cannot be used in a pipeline.

Practice this question →

38

MCQmedium

A data science team uses Cloud Composer to orchestrate ML workflows. They need to trigger a Vertex AI pipeline after a BigQuery data load completes, and then run a Dataflow job. Which Airflow operator should they use to launch the Vertex AI pipeline?

A.DataflowStartFlexTemplateJobOperator

B.VertexAICreateCustomJobOperator

C.MLEngineTrainingOperator

D.VertexAIPipelineJobOperator

AnswerD

This operator triggers a Vertex AI Pipeline job within an Airflow DAG.

Why this answer

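Option D is correct because the VertexAIPipelineJobOperator is the only option designed to trigger a Vertex AI pipeline run from within an Airflow DAG. This operator directly corresponds to the requirement of launching a Vertex AI pipeline after a BigQuery data load completes, making it the appropriate choice for orchestrating ML workflows with Cloud Composer. The question uses VertexAIPipelineJobOperator as the name; in the apache-airflow-providers-google package the operator that runs a Vertex AI pipeline is published as RunPipelineJobOperator, so check the provider documentation for the exact class name in your version.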

Exam trap

Google often tests the distinction between operators for different Vertex AI services (e.g., custom jobs vs. pipelines), so candidates may confuse VertexAICreateCustomJobOperator with VertexAIPipelineJobOperator when the question specifically asks for launching a pipeline.

How to eliminate wrong answers

Option A is wrong because DataflowStartFlexTemplateJobOperator is used to start a Dataflow job using a Flex Template, not to launch a Vertex AI pipeline. Option B is wrong because VertexAICreateCustomJobOperator is used to create a custom training job in Vertex AI, not to run a Vertex AI pipeline. Option C is wrong because MLEngineTrainingOperator is a legacy operator for AI Platform (now Vertex AI) training jobs, not for triggering Vertex AI pipelines.

Practice this question →

39

Multi-Selectmedium

You need to orchestrate a complex ML workflow that involves multiple Vertex AI pipelines, BigQuery jobs, and Dataflow pipelines. The workflow must handle dependencies, retries, and monitoring. Which two services are best suited for this orchestration?

Select 2 answers

A.Cloud Composer

B.Vertex AI Pipelines

C.Cloud Scheduler

D.BigQuery scheduled queries

E.Cloud Functions

AnswersA, B

Airflow can orchestrate Vertex AI pipelines, BigQuery jobs, and Dataflow pipelines with dependencies.

Why this answer

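Cloud Composer (based on Apache Airflow) is correct because it provides a managed environment for orchestrating complex workflows with dependencies, retries, and monitoring across heterogeneous services like Vertex AI pipelines, BigQuery, and Dataflow. Airflow's DAGs allow you to define task dependencies, set retry policies, and integrate with Cloud Monitoring for observability, making it ideal for multi-service ML workflows. Vertex AI Pipelines is correct because it orchestrates the ML-specific steps inside each pipeline (training, evaluation, deployment) with artifact lineage and caching, while Cloud Composer coordinates those pipelines with the surrounding BigQuery and Dataflow jobs.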

Exam trap

The trap here is that candidates often confuse Cloud Scheduler or Cloud Functions as sufficient for orchestration, but they lack the dependency management, retry logic, and cross-service monitoring that Cloud Composer provides for complex ML workflows.

Practice this question →

40

MCQeasy

A machine learning engineer needs to pass a large dataset between two components in a Vertex AI pipeline. What is the recommended way to pass this data?

A.Store the dataset as a Dataset artifact and pass the artifact between components.

B.Write the dataset to a temporary BigQuery table and pass the table name.

C.Serialize the dataset to a string and pass it as a pipeline parameter.

D.Use a Cloud Storage bucket and pass the bucket name as a parameter.

AnswerA

Correct: Using Dataset artifacts ensures efficient storage and versioning via Cloud Storage.

Why this answer

In Vertex AI Pipelines, the recommended way to pass large datasets between components is to use a `Dataset` artifact. Artifacts are metadata references that point to the underlying data stored in Cloud Storage, enabling efficient, scalable, and type-safe data passing without serialization overhead or size limits. This approach leverages the Kubeflow Pipelines SDK's artifact tracking, which automatically handles lineage and versioning.

Exam trap

The trap here is that candidates often assume passing a Cloud Storage bucket name (Option D) is sufficient, but they miss that artifacts provide automatic metadata tracking, type safety, and integration with Vertex AI's lineage system, which is required for production ML pipelines.

How to eliminate wrong answers

Option B is wrong because writing a large dataset to a temporary BigQuery table introduces unnecessary latency, cost, and complexity; BigQuery is designed for analytical queries, not as an intermediate data transfer mechanism for pipeline components. Option C is wrong because pipeline parameters are meant for small values such as strings, numbers, and flags; serializing a large dataset into a parameter runs into parameter size limits and can cause out-of-memory errors or pipeline failures. Option D is wrong because passing only the bucket name as a parameter lacks the structured metadata and type safety that artifacts provide; it forces components to independently resolve file paths and does not automatically track lineage or versioning.

Practice this question →

41

MCQeasy

An organization wants to use Cloud Composer (Airflow) to orchestrate a machine learning workflow that includes running a Vertex AI Pipeline, followed by a BigQuery job, and then a Dataflow pipeline. What is the primary advantage of using Cloud Composer for this orchestration?

A.It allows orchestrating heterogeneous workflows across multiple GCP services with dependencies and retries.

B.It automatically caches the outputs of each step to avoid recomputation.

C.It integrates natively with the Vertex AI Model Registry for model versioning.

D.It provides a serverless execution environment for ML pipelines.

AnswerA

Cloud Composer excels at orchestrating complex DAGs that span multiple services like Vertex AI, BigQuery, and Dataflow.

Why this answer

Cloud Composer (Apache Airflow) is designed to orchestrate heterogeneous workflows across multiple GCP services. In this scenario, it can define a Directed Acyclic Graph (DAG) that runs a Vertex AI Pipeline, then a BigQuery job, and finally a Dataflow pipeline, with built-in support for dependency management, retries, and failure handling. This is the primary advantage because it allows you to coordinate disparate services in a single, reliable workflow.

Exam trap

The trap here is that candidates may confuse Cloud Composer's orchestration capabilities with features specific to individual GCP services (like caching, model registry, or serverless execution), leading them to pick options that describe those services' features rather than the primary advantage of using an orchestrator.

How to eliminate wrong answers

Option B is wrong because Cloud Composer does not automatically cache the outputs of each step; execution caching is a feature of Vertex AI Pipelines, not of Airflow itself. Option C is wrong because Cloud Composer does not natively integrate with the Vertex AI Model Registry; that integration is handled by Vertex AI Pipelines or custom operators, not by Airflow's core orchestration. Option D is wrong because Cloud Composer is not serverless; it runs Airflow on a managed GKE cluster, and serverless ML pipeline execution is provided by Vertex AI Pipelines, not Cloud Composer.

Practice this question →

42

Multi-Selecteasy

A company uses Vertex AI Pipelines for ML training. They want to implement continuous training triggered by new data arrival. Which two Google Cloud services should they use to achieve this? (Choose two.)

Select 2 answers

A.Cloud Functions

B.Cloud Scheduler

C.Cloud Storage

D.Vertex AI Experiments

E.Cloud Composer

AnswersA, C

Cloud Functions can be triggered by GCS events and start a Vertex AI pipeline.

Why this answer

Cloud Functions is correct because it can be triggered directly by Cloud Storage events (e.g., object finalize/create) to start a Vertex AI Pipeline run when new data arrives, enabling event-driven continuous training without manual intervention. Cloud Storage is correct because it serves as the source of new data and its event notifications (via Pub/Sub) are the trigger mechanism that Cloud Functions subscribes to, forming the core event-driven architecture.

Exam trap

Google often tests the distinction between event-driven triggers (Cloud Functions + Cloud Storage) and time-based schedulers (Cloud Scheduler), leading candidates to incorrectly select Cloud Scheduler when the requirement is 'triggered by new data arrival' rather than 'run at a specific time'.

Practice this question →

43

MCQeasy

A data engineer wants to orchestrate a complex workflow that includes running a Vertex AI pipeline, then a BigQuery job, and finally a Dataflow pipeline. The workflow must handle dependencies, retries, and monitoring. Which Google Cloud service is most suitable for this orchestration?

A.Cloud Tasks

B.Cloud Composer

C.Cloud Scheduler

D.Workflows

AnswerB

Correct: Cloud Composer (Airflow) provides DAG-based orchestration with operators for all mentioned services.

Why this answer

Cloud Composer (based on Apache Airflow) is the most suitable service for orchestrating a complex workflow with dependencies, retries, and monitoring across Vertex AI, BigQuery, and Dataflow. It provides a managed Airflow environment that natively supports DAG-based orchestration, built-in retry logic, and integration with Google Cloud services via operators such as RunPipelineJobOperator, BigQueryInsertJobOperator, and DataflowTemplatedJobStartOperator.

Exam trap

A common misconception is that Workflows is the better fit for complex ML orchestration. It can chain service API calls, but it lacks the prebuilt operator integrations, scheduling, and DAG-level monitoring that Cloud Composer provides for multi-service pipelines.

How to eliminate wrong answers

Option A is wrong because Cloud Tasks is a distributed task queue for executing discrete, short-lived tasks against HTTP endpoints, not for orchestrating multi-step workflows with dependencies across different services. Option C is wrong because Cloud Scheduler is a cron-based job scheduler that triggers single jobs at specified times; it cannot manage dependencies between multiple pipeline stages. Option D is wrong because Workflows, while it can chain service API calls in sequential or parallel steps, lacks the prebuilt operator ecosystem, scheduling, and DAG-level monitoring that Cloud Composer provides for pipelines spanning Vertex AI, BigQuery, and Dataflow.

Practice this question →

44

MCQmedium

A team uses Cloud Build to automatically trigger a Vertex AI pipeline when changes are pushed to the model code repository. They have a cloudbuild.yaml file that builds a container image and submits the pipeline. However, they want to run the pipeline only if the commit includes changes to the 'training/' directory. Which Cloud Build configuration option should be used to filter the trigger?

A.Add an 'ignoredFiles' field with 'training/**' to the trigger.

B.Use a 'substitutions' field with a regex pattern to filter commits.

C.Configure a Cloud Function to check the commit diff and call Cloud Build API conditionally.

D.Set the 'includedFiles' field to 'training/**' in the trigger configuration.

AnswerD

Correct: includedFiles filters to only trigger when files under training/ are changed.

Why this answer

Option D is correct because Cloud Build triggers support an `includedFiles` field that specifies a glob pattern. When set to `training/**`, the trigger will only fire if the commit includes changes to files under the `training/` directory. This is the native, declarative way to filter triggers based on changed file paths without additional infrastructure.
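The filtering rule can be sketched in a few lines of plain Python. This is only an approximation for intuition, not Cloud Build's implementation: the `trigger_fires` helper is hypothetical, and `fnmatch` lets `*` cross `/` boundaries, which is looser than real glob matching.

```python
from fnmatch import fnmatch


def trigger_fires(changed_files, included=(), ignored=()):
    """Approximate path filtering: ignored files are dropped first, then
    at least one remaining file must match an included pattern."""
    remaining = [
        path for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in ignored)
    ]
    if not remaining:
        return False
    if not included:
        return True  # no include filter: any remaining change fires
    return any(fnmatch(path, pattern) for path in remaining for pattern in included)


print(trigger_fires(["training/model.py", "README.md"], included=["training/**"]))  # True
print(trigger_fires(["docs/index.md"], included=["training/**"]))                   # False
```

A commit that touches at least one file under `training/` fires the trigger; a commit that only touches other paths does not.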

Exam trap

The trap here is that candidates confuse `ignoredFiles` with `includedFiles`, or assume that a custom solution like Cloud Functions is required when Cloud Build already provides a native, simpler mechanism for path-based filtering.

How to eliminate wrong answers

Option A is wrong because `ignoredFiles` excludes changes that match the pattern, so the trigger would skip commits that only touch `training/`, which is the opposite of the requirement. Option B is wrong because `substitutions` are used for variable replacement in the build configuration, not for filtering triggers based on file changes. Option C is wrong because, while a Cloud Function could achieve this, it introduces unnecessary complexity and cost; Cloud Build triggers natively support file path filtering via `includedFiles`, making a separate function an anti-pattern.

Practice this question →

45

MCQmedium

A company wants to implement continuous delivery (CD) for ML models, where a model is automatically deployed to a staging environment and only promoted to production after passing an evaluation gate. Which combination of GCP services is BEST suited for orchestrating this CD pipeline?

A.Cloud Scheduler and Pub/Sub

B.Cloud Composer (Airflow) with Cloud Functions

C.Cloud Build with Vertex AI Pipelines and Cloud Deploy

D.Vertex AI Pipelines with Cloud Run

AnswerC

Cloud Build triggers the pipeline, Vertex AI Pipelines runs training/evaluation, and Cloud Deploy manages promotion to production.

Why this answer

Cloud Build can trigger on code or model changes and start a Vertex AI pipeline that trains, evaluates, and deploys the model to staging. If the evaluation gate passes, the release is promoted to production with Cloud Deploy (or by updating the Vertex AI endpoint directly). Cloud Composer (Airflow) is also a reasonable choice for complex orchestration, but for CI/CD, Cloud Build is the natural fit.

Together, Cloud Build, Vertex AI Pipelines, and Cloud Deploy cover the trigger, the evaluation gate, and the promotion step of a CD pipeline.

Practice this question →

46

MCQhard

A team runs a Vertex AI pipeline daily. They notice that a component that downloads a file from a public URL always executes even when the URL and parameters haven't changed. They want to avoid unnecessary re-execution and reduce costs. What should they do?

A.Ensure the component is deterministic and that caching is not disabled for that component.

B.Enable pipeline caching by setting 'enable_caching=True' on the pipeline decorator.

C.Use a pre-built Google Cloud Pipeline Component for file download.

D.Use the 'dsl.CachingOptions' to set a custom cache key for the component.

AnswerA

Caching may be disabled per component, or the component may produce non-deterministic outputs (e.g., timestamp). Enabling caching and making the component deterministic will allow reuse.

Why this answer

Vertex AI Pipelines caching relies on a cache key derived from the task's inputs, its output definitions, and the component specification (image, command, and arguments). If a component always re-executes, it may be because caching is explicitly disabled for that task (for example with `set_caching_options(False)`) or because something in the component varies between runs (e.g., a timestamp or random value passed as an input). Option A addresses both: ensure the component is deterministic and verify caching is not disabled.

Option B is wrong because `enable_caching=True` on the pipeline decorator is not the correct API; caching is enabled by default and is controlled per task with `set_caching_options()` or per run with the `enable_caching` argument of `aiplatform.PipelineJob`. Option C is irrelevant: using a pre-built component does not help if caching is disabled or the inputs change on every run. Option D is unnecessary: a custom cache key is not needed when default caching works; the problem is either disabled caching or non-determinism.
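The effect of a run-varying input on caching can be illustrated with a toy fingerprint. This is an illustration only and does not reproduce how Vertex AI derives its cache key; the `cache_key` helper, the component interface string, and the image digest are all hypothetical.

```python
import hashlib
import json


def cache_key(component_spec, image_digest, inputs):
    """Hash everything that defines an execution into one fingerprint."""
    payload = json.dumps(
        {"spec": component_spec, "image": image_digest, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


spec = "download_file(url: str) -> Dataset"   # hypothetical component interface
image = "sha256:abc123"                       # hypothetical image digest

# Same URL on two different days: identical key, so the second run can be a cache hit.
monday = cache_key(spec, image, {"url": "https://example.com/data.csv"})
tuesday = cache_key(spec, image, {"url": "https://example.com/data.csv"})
print(monday == tuesday)  # True

# Passing a run timestamp as an input changes the key on every run.
stamped = cache_key(
    spec, image, {"url": "https://example.com/data.csv", "run_ts": "2024-01-02"}
)
print(monday == stamped)  # False
```

Any value that differs between runs and feeds into the key, such as a timestamp parameter, guarantees a cache miss even when the URL is unchanged.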

Practice this question →

47

MCQmedium

A data scientist wants to create a Vertex AI pipeline component that uses a custom container image stored in Artifact Registry. The component should accept a dataset artifact as input and output a model artifact. Which component type should they use?

A.A Lightweight Python component without base_image.

B.A Python function component with a custom base_image.

C.A container component defined with the ContainerSpec class.

D.A pre-built component from GCPC.

AnswerC

Correct: Container components use ContainerSpec to specify the image, command, and arguments.

Why this answer

Option C is correct because a container component wraps an existing container image directly. Its `ContainerSpec` names the image in Artifact Registry together with the command and arguments to run, and the component's inputs (e.g., a dataset artifact) and outputs (e.g., a model artifact) are passed to the container through placeholders in those arguments. A Python function component, with or without a custom `base_image`, runs a Python function inside an image rather than the image's own entrypoint, and pre-built components from GCPC have fixed implementations.

Exam trap

The trap here is that candidates confuse a Python function component with a custom `base_image` (Option B) as equivalent to running a custom container, but the `base_image` is used to build a new container from a Python function, not to directly execute an existing container image from Artifact Registry.

How to eliminate wrong answers

Option A is wrong because a Lightweight Python component without `base_image` runs on the default Python base image and cannot reference a custom container image from Artifact Registry; it is designed for inline Python code, not custom containers. Option B is wrong because a Python function component with a custom `base_image` still executes the Python function inside that image; it does not run the image's own entrypoint, so it does not wrap an existing container as-is. Option D is wrong because a pre-built component from GCPC (Google Cloud Pipeline Components) is a fixed, reusable component for common operations such as Vertex AI training or data processing, not a way to define your own container-based component.

Practice this question →

48

MCQmedium

An organization wants to trigger a Vertex AI pipeline whenever new data arrives in a Cloud Storage bucket. Which approach should they use?

A.Configure a Vertex AI pipeline trigger directly on the bucket using the GCP Console.

B.Use Pub/Sub notifications from the bucket and a Dataflow job to start the pipeline.

C.Set up a Cloud Scheduler job that runs every minute and checks for new files in the bucket.

D.Use Cloud Functions triggered by Cloud Storage events to call the Vertex AI pipeline API.

AnswerD

This is the recommended event-driven architecture: Storage event → Cloud Function → pipeline.

Why this answer

Option D is correct because Cloud Functions can be directly triggered by Cloud Storage events (e.g., `google.storage.object.finalize`) and can then call the Vertex AI pipeline API using the Cloud SDK or client libraries. This provides a serverless, event-driven architecture that reacts immediately to new data without polling or additional infrastructure.
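A minimal sketch of such a function follows, assuming a background (event, context) Cloud Function and the Vertex AI Python SDK. The project, region, bucket paths, display name, and the `data_uri` pipeline parameter are hypothetical placeholders.

```python
def build_pipeline_request(event, template_path, pipeline_root):
    """Translate a Cloud Storage event payload into pipeline job arguments."""
    data_uri = f"gs://{event['bucket']}/{event['name']}"
    return {
        "display_name": "continuous-training",
        "template_path": template_path,
        "pipeline_root": pipeline_root,
        "parameter_values": {"data_uri": data_uri},
    }


def on_new_data(event, context):
    """Background Cloud Function entry point for an object-finalize event."""
    # Imported lazily so the helper above can be tested without the SDK installed.
    from google.cloud import aiplatform

    request = build_pipeline_request(
        event,
        template_path="gs://example-bucket/pipelines/training.json",  # hypothetical
        pipeline_root="gs://example-bucket/pipeline-root",            # hypothetical
    )
    aiplatform.init(project="example-project", location="us-central1")  # hypothetical
    aiplatform.PipelineJob(**request).submit()
```

Keeping the event-to-arguments translation in its own function makes the trigger logic easy to unit test, while the entry point stays a thin wrapper around the SDK call.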

Exam trap

The trap here is that candidates may assume Vertex AI pipelines have a built-in Cloud Storage trigger (Option A) or over-engineer the solution with Dataflow (Option B), when the simplest and most native serverless approach is Cloud Functions.

How to eliminate wrong answers

Option A is wrong because Vertex AI pipelines do not support configuring a trigger directly on a Cloud Storage bucket via the GCP Console; there is no native bucket-to-pipeline trigger. Option B is wrong because while Pub/Sub notifications from the bucket are possible, adding a Dataflow job introduces unnecessary complexity and latency—Dataflow is a batch/stream processing engine, not a lightweight event router. Option C is wrong because a Cloud Scheduler job that runs every minute and checks for new files is inefficient (polling), introduces up to 60 seconds of latency, and does not scale well; it also requires custom code to track file states.

Practice this question →

49

Multi-Selectmedium

An ML pipeline must run a set of preprocessing tasks for each data shard in parallel. Which KFP SDK features should they use to implement this? (Choose two.)

Select 2 answers

A.dsl.ParallelFor

B.dsl.PipelineParam

C.dsl.Collected

D.dsl.Condition

E.dsl.ExitHandler

AnswersA, C

This creates a parallel loop over a list parameter.

Why this answer

Option A (dsl.ParallelFor) is correct because it allows you to iterate over a list of items (e.g., data shard identifiers) and execute a set of tasks for each item in parallel within a KFP pipeline. Option C (dsl.Collected) is correct because it collects the outputs from each parallel iteration of a dsl.ParallelFor loop into a single list, enabling downstream tasks to consume all shard results as a single artifact or parameter.
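The fan-out/fan-in shape these two constructs express can be sketched with the standard library. This is an analogy in plain Python rather than KFP code; the `preprocess` task and the shard layout are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(shard):
    """Hypothetical per-shard task: returns the number of rows in the shard."""
    return len(shard["rows"])


def fan_out_fan_in(shards, parallelism=4):
    # Fan-out: one task per shard, run concurrently (the role of dsl.ParallelFor).
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        per_shard = list(pool.map(preprocess, shards))
    # Fan-in: all per-shard outputs gathered into one list (the role of dsl.Collected).
    return per_shard


shards = [{"rows": [1, 2, 3]}, {"rows": [4]}, {"rows": []}]
print(fan_out_fan_in(shards))  # [3, 1, 0]
```

In a pipeline, the list produced by the fan-in step would be passed to a single downstream task that consumes all shard results at once.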

Exam trap

Google often tests the distinction between parallel iteration (dsl.ParallelFor + dsl.Collected) and conditional logic or plain parameters, so candidates mistakenly pick dsl.Condition or dsl.PipelineParam when they see 'for each shard' and think of branching or parameters.

Practice this question →

50

MCQmedium

An ML engineer is designing a pipeline that should run only when new training data arrives in a Cloud Storage bucket. Which event-driven approach should they use to trigger the Vertex AI Pipeline?

A.Use Cloud Storage Pub/Sub notifications to send events to a Cloud Function that triggers the pipeline.

B.Use Cloud Tasks to queue a pipeline run whenever a new file is uploaded.

C.Configure the pipeline to run on a schedule and check for new data inside the pipeline.

D.Set up a Cloud Scheduler job that runs every minute to check for new files.

AnswerA

This is the recommended event-driven pattern: Cloud Storage event → Pub/Sub → Cloud Function → Vertex AI API.

Why this answer

The best approach is to use Cloud Storage notifications via Pub/Sub, then a Cloud Function that receives the event and calls the Vertex AI API to create a pipeline job. This is a common event-driven pattern. Cloud Scheduler, and a scheduled pipeline that checks for data itself, are time-based triggers rather than event-driven ones.

Cloud Tasks is a queue for dispatching work that an application enqueues; it does not react to Cloud Storage uploads on its own.
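A function on the receiving end of such a notification usually filters by event type before starting a run. The sketch below assumes the JSON payload format for Cloud Storage notifications and the event shape a Pub/Sub-triggered function receives; the `new_object_uri` helper and the sample message are hypothetical.

```python
import base64
import json


def new_object_uri(pubsub_event):
    """Return the gs:// URI for an OBJECT_FINALIZE notification, else None."""
    attributes = pubsub_event.get("attributes", {})
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None  # ignore deletes, metadata updates, archives
    # With the JSON payload format, the message data is the object resource.
    obj = json.loads(base64.b64decode(pubsub_event["data"]))
    return f"gs://{obj['bucket']}/{obj['name']}"


# Hand-built message in the shape a Pub/Sub-triggered function receives.
sample = {
    "attributes": {"eventType": "OBJECT_FINALIZE", "bucketId": "training-data"},
    "data": base64.b64encode(
        json.dumps({"bucket": "training-data", "name": "2024/shard-0.csv"}).encode()
    ).decode(),
}
print(new_object_uri(sample))  # gs://training-data/2024/shard-0.csv
```

Returning `None` for other event types keeps deletions and metadata updates from starting a training run.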

Practice this question →

51

MCQmedium

An ML engineer wants to containerize a custom training script and use it as a component in a Vertex AI Pipeline. The component should accept a dataset URI and a learning rate parameter, and output a trained model artifact. Which approach should the engineer use to define the component?

A.Use a pre-built Google Cloud Pipeline Component for Vertex AI Training with custom container configuration.

B.Use ContainerComponent from kfp.v2.components to define the container, its inputs, and outputs.

C.Define a Python function component with @dsl.component and include the container code inline.

D.Use the importer component to import the script and then run it as a task.

AnswerB

ContainerComponent allows defining a custom container component with explicit inputs and outputs.

Why this answer

Option B is correct because a container component lets you define a custom component by specifying the container image, command, inputs, and outputs directly. In KFP SDK v2 releases this is written with the `@dsl.container_component` decorator, whose function returns a `dsl.ContainerSpec`. This is the appropriate approach when you have a custom training script that you want to containerize and use as a component in a Vertex AI Pipeline, as it gives you full control over the container configuration and artifact handling.
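On the container side, the training script only needs to accept those values as command-line arguments and write the model to the path it is given. The entrypoint below is a minimal sketch: the flag names and the JSON "model" are hypothetical, and in a pipeline the argument values would be filled in by the component's command and argument placeholders.

```python
import argparse
import json
import os
import tempfile


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical training entrypoint")
    parser.add_argument("--dataset-uri", required=True)
    parser.add_argument("--learning-rate", type=float, required=True)
    parser.add_argument("--model-path", required=True,
                        help="Local path assigned to the output model artifact")
    args = parser.parse_args(argv)

    # Placeholder for real training: record what the model was trained with.
    model = {"dataset_uri": args.dataset_uri, "learning_rate": args.learning_rate}

    # The output directory may not exist yet, so create it before writing.
    os.makedirs(os.path.dirname(args.model_path) or ".", exist_ok=True)
    with open(args.model_path, "w") as f:
        json.dump(model, f)
    return model


# Local dry run with explicit arguments instead of pipeline-supplied ones.
demo_path = os.path.join(tempfile.mkdtemp(), "model", "model.json")
print(main([
    "--dataset-uri", "gs://example-bucket/data.csv",
    "--learning-rate", "0.01",
    "--model-path", demo_path,
]))
```

Because the script reads everything from its arguments, the same image can be reused with different datasets and learning rates without rebuilding it.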

Exam trap

Candidates often confuse Python function components (@dsl.component) with container components. The trap here is assuming a Python function component can wrap an already containerized script; it runs a Python function inside a base image rather than executing the container's own entrypoint, which is what a container component does.

How to eliminate wrong answers

Option A is wrong because pre-built Google Cloud Pipeline Components for Vertex AI Training are designed for standard training jobs with built-in algorithms or custom containers, but they do not allow you to define custom inputs and outputs as artifacts in the same declarative way as ContainerComponent; they are more rigid and less suited for a fully custom component with a dataset URI and learning rate parameter. Option C is wrong because @dsl.component is used for Python function components that run Python code directly, not for containerized components; including container code inline would mix the container definition with Python function logic, which is not the intended use and would not properly handle container image specification and artifact outputs. Option D is wrong because the importer component is used to import existing artifacts (like models or datasets) into the pipeline, not to run a training script; it cannot execute a custom training script or produce a trained model artifact from scratch.

Practice this question →

52

MCQeasy

A machine learning engineer wants to use a pre-built Google Cloud Pipeline Component (GCPC) to run a custom training job on Vertex AI. Which component should they use?

A.AutoMLTabularTrainingJobRunOp

B.CustomTrainingJobOp

C.EndpointCreateOp

D.ModelBatchPredictOp

AnswerB

This is the pre-built component for running a custom training job on Vertex AI.

Why this answer

The Google Cloud Pipeline Components library includes pre-built components for various Vertex AI services. For custom training, the component is CustomTrainingJobOp. ModelBatchPredictOp runs batch prediction, EndpointCreateOp creates an endpoint for deployment, and AutoMLTabularTrainingJobRunOp is specific to AutoML tabular training.

Practice this question →

53

MCQmedium

A team wants to implement continuous training for their ML model. The pipeline should be triggered when new training data arrives in a Cloud Storage bucket. Which combination of services should they use?

A.Cloud Storage → BigQuery → Vertex AI Pipelines

B.Cloud Storage → Cloud Functions → Vertex AI Pipelines

C.Cloud Storage → Cloud Build → Vertex AI Pipelines

D.Cloud Storage → Cloud Scheduler → Vertex AI Pipelines

AnswerB

Cloud Storage can send notifications to Cloud Functions (via Pub/Sub) which then starts the pipeline.

Why this answer

Event-driven triggers for Vertex AI pipelines are typically implemented using Cloud Storage notifications to Pub/Sub, which triggers a Cloud Function that creates a pipeline job. Cloud Build is for CI/CD, not for event-driven data triggers.

Practice this question →

54

MCQmedium

You are developing a Vertex AI pipeline that runs multiple parallel training jobs with different hyperparameters, then collects their results and selects the best model. Which KFP SDK v2 construct should you use to run the parallel training tasks?

A.dsl.Metrics

B.dsl.If

C.dsl.ParallelFor

D.dsl.Collected

AnswerC

dsl.ParallelFor creates a parallel for-loop, executing tasks in parallel for each item in a list.

Why this answer

In KFP SDK v2, `dsl.ParallelFor` iterates over a list (here, the hyperparameter configurations) and runs the training task once per item, in parallel. `dsl.Collected` complements it: used after the loop, it gathers the per-iteration outputs into a single list that a downstream component can use to select the best model. The question asks which construct runs the parallel training tasks, so `dsl.ParallelFor` is the answer.

Exam trap

In Vertex AI pipelines using KFP SDK v2, candidates often confuse `dsl.ParallelFor` (which controls parallel execution) with `dsl.Collected` (which gathers outputs). `dsl.Collected` does not start any tasks; it only fans in the results of a `dsl.ParallelFor` loop.

How to eliminate wrong answers

Option A is wrong because `dsl.Metrics` is a KFP SDK v2 artifact type for reporting evaluation metrics (e.g., accuracy, loss) from a component, not a construct for running tasks in parallel. Option B is wrong because `dsl.If` is a conditional construct used to execute tasks based on a condition, not to iterate over a list of hyperparameters. Option D is wrong because `dsl.Collected` only gathers the outputs of tasks inside a `dsl.ParallelFor` loop into a list for a downstream task; it cannot run the parallel training tasks itself.

Practice this question →

55

MCQhard

A company has a CI/CD pipeline that retrains a model every time new training data is available. They want to automatically deploy the new model to production only if it passes a set of evaluation tests on a staging environment. Which approach best implements this?

A.Implement a two-stage pipeline: train and deploy to staging, run evaluation tests, and if passed, deploy to production using conditional logic.

B.Use Cloud Build to trigger a training job and then a separate deployment job without evaluation.

C.Use a single Vertex AI pipeline that trains and deploys to staging, then manually promote.

D.Train and deploy directly to production in one pipeline.

AnswerA

This implements an automated evaluation gate.

Why this answer

A continuous delivery pipeline with evaluation gates uses separate stages: staging deployment → evaluation → promotion to production.
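The gate itself is a small piece of logic. The sketch below shows it in plain Python with hypothetical metric names and thresholds; in a real pipeline the same check would sit in a conditional step between the staging evaluation and the production deployment.

```python
def passes_gate(metrics, thresholds):
    """True only if every required metric is present and meets its minimum."""
    return all(
        name in metrics and metrics[name] >= minimum
        for name, minimum in thresholds.items()
    )


def release(metrics, thresholds):
    """Return the environments the model ends up in."""
    deployed = ["staging"]                # stage 1: always deploy to staging
    if passes_gate(metrics, thresholds):  # stage 2: evaluation gate
        deployed.append("production")     # stage 3: promote only on success
    return deployed


thresholds = {"accuracy": 0.90, "recall": 0.80}  # hypothetical gate
print(release({"accuracy": 0.93, "recall": 0.85}, thresholds))  # ['staging', 'production']
print(release({"accuracy": 0.93, "recall": 0.70}, thresholds))  # ['staging']
```

Treating a missing metric as a failure keeps an incomplete evaluation from promoting a model by accident.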

Practice this question →

56

MCQhard

A machine learning engineer is building a Vertex AI pipeline that uses a pre-built AutoML Tables component to train a classification model. The pipeline also includes a conditional step that deploys the model to an endpoint only if the evaluation metrics exceed a threshold. Which KFP feature should be used to implement the conditional deployment?

A.dsl.ParallelFor

B.dsl.ExitHandler

C.dsl.Condition

D.dsl.Collected

AnswerC

Correct: dsl.Condition (or dsl.If) provides conditional execution of tasks based on conditions.

Why this answer

The `dsl.Condition` feature from KFP (Kubeflow Pipelines) is specifically designed to conditionally execute pipeline steps based on the output of a previous component. In this scenario, the AutoML Tables component produces evaluation metrics; `dsl.Condition` allows the pipeline to check whether those metrics exceed a threshold and, if true, run the deployment step. This is the correct, native KFP construct for implementing branching logic within a pipeline.

Exam trap

The trap here is that candidates often confuse `dsl.Condition` with `dsl.ExitHandler` because both involve decision-making, but `ExitHandler` is only for post-exit cleanup, not for branching based on step outputs.

How to eliminate wrong answers

Option A is wrong because `dsl.ParallelFor` is used for iterating over a collection of items and executing steps in parallel, not for conditional branching based on a single metric threshold. Option B is wrong because `dsl.ExitHandler` is a mechanism to run a cleanup or notification step when a pipeline exits (successfully or with failure), not for conditionally deploying a model based on evaluation results. Option D is wrong because `dsl.Collected` is a function used to gather outputs from parallel iterations (e.g., from `dsl.ParallelFor`) into a single list, not a control flow construct for conditional execution.

Practice this question →

57

MCQmedium

A team has a Vertex AI pipeline that includes a container component for data preprocessing. The team notices that the component is re-executed every time the pipeline runs, even when the inputs and code haven't changed. They want to leverage pipeline caching to avoid redundant executions. What should they do to enable caching for this component?

A.Set the 'caching' flag to 'True' in the pipeline definition using 'pipeline.caching = True'.

B.Set the environment variable 'ENABLE_CACHE' to 'true' on the pipeline run request.

C.Re-compile the pipeline with the '--enable-cache' flag.

D.Ensure that the task does not have caching disabled with 'set_caching_options(False)'.

AnswerD

Caching is enabled by default; if someone explicitly disabled it, removing that line will re-enable caching.

Why this answer

Option D is correct because Vertex AI pipeline caching is enabled by default for every task unless it is explicitly disabled, which in the KFP SDK v2 is done by calling `set_caching_options(False)` on the task (or by submitting the run with `enable_caching=False`). The component re-executing every time indicates that caching was likely disabled for that task. Removing that setting allows the pipeline to reuse cached outputs when inputs and code have not changed.

Exam trap

The trap here is that candidates assume caching must be explicitly enabled (like in some other cloud platforms), but Vertex AI caches by default, so the issue is usually that caching was explicitly disabled on the component.

How to eliminate wrong answers

Option A is wrong because pipeline objects have no `pipeline.caching` attribute; caching is controlled per task with `set_caching_options()` or per run with the `enable_caching` argument when the job is submitted. Option B is wrong because there is no `ENABLE_CACHE` environment variable for Vertex AI pipeline runs; caching is configured in the pipeline definition or on the job, not via runtime environment variables. Option C is wrong because there is no `--enable-cache` compilation flag; caching is already on by default, so recompiling does nothing to a task that disables it explicitly.

Practice this question →

58

MCQmedium

An engineer needs to compile a Kubeflow Pipeline defined in Python to a JSON format that can be run on Vertex AI Pipelines. Which command should they use?

A.kfp.compiler.Compiler().compile(pipeline_func, 'pipeline.json')

B.gcloud ai pipelines compile command.

C.kfp.Client().upload_pipeline()

D.dsl.pipeline decorator automatically compiles at runtime.

AnswerA

This is the correct command to compile a pipeline function to JSON.

Why this answer

Option A is correct because the Kubeflow Pipelines SDK provides the `kfp.compiler.Compiler().compile()` method to convert a Python-based pipeline function into a JSON or YAML format that is compatible with Vertex AI Pipelines. This JSON representation defines the pipeline's components, dependencies, and execution graph, enabling it to be submitted to Vertex AI for orchestration. The `compile()` method is the standard way to produce a portable pipeline specification from Python code.

Exam trap

The trap here is that candidates may mistakenly believe that the `gcloud ai pipelines` command or the `dsl.pipeline` decorator directly compiles the pipeline, but in Google's Vertex AI Pipelines, you must use the KFP SDK's `Compiler().compile()` method to generate the pipeline JSON specification.

How to eliminate wrong answers

Option B is wrong because `gcloud ai pipelines compile` is not a valid gcloud command; the gcloud CLI does not compile KFP pipelines, so compilation must be done with the KFP SDK, and the compiled file is then typically submitted with the Vertex AI SDK's `PipelineJob`. Option C is wrong because `kfp.Client().upload_pipeline()` uploads a compiled pipeline package to a Kubeflow Pipelines instance, but it does not perform the compilation step itself; the pipeline must already be compiled into a JSON or YAML file before uploading. Option D is wrong because the `dsl.pipeline` decorator defines the pipeline structure and components but does not automatically compile it at runtime; explicit invocation of `Compiler().compile()` is required to generate the JSON artifact.

Practice this question →

59

MCQmedium

A data science team wants to build a machine learning pipeline on Vertex AI Pipelines that preprocesses data, trains a model, and evaluates it. They need to ensure that components can be reused across multiple pipelines and that outputs from one component can be passed as inputs to another. Which approach should they take?

A.Write each component as a Cloud Composer DAG task using Python operators and manage dependencies via Airflow.

B.Use Vertex AI pre-built components exclusively and chain them using the Vertex AI SDK without a pipeline definition.

C.Define each step as a separate Cloud Build step and chain them via build triggers.

D.Use Kubeflow Pipelines SDK v2 to create Python function components decorated with @dsl.component and compose them into a pipeline using @dsl.pipeline.

AnswerD

This is the standard approach for reusable, composable ML pipeline components on Vertex AI Pipelines.

Why this answer

Option D is correct because Kubeflow Pipelines SDK v2 with @dsl.component and @dsl.pipeline decorators is the native way to define reusable, composable components in Vertex AI Pipelines. This approach allows each component to be a self-contained Python function that can be independently versioned and reused across multiple pipelines, with outputs automatically serialized and passed as inputs to downstream components via the pipeline graph.

Exam trap

Google often tests the misconception that any orchestration tool (Airflow, Cloud Build) can substitute for a purpose-built ML pipeline framework, but the key differentiator is Vertex AI Pipelines' native support for reusable components with typed artifact passing and managed execution.

How to eliminate wrong answers

Option A is wrong because Cloud Composer (Airflow) is a workflow orchestrator for general DAGs, not a purpose-built ML pipeline framework; it lacks native support for Vertex AI Pipelines' artifact tracking, component reuse, and ML-specific I/O handling. Option B is wrong because Vertex AI pre-built components cannot be chained without a pipeline definition; the Vertex AI SDK requires a pipeline specification (e.g., via Kubeflow Pipelines) to define the execution graph and pass outputs between steps. Option C is wrong because Cloud Build is a CI/CD service for building and testing code, not for orchestrating ML pipelines; it does not provide managed artifact passing, caching, or the runtime environment needed for ML training and evaluation steps.

Practice this question →

60

MCQmedium

A machine learning engineer needs to create a pipeline that runs a custom container component on Vertex AI. The container expects a Cloud Storage path as input and outputs a model artifact. Which component type should they define using the Kubeflow Pipelines SDK v2?

A.Google Cloud Pipeline Components (GCPC) for custom containers

B.Python function component using @dsl.component

C.Importer component to load the container as an artifact

D.Container component using @dsl.container_component

AnswerD

Container components allow you to specify a Docker image that runs as a component.

Why this answer

Option D is correct because the Kubeflow Pipelines SDK v2 provides the @dsl.container_component decorator specifically for defining components that wrap custom container images. This allows the engineer to specify the container image, input/output paths (like a Cloud Storage path), and artifact metadata, enabling Vertex AI to execute the container as a pipeline step and capture the model artifact.
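
As a rough illustration of what might run inside such a container, the sketch below shows a hypothetical entrypoint script. The flag names `--input-path` and `--model-path` and the placeholder "training" step are assumptions made for illustration; the question only states that the container takes a Cloud Storage path and emits a model artifact.

```python
import argparse
import json
import os


def parse_args(argv=None):
    """Parse the two paths the pipeline step passes on the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical training entrypoint")
    parser.add_argument("--input-path", required=True, help="Location of the training data")
    parser.add_argument("--model-path", required=True, help="Where to write the model artifact")
    return parser.parse_args(argv)


def train(input_path, model_path):
    """Placeholder training step: records the data source in a tiny 'model' file."""
    os.makedirs(os.path.dirname(model_path) or ".", exist_ok=True)
    with open(model_path, "w") as f:
        json.dump({"trained_on": input_path}, f)
    return model_path


def main(argv=None):
    """Entry point the container image would invoke with the step's arguments."""
    args = parse_args(argv)
    return train(args.input_path, args.model_path)
```

A container component definition would then list the image plus these two flags as its command and arguments, so the pipeline can substitute the input path and the output artifact location at run time.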

Exam trap

The trap here is that candidates confuse the Importer component (which only imports existing artifacts) with a component that runs a container to produce an artifact, or they mistakenly think GCPC can wrap any custom container when it only provides pre-built Google service integrations.

How to eliminate wrong answers

Option A is wrong because Google Cloud Pipeline Components (GCPC) are pre-built components for Google Cloud services (e.g., Vertex AI, BigQuery), not for wrapping arbitrary custom containers. Option B is wrong because @dsl.component is used for Python function components that execute inline Python code, not for running a custom container image. Option C is wrong because the Importer component is used to import existing artifacts (like a pre-trained model) into the pipeline's metadata store, not to run a container that produces an artifact.

61

MCQ · medium

A data science team wants to build a Vertex AI pipeline that trains a model, evaluates it, and conditionally deploys it if the accuracy exceeds 0.9. They want to use the Kubeflow Pipelines SDK v2. Which construct allows them to conditionally execute the deployment step based on the evaluation metric?

A. dsl.If

B. dsl.Conditional

C. dsl.ExitHandler

D. dsl.Collected

Answer: A

dsl.If is the correct construct for conditional execution in KFP v2.

Why this answer

In Kubeflow Pipelines SDK v2, `dsl.If` is the construct for conditionally executing pipeline steps based on runtime metrics or parameters. The deployment step is placed inside the `dsl.If` block, so it runs only when the condition (here, accuracy greater than 0.9) evaluates to true at run time. This is the standard way to implement branching logic in v2 pipelines.

Exam trap

Candidates often confuse `dsl.If` with `dsl.Conditional`, but in Kubeflow Pipelines SDK v2 for Vertex AI, `dsl.If` is the correct construct for conditional execution based on runtime metrics.

How to eliminate wrong answers

Option B is wrong because `dsl.Conditional` is not a valid construct in Kubeflow Pipelines SDK v2; the correct name is `dsl.If` (the similarly named `dsl.Condition` is the older form that `dsl.If` replaces). Option C is wrong because `dsl.ExitHandler` is used to execute cleanup or notification steps when a pipeline or component exits, not for conditional branching based on evaluation metrics. Option D is wrong because `dsl.Collected` is used to gather outputs from parallel tasks (e.g., from a loop or fan-out), not to conditionally execute a step.

62

MCQ · easy

What is the primary benefit of using pipeline caching in Vertex AI Pipelines?

A. It reduces execution time and cost by reusing unchanged component outputs.

B. It encrypts data at rest.

C. It automatically scales the pipeline resources.

D. It enables parallel execution of components.

Answer: A

This is the primary benefit of caching.

Why this answer

Pipeline caching in Vertex AI Pipelines automatically detects when a component's inputs and code have not changed from a previous execution and reuses the cached output artifacts. This avoids redundant computation, directly reducing both execution time and cost by skipping re-execution of unchanged steps.
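
The idea can be sketched in plain Python. This is a toy model of cache-key lookup, not Vertex AI's actual implementation: the key format, the `run_step` helper, and the `preprocess` function are all hypothetical.

```python
import hashlib
import json

_cache = {}  # cache key -> previously produced output
calls = []   # records which steps actually executed


def cache_key(component_name, code_version, inputs):
    """Fingerprint a step from its name, code version, and inputs."""
    payload = json.dumps(
        {"component": component_name, "code": code_version, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(component_name, code_version, inputs, fn):
    """Run fn(**inputs) unless an identical step already ran; then reuse its output."""
    key = cache_key(component_name, code_version, inputs)
    if key in _cache:
        return _cache[key]  # cache hit: skip execution entirely
    calls.append(component_name)
    _cache[key] = fn(**inputs)
    return _cache[key]


def preprocess(rows):
    return rows * 2


first = run_step("preprocess", "v1", {"rows": 100}, preprocess)
second = run_step("preprocess", "v1", {"rows": 100}, preprocess)  # same key: reused
third = run_step("preprocess", "v1", {"rows": 250}, preprocess)   # input changed: re-run
```

Only the first and third calls execute `preprocess`; the second returns the stored result, which is where the time and cost savings come from.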

Exam trap

The exam often tests the distinction between caching (reusing outputs) and parallelization (running components concurrently), so candidates may confuse the two and incorrectly select parallel execution as the primary benefit.

How to eliminate wrong answers

Option B is wrong because encryption at rest is a data security feature managed by Cloud KMS or default Google Cloud encryption, not a benefit of pipeline caching. Option C is wrong because automatic scaling of pipeline resources is handled by Vertex AI's underlying infrastructure (e.g., node auto-scaling) or custom configuration, not by caching. Option D is wrong because parallel execution of components is achieved through pipeline design (e.g., using `dsl.ParallelFor` or independent component dependencies), not through caching; caching can actually reduce the need for parallel execution by reusing results.

63

MCQ · medium

A data science team wants to deploy an ML pipeline on Vertex AI Pipelines that includes a component to train a model using a custom container. The component should be reusable across different pipelines and accept hyperparameters as inputs. Which approach should they take?

A. Create a container component by specifying a container image and input/output artifacts using the Kubeflow Pipelines SDK.

B. Package the training code as a Vertex AI Training custom job and call it from a Python function component.

C. Define a Python function component using @dsl.component and pass hyperparameters as function arguments.

D. Use a pre-built Google Cloud Pipeline Component for custom training and override the image.

Answer: A

Container components allow custom containers and are reusable across pipelines.

Why this answer

Container components are designed for custom containers and can be defined once and reused across pipelines. Python function components are simpler but limited to Python code; they cannot easily encapsulate custom containers.

64

MCQ · easy

You are defining a Python function component in KFP SDK v2. Which decorator should you use?

A. @dsl.task

B. @component

C. @dsl.pipeline

D. @dsl.component

Answer: D

Correct decorator for a component in KFP SDK v2.

Why this answer

In KFP SDK v2, the `@dsl.component` decorator is used to define a Python function as a lightweight, reusable pipeline component that can be executed independently. This decorator automatically generates a containerized component from the function's signature and type annotations, enabling type-safe inputs and outputs without requiring a separate component YAML specification.

Exam trap

The exam often tests the distinction between v1 and v2 decorators, so the trap here is that candidates familiar with older KFP code may choose the bare `@component` (option B) instead of the namespaced `@dsl.component`.

How to eliminate wrong answers

Option A is wrong because `@dsl.task` is not a valid decorator in KFP SDK v2; tasks are implicitly created when a component is called within a pipeline, not via a decorator. Option B is wrong because a bare `@component` only resolves if that name has been imported explicitly (for example, `from kfp.dsl import component`); the form the question expects is the namespaced `@dsl.component`. Option C is wrong because `@dsl.pipeline` is used to define a pipeline (a DAG of components), not a single component function.

65

MCQ · medium

A team develops a pipeline that trains a model and evaluates it. They want to pass the test accuracy (a float) from the evaluation component to a subsequent deployment component. Which KFP SDK type should the evaluation component output be annotated with?

A. Output[float]

B. Output[Metrics]

C. Output[Artifact]

D. Output[ClassificationMetrics]

Answer: A

Output[float] defines a pipeline parameter that can be used as input to downstream components.

Why this answer

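A single float is a parameter, not an artifact, so it should be typed as a float rather than as `Metrics`, `ClassificationMetrics`, or a generic `Artifact`. KFP artifacts are intended for larger or structured outputs such as models, datasets, and metric collections, not for a scalar that a downstream component consumes directly. Note that in KFP SDK v2 code a scalar output is usually declared through the function's return annotation (for example, `-> float`), with `Output[...]` used for artifact types, so read the option's `Output[float]` as "a float-typed output parameter".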

66

MCQ · hard

A team notices that a Vertex AI Pipeline step re-executes every time the pipeline runs, even though its inputs and code have not changed. They want to enable caching for this component to avoid redundant computation. However, caching is currently disabled globally. Which configuration change will enable caching for that specific component?

A. Use the 'enable_caching' parameter when creating the pipeline job via the SDK.

B. Add 'caching=True' to the @dsl.component decorator for that component.

C. Add 'caching=True' to the @dsl.pipeline decorator.

D. Set the environment variable 'CACHE_ENABLED=True' on the Vertex AI Pipeline job.

Answer: B

Setting caching=True in the component decorator enables caching for that component.

Why this answer

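The question asks for a component-level control rather than a run-wide switch. The `enable_caching` parameter on the pipeline job applies to every task in the run, a setting on the pipeline decorator would likewise affect all components, and a `CACHE_ENABLED` environment variable is not a caching control. The answer key therefore points to the component definition. Note that in KFP SDK v2 code, per-step caching is typically set on the task created inside the pipeline function with `task.set_caching_options(True)`, so treat the option's `caching=True` wording as shorthand for that component-level setting.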

67

Multi-Select · medium

A machine learning team uses Vertex AI Pipelines for model training. They want to implement a conditional step that runs additional evaluation if the model accuracy exceeds 0.9, otherwise it runs a data augmentation component. Which two Kubeflow Pipelines SDK v2 constructs can they use to achieve this? (Choose two.)

Select 2 answers

A. dsl.ParallelFor

B. dsl.Else

C. dsl.ExitHandler

D. dsl.Collected

E. dsl.If

Answers: B, E

dsl.Else defines the branch executed when the condition is false.

Why this answer

In Kubeflow Pipelines SDK v2, `dsl.If` and `dsl.Else` are the constructs used to create conditional execution branches within a pipeline. `dsl.If` evaluates a condition (e.g., model accuracy > 0.9) and runs the enclosed steps only if true; `dsl.Else` defines the alternative branch that runs when the condition is false. This directly implements the required logic for running additional evaluation on high accuracy or data augmentation otherwise.

Exam trap

The trap here is that candidates may confuse `dsl.ParallelFor` or `dsl.ExitHandler` with conditional constructs, but `dsl.If` and `dsl.Else` are the only SDK v2 constructs specifically designed for branching based on runtime conditions.

68

MCQ · hard

A company uses Cloud Composer to orchestrate their ML workflows. They have an Airflow DAG that runs a Vertex AI pipeline, then a BigQuery query, then a Dataflow job. The DAG is failing because the Vertex AI pipeline takes longer than the Airflow task timeout. What is the best way to handle this?

A. Increase the Airflow task timeout to account for the maximum expected pipeline duration.

B. Run the pipeline synchronously by using a Kubernetes Pod operator.

C. Split the DAG into two DAGs: one for the pipeline, one for the rest.

D. Use the Airflow Vertex AI pipeline operator with wait_for_completion=True.

Answer: D

The operator can poll the pipeline status, freeing the worker while waiting.

Why this answer

Vertex AI pipelines are asynchronous; the Airflow operator should wait for completion using a sensor or by setting the wait_for_completion parameter. Increasing the task timeout is a workaround but not best practice. Using a separate DAG for the pipeline defeats the purpose of orchestration.
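
The waiting behavior amounts to polling the job's state until it becomes terminal. The sketch below models that loop in plain Python; it is not the Airflow operator's code. `get_state` is a hypothetical callable standing in for a status lookup, and the state names follow the Vertex AI `PipelineState` values.

```python
import time

# States after which a pipeline job no longer changes.
TERMINAL_STATES = {
    "PIPELINE_STATE_SUCCEEDED",
    "PIPELINE_STATE_FAILED",
    "PIPELINE_STATE_CANCELLED",
}


def wait_for_pipeline(get_state, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_state() until the job reaches a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"pipeline still in {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The downstream BigQuery and Dataflow tasks would start only after this returns a succeeded state, which is what keeps the three steps in a single DAG.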

69

MCQ · easy

An ML engineer needs to trigger a Vertex AI Pipeline on a recurring schedule, every 24 hours, to retrain a model with the latest data. Which approach should they use to set up this schedule?

A. Use Cloud Tasks to queue pipeline runs daily.

B. Create a Cloud Scheduler job that calls the Vertex AI API to create a pipeline job.

C. Set a cron expression in the pipeline definition file using the 'schedule' parameter.

D. Use the Vertex AI Pipelines UI to set a schedule directly on the pipeline.

Answer: B

Cloud Scheduler can invoke Vertex AI API via HTTP or Pub/Sub to trigger a pipeline on a schedule.

Why this answer

Cloud Scheduler is the native Google Cloud service for cron-based job scheduling. By configuring a Cloud Scheduler job to call the Vertex AI API (e.g., via an HTTP POST to the projects.locations.pipelineJobs.create endpoint), the engineer can trigger a pipeline run every 24 hours. This approach is reliable, supports authentication via OAuth, and integrates directly with Vertex AI Pipelines without requiring additional orchestration code.
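
To make the HTTP call concrete, the sketch below builds (but does not send) the kind of POST request a scheduler job would issue. The project, region, bucket, display name, and token are placeholders, and the `pipeline_spec` passed in is a stub rather than a real compiled pipeline.

```python
import json
import urllib.request


def build_create_request(project, region, pipeline_spec, pipeline_root, parameters, token):
    """Build (without sending) the POST that asks Vertex AI to start a pipeline run."""
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/pipelineJobs"
    )
    body = {
        "displayName": "daily-retraining",  # hypothetical run name
        "pipelineSpec": pipeline_spec,      # compiled pipeline definition as a dict
        "runtimeConfig": {
            "gcsOutputDirectory": pipeline_root,
            "parameterValues": parameters,
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_request(
    project="example-project",
    region="us-central1",
    pipeline_spec={"pipelineInfo": {"name": "train"}},  # stub, not a full spec
    pipeline_root="gs://example-bucket/pipeline-root",
    parameters={"learning_rate": 0.01},
    token="PLACEHOLDER_TOKEN",
)
```

In a Cloud Scheduler job, the URL and JSON body would go into the HTTP target, and the scheduler's service account would supply the OAuth token instead of the placeholder used here.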

Exam trap

The exam often tests the assumption that a recurring schedule can be embedded in the pipeline definition itself (for example, a cron parameter in the compiled YAML or JSON). In this question's framing, the recurring trigger lives outside the pipeline definition, in Cloud Scheduler or a similar service.

How to eliminate wrong answers

Option A is wrong because Cloud Tasks is a distributed task queue designed for asynchronous message delivery and retries, not for recurring cron-based scheduling; it would require an additional scheduler to enqueue tasks daily. Option C is wrong because Vertex AI Pipeline definitions do not support a 'schedule' parameter; scheduling is handled externally, not within the pipeline YAML or JSON definition. Option D is wrong in this question's framing, which treats scheduling as an external concern; note that Vertex AI has since added a scheduler API for pipeline runs (the time-based "Vertex AI Pipeline schedule" referenced in question 73), so check the current documentation before relying on this distinction.

70

MCQ · medium

A team is implementing CI/CD for ML using Cloud Build. They want to trigger a training pipeline in Vertex AI whenever new model code is pushed to the main branch of the repository. Which Cloud Build configuration should they use to achieve this?

A. Set up a Cloud Build trigger that runs on push to any branch, and in the build step, use gcloud to submit a Vertex AI Pipeline job.

B. Use a Cloud Scheduler job to periodically check for new commits on main and trigger Cloud Build.

C. Use Cloud Functions to watch the repository and call Cloud Build on push to main.

D. Set up a Cloud Build trigger that runs on push to main branch, and in the build step, use gcloud to submit a Vertex AI Pipeline job.

Answer: D

This correctly limits the trigger to the main branch and uses gcloud to launch the pipeline.

Why this answer

Option D is correct because Cloud Build triggers can be configured to fire specifically on pushes to the main branch. The build step then uses the gcloud command to submit a Vertex AI Pipeline job, which directly integrates the CI/CD pipeline with Vertex AI's orchestration. This approach is event-driven, immediate, and requires no additional services or polling.
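
Cloud Build triggers select branches with a regular expression, which is why the pattern matters. The sketch below uses Python's `re` module as a stand-in to compare a main-only pattern with a match-everything pattern; the branch names and the `trigger_fires` helper are hypothetical.

```python
import re


def trigger_fires(branch_pattern, pushed_branch):
    """Return True if a push to pushed_branch matches the trigger's branch regex."""
    return re.search(branch_pattern, pushed_branch) is not None


branches = ["main", "feature/main-refactor", "maintenance", "dev"]

# Anchored pattern: only the branch named exactly "main" matches.
main_only = [b for b in branches if trigger_fires(r"^main$", b)]

# Unanchored catch-all: every push would start a pipeline run.
any_branch = [b for b in branches if trigger_fires(r".*", b)]
```

Anchoring the pattern with `^` and `$` keeps branches such as `maintenance` or `feature/main-refactor` from starting a training run.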

Exam trap

The exam often tests the candidate's understanding that Cloud Build triggers can be scoped to specific branches and that using gcloud directly in a build step is the simplest and most efficient way to invoke Vertex AI Pipelines, rather than introducing unnecessary intermediate services like Cloud Functions or Cloud Scheduler.

How to eliminate wrong answers

Option A is wrong because triggering on push to any branch would cause the pipeline to run on feature branches, pull requests, and other non-main branches, leading to unnecessary executions and potential conflicts. Option B is wrong because Cloud Scheduler polling is inefficient, introduces latency, and is not the intended event-driven mechanism; Cloud Build triggers are designed to react to repository events directly. Option C is wrong because using Cloud Functions as an intermediary adds unnecessary complexity and cost; Cloud Build natively supports repository event triggers without requiring a separate compute service.

71

Multi-Select · medium

You are building a CI/CD pipeline for an ML model using Cloud Build. When code is pushed to the main branch, you want to automatically build a training image, run a Vertex AI pipeline, and if the model evaluation passes, deploy it to a staging endpoint. Which two components are essential for this CI/CD pipeline?

Select 2 answers

A. Cloud Scheduler to trigger the pipeline on a schedule.

B. Cloud Functions to deploy the model.

C. Vertex AI Pipelines to orchestrate training and evaluation.

D. Cloud Build trigger configured to respond to push events to the main branch.

E. Vertex AI Continuous Training service.

Answers: C, D

Pipelines are used to run the ML workflow.

Why this answer

Option C is correct because Vertex AI Pipelines is the service that orchestrates the entire ML workflow, including training, evaluation, and conditional deployment logic. In a CI/CD pipeline, you need a managed orchestrator to chain steps like model training, evaluation, and deployment, and Vertex AI Pipelines provides that with its Kubeflow Pipelines-based DAG execution. Option D is correct because the requirement is event-driven: a Cloud Build trigger on pushes to the main branch is what starts the image build and the pipeline run in the first place.

Exam trap

Google often tests the distinction between event-driven triggers (Cloud Build trigger on push) and schedule-based triggers (Cloud Scheduler), so candidates mistakenly pick Cloud Scheduler when the requirement is for a code-push event.

72

MCQ · medium

An ML engineer is building a pipeline on Vertex AI Pipelines and wants to pass a dataset artifact from one component to another without incurring additional cost for intermediate storage. How should they define the input and output types?

A. Store the data in BigQuery and pass the table reference.

B. Define output as a string and pass the GCS path manually.

C. Use the Dataset artifact type available in the KFP SDK for inputs and outputs.

D. Use in-memory Python objects as function return values.

Answer: C

Artifact types like Dataset are designed for passing data via GCS URIs.

Why this answer

Option C is correct because using the KFP SDK's Dataset artifact type allows Vertex AI Pipelines to manage the data as a lineage-tracked artifact, automatically handling the underlying GCS storage reference without incurring additional intermediate storage costs. The artifact type enables the pipeline to pass the metadata (URI, type, etc.) between components efficiently, leveraging the native artifact management of the KFP SDK.

Exam trap

The trap is that candidates often think 'avoiding additional cost' means 'avoiding any storage,' leading them to choose in-memory passing (Option D) or manual string paths (Option B). In reality, Vertex AI Pipelines uses the same underlying GCS storage that is already part of the pipeline's infrastructure, so the KFP Dataset artifact type incurs no extra cost while enabling lineage tracking.

How to eliminate wrong answers

Option A is wrong because storing data in BigQuery and passing a table reference incurs BigQuery storage and query costs, and it introduces an external dependency that is not necessary for intermediate data passing within a pipeline. Option B is wrong because passing a GCS path as a string manually bypasses the artifact tracking and lineage capabilities of KFP, leading to potential issues with reproducibility and cost management, though it does not directly incur additional storage cost. Option D is wrong because in-memory Python objects cannot be passed between pipeline components that run in separate containers or environments; they would require serialization and storage, defeating the purpose of avoiding intermediate storage costs.

73

MCQ · medium

An organization wants to trigger a Vertex AI pipeline whenever a new commit is pushed to the main branch of their Cloud Source Repository. The pipeline should retrain and evaluate the model. Which service should they use to detect the push event and start the pipeline?

A. Vertex AI Pipeline schedule

B. Cloud Build trigger

C. Cloud Scheduler on a short interval

D. Pub/Sub with push subscription to a Cloud Function

Answer: B

Cloud Build triggers directly respond to push events and can invoke pipelines.

Why this answer

Cloud Build triggers are designed to automatically invoke a build pipeline in response to events from Cloud Source Repository, such as a push to a specific branch. This allows the organization to directly start a Vertex AI pipeline for retraining and evaluation without additional infrastructure. Cloud Build triggers natively integrate with Cloud Source Repository, making them the simplest and most reliable choice for this event-driven workflow.

Exam trap

This question tests the distinction between event-driven triggers (Cloud Build) and time-based schedulers (Cloud Scheduler, Vertex AI Pipeline schedule), leading candidates to mistakenly choose a polling or cron-based option for an event-driven requirement.

How to eliminate wrong answers

Option A is wrong because Vertex AI Pipeline schedules are time-based (cron) triggers, not event-driven; they cannot detect a Git push event. Option C is wrong because Cloud Scheduler on a short interval would poll for changes, introducing latency and inefficiency, and it does not natively detect push events from Cloud Source Repository. Option D is wrong because while Pub/Sub with a push subscription to a Cloud Function could technically work, it adds unnecessary complexity and an extra compute layer; Cloud Build triggers provide a direct, managed integration without the need for custom code or additional services.

74

Multi-Select · easy

An ML engineer is creating a Vertex AI Pipeline that includes a loop to train multiple models in parallel on different hyperparameter sets. Which TWO KFP SDK v2 constructs can be used to implement this parallel execution?

Select 2 answers

A. dsl.If

B. dsl.Parallel

C. A for loop in the pipeline function that creates multiple tasks

D. dsl.Pipeline

E. dsl.Collected

Answers: C, E

You can use a Python for loop to generate multiple task invocations, effectively creating parallel tasks.

Why this answer

Option C is correct because in KFP SDK v2 a Python for loop inside the pipeline function creates one task per hyperparameter set. The loop runs when the pipeline is compiled, so it is unrolled into a fixed set of tasks in the directed acyclic graph (DAG), and the Vertex AI Pipelines orchestrator runs those tasks in parallel as long as there are no data dependencies between them. Option E is correct in this answer key because `dsl.Collected` gathers the outputs of parallel tasks (fan-in), typically those created with `dsl.ParallelFor`, once they finish.
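
The effect of unrolling a loop into independent tasks can be modeled in plain Python. The sketch below is a toy DAG builder, not the KFP compiler: the task names, the `select-best` fan-in step, and both helper functions are hypothetical.

```python
def build_tasks(learning_rates):
    """Unroll a Python loop into one training task per hyperparameter value."""
    tasks = {}
    for i, lr in enumerate(learning_rates):
        tasks[f"train-{i}"] = {"params": {"learning_rate": lr}, "depends_on": []}
    # A final step that needs every trained model (fan-in).
    tasks["select-best"] = {
        "params": {},
        "depends_on": [f"train-{i}" for i in range(len(learning_rates))],
    }
    return tasks


def execution_waves(tasks):
    """Group tasks into waves; tasks in the same wave have no unmet dependencies."""
    done, waves = set(), []
    while len(done) < len(tasks):
        ready = sorted(
            name for name, task in tasks.items()
            if name not in done and all(dep in done for dep in task["depends_on"])
        )
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
    return waves


waves = execution_waves(build_tasks([0.1, 0.01, 0.001]))
```

All three training tasks land in the first wave because none depends on another, while the fan-in step waits for all of them.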

Exam trap

The exam often tests the misconception that a dedicated 'Parallel' construct exists in KFP SDK v2. Parallel execution is instead achieved through Python loops or the `dsl.ParallelFor` construct, and candidates may confuse `dsl.ParallelFor` with the nonexistent `dsl.Parallel`.

75

MCQ · easy

A machine learning engineer is building a pipeline with Vertex AI Pipelines and wants to pass a large dataset between components without copying it to the container's memory. What is the best practice for passing data between pipeline components?

A. Mount an NFS volume to all containers and share data via the filesystem.

B. Use Cloud Storage URIs (gs://) to point to the data location.

C. Serialize the dataset to JSON and include it as a pipeline parameter.

D. Use the importer component to load the data into the pipeline as an in-memory artifact.

Answer: B

Using GCS URIs allows components to read/write data efficiently and supports caching.

Why this answer

Option B is correct because Vertex AI Pipelines natively supports passing Cloud Storage URIs (gs://) as artifact references between components, allowing components to read the dataset directly from GCS without copying it into container memory. This avoids memory limits and enables efficient handling of large datasets by leveraging GCS's scalable object storage.
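
A small sketch shows why a URI is cheap to pass: the reference is a short string, and the component resolves it to a readable location only when it needs the data. The `DatasetRef` class is hypothetical, and the `/gcs/` prefix assumes the bucket is exposed to the container through a Cloud Storage FUSE mount.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRef:
    """A lightweight reference to data in Cloud Storage (not the data itself)."""

    uri: str

    @property
    def path(self):
        """Local path, assuming the bucket is mounted under /gcs/ via Cloud Storage FUSE."""
        if not self.uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {self.uri}")
        return "/gcs/" + self.uri[len("gs://"):]


ref = DatasetRef(uri="gs://example-bucket/data/train.csv")
```

Only `ref.uri` travels between steps; a downstream component can then stream the file from `ref.path` or through a Cloud Storage client instead of receiving the dataset as a parameter.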

Exam trap

A common mistake in this exam is to think that large data must be passed as in-memory artifacts or serialized parameters, when the correct Vertex AI Pipelines pattern is to pass a Cloud Storage URI and let components read data lazily from GCS.

How to eliminate wrong answers

Option A is wrong because mounting an NFS volume introduces network filesystem latency, requires additional infrastructure setup, and is not a native or recommended pattern in Vertex AI Pipelines, which is designed for serverless, cloud-native artifact passing. Option C is wrong because serializing a large dataset to JSON and including it as a pipeline parameter would exceed the maximum parameter size limit (typically 512 KB in Vertex AI Pipelines) and would force the entire dataset into memory, defeating the purpose of avoiding memory copies. Option D is wrong because the importer component registers an external artifact (e.g., a GCS URI) into the pipeline's metadata store but does not load data into memory; the misconception is that it creates an in-memory artifact, whereas it merely creates a metadata reference.
