PMLE ML Pipelines Questions, Page 2 of 2

MCQ (hard)

You are building a Vertex AI pipeline using the KFP SDK v2. One component processes a large dataset and outputs a metrics artifact. You notice that the component is being cached even when the dataset changes, because the component code and image remain the same. How can you force the component to always re-execute when the dataset changes?

A. Use the dsl.CacheKey annotation to explicitly set cache keys.

B. Set caching_strategy.max_cache_staleness = "0s" on the component.

C. Change the component's image tag to :latest.

D. Add a random integer parameter to the component to vary the inputs.

Answer: B

Setting max_cache_staleness to 0s disables caching for that component, forcing re-execution.

Why this answer

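In KFP SDK v2, the cache key is derived from the component definition (code and container image) and its input values. If the dataset is passed as a URI string, changing the URI invalidates the cache; if the data changes behind an unchanged URI, the inputs look identical and the cached result is reused. In that case, disable caching for the task. The `max_cache_staleness` setting originates in KFP v1 (`execution_options.caching_strategy.max_cache_staleness`); in SDK v2 the per-task call is `task.set_caching_options(False)`.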
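The cache-key behaviour can be illustrated with a simplified model. This is a sketch, not the KFP implementation: `cache_key` and `fingerprint` are hypothetical helpers, and the image digest and bucket path are made up. It only shows why an unchanged URI string produces a cache hit even when the data behind it changed, and how a content fingerprint passed as an extra input makes the key follow the data.

```python
import hashlib
import json


def cache_key(component_source: str, image: str, inputs: dict) -> str:
    """Toy cache key: hash of component code, image reference, and input values."""
    payload = json.dumps(
        {"source": component_source, "image": image, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def fingerprint(data: bytes) -> str:
    """Content hash that changes whenever the dataset bytes change."""
    return hashlib.sha256(data).hexdigest()


SOURCE = "def compute_metrics(dataset_uri): ..."
IMAGE = "sha256:0123abcd"  # hypothetical digest
URI = "gs://example-bucket/data/train.csv"  # hypothetical path

# Same URI string on both runs: identical key, so the step would be a cache hit
# even if the file contents differ.
key_day1 = cache_key(SOURCE, IMAGE, {"dataset_uri": URI})
key_day2 = cache_key(SOURCE, IMAGE, {"dataset_uri": URI})

# A content fingerprint passed as an input makes the key track the data itself.
key_v1 = cache_key(SOURCE, IMAGE, {"dataset_uri": URI, "digest": fingerprint(b"rows v1")})
key_v2 = cache_key(SOURCE, IMAGE, {"dataset_uri": URI, "digest": fingerprint(b"rows v2")})

print(key_day1 == key_day2)  # True
print(key_v1 == key_v2)  # False
```

Passing a fingerprint keeps caching useful when the data is unchanged, whereas disabling caching always re-executes the step.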


MCQ (medium)

A team wants to implement continuous delivery for their ML models. They have a pipeline that trains a model and evaluates it. If the evaluation metrics exceed a threshold, the model should be deployed to a staging endpoint, and after manual approval, to production. Which approach should they use?

A. Use Cloud Build to orchestrate the whole process, with a manual approval step before deploying to production.

B. Use Vertex AI Pipelines to deploy to production directly after evaluation.

C. Use Cloud Scheduler to trigger deployment every hour.

D. Use Cloud Functions to deploy after evaluation without approval.

Answer: A

Cloud Build supports manual approval gates, making it suitable for CD.

Why this answer

Option A is correct because Cloud Build supports approval gates: a build trigger can be configured to require approval, so the production build does not start until an authorized user approves it. This matches the requirement for continuous delivery (not continuous deployment), where a human approves the final production rollout. Vertex AI Pipelines has no native manual approval step, and the other options skip the required approval entirely.

Exam trap

The trap here is confusing continuous delivery (which includes a manual approval gate) with continuous deployment (which is fully automated), leading candidates to choose options that skip the required human approval step.

How to eliminate wrong answers

Option B is wrong because Vertex AI Pipelines does not have a built-in manual approval step; deploying directly to production after evaluation violates the requirement for manual approval. Option C is wrong because Cloud Scheduler triggers deployments on a fixed schedule (every hour), not based on evaluation metrics exceeding a threshold, and it lacks the manual approval gate. Option D is wrong because Cloud Functions would deploy automatically after evaluation without any manual approval step, contradicting the explicit requirement for human approval before production deployment.
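The two gates in this scenario (an automated metric threshold, then a human approval) can be sketched as a small decision function. `Stage` and `promotion_target` are hypothetical names used only for illustration; a real setup would express the same logic through the CI/CD tool's own approval feature.

```python
from enum import Enum


class Stage(Enum):
    REJECTED = "rejected"
    STAGING = "staging"
    PRODUCTION = "production"


def promotion_target(metric: float, threshold: float, approved: bool) -> Stage:
    """Return the furthest environment the model may be deployed to."""
    if metric <= threshold:
        return Stage.REJECTED  # automated evaluation gate failed
    if not approved:
        return Stage.STAGING  # metric gate passed, waiting for a human
    return Stage.PRODUCTION  # both gates passed


print(promotion_target(0.95, 0.90, approved=False))  # Stage.STAGING
print(promotion_target(0.95, 0.90, approved=True))  # Stage.PRODUCTION
print(promotion_target(0.85, 0.90, approved=True))  # Stage.REJECTED
```

Continuous deployment would drop the `approved` check; continuous delivery keeps it.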


MCQ (hard)

A team is building a continuous training pipeline that retrains a model when new data arrives. They want to detect data drift between the training dataset and the serving data. Which approach should they integrate into the pipeline to compare the distributions of the two datasets?

A. Export metrics to Cloud Monitoring and set up alerting on mean values.

B. Use Vertex AI Model Monitoring with skew detection enabled.

C. Use Cloud DLP to inspect the datasets and generate summary statistics.

D. Compute histograms of features in BigQuery ML and compare them manually.

Answer: B

Model Monitoring provides built-in skew and drift detection by comparing distributions.

Why this answer

Vertex AI Model Monitoring with skew detection enabled is the correct approach because it is specifically designed to detect data drift between training and serving datasets in a continuous training pipeline. It automatically computes distribution statistics (e.g., using Jensen-Shannon divergence or L-infinity distance) for each feature and compares the training data distribution against the serving data distribution, triggering alerts when drift exceeds a configured threshold. This integrates natively with Vertex AI Pipelines, enabling automated retraining workflows.

Exam trap

The exam often tests the distinction between monitoring for data quality (e.g., missing values, outliers) and monitoring for distribution drift. Candidates mistakenly choose Cloud Monitoring or manual histogram comparison because they assume any metric or visualization can detect drift, but only a dedicated drift detection service such as Vertex AI Model Monitoring provides the distribution-distance measures and automated alerting.

How to eliminate wrong answers

Option A is wrong because exporting metrics to Cloud Monitoring and alerting on mean values tracks a single statistic (the mean) and cannot detect distribution shifts such as changes in variance, skewness, or modality. Option C is wrong because Cloud DLP is a data loss prevention service focused on inspecting and redacting sensitive data (e.g., PII, credit card numbers), not on comparing statistical distributions of datasets. Option D is wrong because computing histograms in BigQuery ML and comparing them manually is not automated, does not scale to many features, and lacks the built-in distance measures (Jensen-Shannon divergence, L-infinity distance) and alerting that Vertex AI Model Monitoring provides out of the box.
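The two distance measures named in the explanation can be computed directly for a single feature. The following is a minimal sketch, assuming both datasets have already been binned into probability vectors over the same buckets; the function names, sample distributions, and alert threshold are illustrative and not taken from the service.

```python
import math
from typing import Sequence


def _kl_bits(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric, and bounded by 1 when using log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl_bits(p, m) + 0.5 * _kl_bits(q, m)


def l_infinity(p: Sequence[float], q: Sequence[float]) -> float:
    """Largest absolute difference between matching bucket probabilities."""
    return max(abs(pi - qi) for pi, qi in zip(p, q))


training = [0.5, 0.3, 0.2]  # illustrative training distribution
serving = [0.2, 0.3, 0.5]  # illustrative serving distribution
THRESHOLD = 0.1  # hypothetical alert threshold

skew = js_divergence(training, serving)
print(f"JS divergence: {skew:.4f}, alert: {skew > THRESHOLD}")
print(f"L-infinity distance: {l_infinity(training, serving):.2f}")
```

Both measures are zero for identical distributions and grow as the serving data moves away from the training data, which is why a per-feature threshold works as an alert condition.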


MCQ (hard)

You are designing a Vertex AI pipeline that includes a container component. The component needs to use a custom container image that is stored in Artifact Registry. How should you specify the container image in the component definition?

A. Use ContainerOp class from kfp.v2.dsl.

B. Use the @dsl.container_component decorator and set the image parameter to the URI.

C. Use a placeholder in the pipeline YAML.

D. Use the @dsl.component decorator and set the base_image parameter.

Answer: B

This is the correct way to define a container component and specify the image URI.

Why this answer

Option B is correct because the `@dsl.container_component` decorator is designed for components that run an arbitrary container. The decorated function returns a `dsl.ContainerSpec` whose `image` field takes the full Artifact Registry URI (e.g., `us-central1-docker.pkg.dev/my-project/my-repo/my-image:tag`), so the pipeline pulls the custom container from Artifact Registry at runtime. Inputs and outputs are declared explicitly in the function signature, which ties the component into the pipeline graph.

Exam trap

The trap here is that candidates confuse the `@dsl.component` decorator (for Python functions) with the `@dsl.container_component` decorator (for custom containers) and assume that setting `base_image` is how a custom container's own entrypoint gets run.

How to eliminate wrong answers

Option A is wrong because `ContainerOp` is a legacy class from the older Kubeflow Pipelines SDK (v1) and is not part of the v2 component model used by Vertex AI pipelines. Option C is wrong because a placeholder in the pipeline YAML is not a valid way to specify the container image; the image URI is provided in the component definition. Option D is wrong because the `@dsl.component` decorator is for Python function-based components: its `base_image` parameter only selects the image in which the Python function runs and does not execute the custom container's own command.


MCQ (medium)

A company uses Cloud Composer to orchestrate a nightly ML workflow that includes running a Vertex AI pipeline, querying BigQuery, and running a Dataflow job. The Airflow DAG must run only if the previous day's Dataflow job succeeded. Which Airflow concept should they use to implement this dependency?

A. Use a BranchPythonOperator to check the status of the Dataflow job before proceeding.

B. Nest the tasks in a SubDAG with a schedule_interval that starts after the expected Dataflow completion time.

C. Set a TriggerRule on the Vertex AI pipeline task to 'all_done' and reference the previous task.

D. Use the bitshift operators (>>) to set the execution order: Dataflow_task >> VertexAI_pipeline.

Answer: D

The >> operator sets a direct dependency: VertexAI_pipeline runs only after Dataflow_task succeeds.

Why this answer

Option D is correct because Airflow's bitshift operators (>>) define task dependencies in a DAG. By setting `Dataflow_task >> VertexAI_pipeline`, the Vertex AI pipeline task will only execute after the Dataflow task has completed successfully. This directly enforces the required dependency without additional logic or branching.

Exam trap

The exam often tests whether candidates understand that Airflow's default trigger rule (`all_success`) makes a downstream task run only after its upstream tasks succeed, so explicit branching or trigger rule changes are unnecessary for simple sequential dependencies.

How to eliminate wrong answers

Option A is wrong because BranchPythonOperator is used for conditional branching within a DAG, not for enforcing a simple sequential dependency; it would unnecessarily complicate the workflow. Option B is wrong because SubDAGs are used for grouping tasks and do not inherently check the success status of external tasks; using a schedule_interval to start after expected completion time does not guarantee the previous day's Dataflow job succeeded. Option C is wrong because setting a TriggerRule to 'all_done' would cause the Vertex AI pipeline to run regardless of the Dataflow task's success (including failure or skipped states), which does not enforce the required success-only dependency.
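The difference between the default `all_success` rule and `all_done` can be modelled in a few lines. This is a simplified sketch of the semantics only, not Airflow code: `should_run` is a hypothetical helper and the state names are a reduced set.

```python
from typing import Sequence

FINISHED_STATES = {"success", "failed", "skipped", "upstream_failed"}


def should_run(trigger_rule: str, upstream_states: Sequence[str]) -> bool:
    """Decide whether a downstream task may start, given its upstream states."""
    if trigger_rule == "all_success":
        # Default behaviour: every upstream task must have succeeded.
        return all(state == "success" for state in upstream_states)
    if trigger_rule == "all_done":
        # Runs once every upstream task has finished, whatever the outcome.
        return all(state in FINISHED_STATES for state in upstream_states)
    raise ValueError(f"unsupported trigger rule: {trigger_rule}")


print(should_run("all_success", ["success"]))  # True
print(should_run("all_success", ["failed"]))  # False
print(should_run("all_done", ["failed"]))  # True
```

With `all_done`, a failed Dataflow task would still let the Vertex AI task start, which is exactly what the scenario must prevent.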


MCQ (easy)

A data scientist wants to define a lightweight Python function component in Vertex AI Pipelines using Kubeflow Pipelines SDK v2. Which decorator should be applied to the function to make it a pipeline component?

A. @dsl.pipeline

B. @kfp.v2.components.func_to_component

C. @dsl.component

D. @kfp.dsl.component

Answer: C

Correct: @dsl.component turns a Python function into a pipeline component.

Why this answer

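In KFP SDK v2, the @dsl.component decorator defines a Python function component, while @dsl.pipeline defines a pipeline that composes multiple components. kfp.v2.components.func_to_component is not a KFP decorator. Option D spells the same decorator by its full module path; C is the form used after `from kfp import dsl`.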


MCQ (hard)

A team is using Vertex AI Pipelines to deploy a model. They have a component that evaluates the model and produces a ClassificationMetrics artifact. The pipeline should deploy the model only if the precision is greater than 0.9. They use dsl.If to check the metric. However, the condition always evaluates to False. What is the most likely cause?

A. The precision value is stored as a float but the condition expects a string.

B. The evaluation component did not output the metric correctly.

C. The ClassificationMetrics artifact is not accessible in the condition context.

D. The dsl.If block is placed incorrectly in the pipeline definition.

Answer: C

Correct: Conditions cannot directly read artifact properties; the metric value must be extracted as a pipeline parameter before the condition.

Why this answer

Option C is correct because a `ClassificationMetrics` artifact cannot be read as a primitive value inside a `dsl.If` condition. The condition can only compare pipeline parameters or parameter outputs of a task (strings, integers, floats, booleans). The metric must therefore be extracted first, for example by having the evaluation component (or a small follow-up component) return the precision as a float output, which the condition then compares against the threshold.

Exam trap

In Google PMLE exams, a common trap is forgetting that artifact types like ClassificationMetrics are not directly usable in dsl.If conditions; candidates often assume any output can be used without extracting primitive values.

How to eliminate wrong answers

Option A is wrong because the condition in `dsl.If` can compare floats directly; the precision value being a float does not cause the condition to always evaluate to False. Option B is wrong because the question states the evaluation component produces a `ClassificationMetrics` artifact, implying the output is correct; the issue is not with the component's output but with how that output is accessed in the condition. Option D is wrong because the placement of the `dsl.If` block in the pipeline definition does not affect its ability to evaluate conditions; the condition fails due to the data type of the input, not its position.
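The extraction step amounts to reducing structured metrics to one float that a condition can compare. A minimal sketch, assuming the evaluation data is available as a confusion matrix; `precision_from_confusion_matrix` and the sample counts are hypothetical, and in a pipeline this logic would sit in a component that returns the float as an output parameter.

```python
from typing import Sequence


def precision_from_confusion_matrix(
    matrix: Sequence[Sequence[int]], positive_index: int = 1
) -> float:
    """matrix[i][j] counts rows with true class i predicted as class j."""
    true_positives = matrix[positive_index][positive_index]
    predicted_positives = sum(row[positive_index] for row in matrix)
    if predicted_positives == 0:
        return 0.0  # no positive predictions: treat precision as 0
    return true_positives / predicted_positives


# Illustrative counts: 40 true positives, 4 false positives.
matrix = [[50, 4], [6, 40]]
precision = precision_from_confusion_matrix(matrix)

# A plain float like this is what a pipeline condition can compare.
deploy = precision > 0.9
print(round(precision, 3), deploy)  # 0.909 True
```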


MCQ (medium)

A company uses Vertex AI Pipelines to train models on a daily schedule. The pipeline includes a component that runs a BigQuery query to extract features. The team wants to ensure that if the BigQuery component fails due to transient network errors, the pipeline automatically retries it. How can they configure retries in Vertex AI Pipelines?

A. Deploy the component as a Cloud Function and configure Cloud Functions retry.

B. Wrap the component in a `dsl.If` conditional that checks for failure and re-submits the component.

C. Use Cloud Composer with a task retry policy in Airflow.

D. Set the `retry` parameter of the component to a positive integer, for example `retry=3`.

Answer: D

A retry count configured on the pipeline task enables automatic retries.

Why this answer

Option D is correct because Vertex AI Pipelines supports per-task retries natively. A retry count of 3 tells the pipeline to re-run the task up to three times if it fails, which covers transient errors such as network timeouts. In KFP SDK v2 this is configured on the task inside the pipeline function with `task.set_retry(num_retries=3)`, optionally with a backoff. This is the simplest and most direct way to handle retries within the Vertex AI Pipelines orchestration framework.

Exam trap

The trap here is that candidates may confuse Vertex AI Pipelines' native `retry` parameter with external retry mechanisms (Cloud Functions, Airflow) or misuse pipeline control flow constructs like `dsl.If` for retry logic, when the correct approach is a simple parameter on the component definition.

How to eliminate wrong answers

Option A is wrong because deploying the component as a Cloud Function and configuring Cloud Functions retry would move the execution outside of Vertex AI Pipelines, breaking the pipeline's orchestration and monitoring. Option B is wrong because `dsl.If` conditionals are used for conditional execution of components, not for retrying a failed component; they cannot re-submit a component that has already failed. Option C is wrong because Cloud Composer with Airflow is a separate orchestration service that would require migrating the entire pipeline out of Vertex AI Pipelines, adding unnecessary complexity and cost.
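What a retry count of 3 means in practice can be shown with a plain-Python stand-in for the orchestrator. This is an illustrative sketch, not pipeline code: `run_with_retries` and `FlakyQuery` are hypothetical, and the backoff parameters are assumptions chosen for the example.

```python
import time


def run_with_retries(step, num_retries, backoff_seconds=0.0, backoff_factor=2.0):
    """Run step(); on failure, retry up to num_retries more times."""
    delay = backoff_seconds
    for attempt in range(num_retries + 1):
        try:
            return step()
        except Exception:
            if attempt == num_retries:
                raise  # retries exhausted: surface the last error
            time.sleep(delay)
            delay *= backoff_factor  # wait longer before each new attempt


class FlakyQuery:
    """Stand-in for a query step that fails a fixed number of times first."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient network error")
        return "features extracted"


query = FlakyQuery(failures=2)
result = run_with_retries(query, num_retries=3)
print(result, query.calls)  # features extracted 3
```

The step runs once plus at most three retries, so a failure that persists through four attempts still fails the task.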


Multi-Select (hard)

A pipeline includes a component that produces a model artifact. The team wants to automatically detect skew between the training data distribution and the serving data distribution. Which three best practices should they implement? (Choose three.)

Select 3 answers

A. Compare statistics using a dedicated component and alert on threshold exceedance

B. Use in-memory data passing for efficiency

C. Compute serving data statistics using a component

D. Disable caching to ensure fresh statistics

E. Pass training data statistics as a Dataset artifact

Answers: A, C, E

A comparison component can detect skew and trigger alerts.

Why this answer

To detect skew, one should pass training and serving data statistics as artifacts, compare them using a statistics comparison component, and set up an alert if skew exceeds a threshold. Using GCS URIs for passing data is a general best practice for idempotency.


MCQ (medium)

You have a Vertex AI pipeline that trains a model and outputs a Model artifact. You want to register this model in the Vertex AI Model Registry. Which pre-built Google Cloud Pipeline Components component should you use?

A. VertexEndpointDeployOp

B. CreateModelVersionsOp

C. ModelRegisterOp

D. VertexModelUploadOp

Answer: D

This component uploads a model to the Vertex AI Model Registry.

Why this answer

The correct component is VertexModelUploadOp because it uploads a trained model artifact to the Vertex AI Model Registry, creating a new model or a new version of an existing one. It takes the model artifact from a pipeline step and registers it, making it available for deployment or version management. In the `google_cloud_pipeline_components` package, the upload component is exposed as `ModelUploadOp`.

Exam trap

Google Cloud often tests the distinction between model registration and deployment, so candidates mistakenly choose VertexEndpointDeployOp thinking it registers the model, when in fact it only deploys an already-registered model to an endpoint.

How to eliminate wrong answers

Option A is wrong because VertexEndpointDeployOp is used to deploy a model to an endpoint, not to register a model in the Model Registry. Option B is wrong because CreateModelVersionsOp is not a pre-built Google Cloud Pipeline Components component; the correct component for creating model versions is VertexModelUploadOp. Option C is wrong because ModelRegisterOp does not exist as a pre-built component in the Google Cloud Pipeline Components suite.


MCQ (easy)

A machine learning engineer needs to schedule a Vertex AI pipeline to run daily at midnight. Which approach should they use?

A. Use the Vertex AI Pipelines console to set a cron schedule directly on the pipeline.

B. Create a Cloud Build trigger that runs the pipeline on a schedule.

C. Use Cloud Tasks to create a recurring task that invokes the pipeline.

D. Create a Cloud Function that calls the Vertex AI API, triggered by a Pub/Sub message from Cloud Scheduler.

Answer: D

This is the recommended pattern: Cloud Scheduler publishes to Pub/Sub, which triggers a Cloud Function that starts the pipeline.

Why this answer

Option D is correct under the question's premise that Vertex AI Pipelines does not natively support cron scheduling. The pattern is to have Cloud Scheduler publish a message to a Pub/Sub topic at the desired time, which triggers a Cloud Function that calls the Vertex AI API to create and run the pipeline. This decoupled architecture gives reliable scheduling and allows custom logic before invocation. Vertex AI now also offers a scheduler API for recurring pipeline runs, so the premise reflects older releases.

Exam trap

The trap here is that candidates assume Vertex AI Pipelines has built-in scheduling, but the exam tests knowledge of the correct Google Cloud integration pattern using Cloud Scheduler, Pub/Sub, and Cloud Functions.

How to eliminate wrong answers

Option A is wrong because Vertex AI Pipelines console does not provide a direct cron scheduling interface; you must use an external scheduler. Option B is wrong because Cloud Build triggers are designed for CI/CD events (e.g., code pushes) and are not intended for recurring pipeline execution; they lack the precise time-based scheduling needed for daily runs. Option C is wrong because Cloud Tasks is built for single or delayed task execution, not recurring schedules; it would require additional orchestration to mimic a cron job, making it less suitable than Cloud Scheduler.


MCQ (hard)

An ML pipeline runs on Vertex AI and includes a component that uses a third-party library not available in the default Python environment. The team wants to avoid building a custom container image. Which approach should they use?

A. Install the library using pip in the pipeline definition

B. Use a container component with a pre-built image

C. Use the packages_to_install parameter in @dsl.component

D. Add the library to the Vertex AI custom training image

Answer: C

This parameter allows specifying extra packages to install in the component's execution environment.

Why this answer

Option C is correct because the `packages_to_install` parameter in the `@dsl.component` decorator allows you to specify a list of third-party Python packages (e.g., via pip) that will be installed at runtime in the component's execution environment, without needing to build a custom container image. This is the recommended approach in Vertex AI Pipelines when you need to use a library not present in the default Python environment, as it avoids the overhead of custom container creation while ensuring the dependency is available for that specific component.

Exam trap

The trap here is that candidates confuse the `packages_to_install` parameter of `@dsl.component` with a generic pip install in the pipeline definition, or assume that a pre-built container image avoids image work. Any image that includes the library must still be built or found in a registry and then maintained. `packages_to_install` installs the packages when the component starts, with no custom container.

How to eliminate wrong answers

Option A is wrong because the pipeline definition is compiled rather than executed inside the component's container, so a `pip install` there does not affect the component's runtime environment; dependencies must be declared on the component. Option B is wrong because a container component needs an image that already contains the library; unless such an image happens to exist, the team must build and maintain one, which is what they want to avoid. Option D is wrong because adding the library to the Vertex AI custom training image means building a custom container for training, which is a separate process from pipeline components and also violates the constraint.


MCQ (hard)

A team is building a CI/CD pipeline for an ML model. They want to automatically trigger a Vertex AI pipeline for retraining whenever new training data arrives in a Cloud Storage bucket, but only if a specific Pub/Sub notification is published by a data ingestion process. Which approach meets these requirements with minimal operational overhead?

A. Use Cloud Scheduler to run a job every hour that checks for new files in Cloud Storage and starts the pipeline if new files exist.

B. Configure a Cloud Build trigger that listens to the Pub/Sub topic and executes a build step that submits the pipeline run.

C. Use Eventarc to route the Pub/Sub notification to a Cloud Function that calls the Vertex AI pipeline creation API.

D. Create a Dataflow streaming pipeline that reads from Pub/Sub and triggers the Vertex AI pipeline via a custom sink.

Answer: C

Eventarc provides a serverless event-driven integration; Cloud Function handles the trigger with minimal overhead.

Why this answer

Option C is correct because Eventarc can directly listen to a Pub/Sub topic and route matching messages to a Cloud Function, which then calls the Vertex AI pipeline creation API. This serverless approach triggers the pipeline only when the specific Pub/Sub notification is published, meeting the requirement with zero infrastructure to manage and no polling overhead.

Exam trap

The trap here is that candidates may over-engineer the solution by choosing Dataflow (Option D) because it sounds 'streaming' and 'real-time', but the simplest serverless event-driven approach (Eventarc + Cloud Function) meets the requirement with minimal operational overhead.

How to eliminate wrong answers

Option A is wrong because Cloud Scheduler polling every hour introduces up to an hour of latency and does not respond to the Pub/Sub notification; it also means maintaining a scheduled job that checks for new files, and it may miss the specific trigger condition. Option B is wrong because Cloud Build is a build service: although a trigger can be driven by Pub/Sub, running a build just to submit a pipeline run adds build configuration and steps to maintain, which is more overhead than a small function. Option D is wrong because a Dataflow streaming pipeline is overkill for this simple event-driven trigger; it introduces a persistent streaming job with associated cost and complexity, whereas a lightweight Cloud Function is sufficient and more cost-effective.
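The function's job is to decode the Pub/Sub message and decide whether it is the notification that should start a run. The sketch below covers only that filtering step, under stated assumptions: the `event_type` attribute, the `uri` payload field, and the helper names are hypothetical, and the actual call to the Vertex AI API is left out.

```python
import base64
import json
from typing import Optional


def pipeline_params_from_event(event: dict) -> Optional[dict]:
    """Return pipeline parameters for a matching notification, else None."""
    message = event["message"]
    attributes = message.get("attributes") or {}
    # Only the ingestion process's notification should start a retraining run.
    if attributes.get("event_type") != "ingestion_complete":  # hypothetical attribute
        return None
    # Pub/Sub delivers the message body base64-encoded.
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    return {"training_data_uri": payload["uri"]}  # hypothetical payload field


def make_event(body: dict, event_type: str) -> dict:
    """Build a sample event shaped like a Pub/Sub message envelope."""
    data = base64.b64encode(json.dumps(body).encode("utf-8")).decode("ascii")
    return {"message": {"data": data, "attributes": {"event_type": event_type}}}


matching = make_event({"uri": "gs://example-bucket/new/"}, "ingestion_complete")
other = make_event({"uri": "gs://example-bucket/new/"}, "heartbeat")

print(pipeline_params_from_event(matching))
print(pipeline_params_from_event(other))
```

When the function returns parameters, the handler would submit the pipeline run with them; when it returns `None`, the message is acknowledged and ignored.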


MCQ (easy)

A machine learning engineer wants to define a lightweight pipeline component that runs custom Python code without building a container image. Which KFP SDK feature should they use?

A. Importer component

B. Python function component with @dsl.component

C. Container component

D. Vertex AI Training job

Answer: B

This decorator turns a Python function into a pipeline component without requiring a custom container.

Why this answer

The `@dsl.component` decorator in the KFP SDK defines a lightweight Python function component that runs custom code without requiring you to build a container image. KFP runs the function inside a base image (a default Python image unless `base_image` is set) and installs any packages listed in `packages_to_install` at runtime, which makes it well suited to simple pipeline steps.

Exam trap

The trap here is that candidates may confuse 'lightweight' with 'no container at all,' but KFP always runs components in containers; the `@dsl.component` feature automates container creation, not eliminates it.

How to eliminate wrong answers

Option A is wrong because the Importer component is used to import existing artifacts (like datasets or models) into a pipeline, not to run custom Python code. Option C is wrong because a Container component requires you to specify a pre-built container image, which contradicts the requirement of not building a container image. Option D is wrong because Vertex AI Training job is a managed service for running training jobs on Vertex AI, not a lightweight KFP SDK feature for running custom code without containers.
