Knowledge + Practice

Google Professional Machine Learning Engineer (PMLE) — Questions 901–975

1000 questions total · 14 pages · All types, answers revealed

901

MCQ · hard

A data scientist is fine-tuning a large language model from Hugging Face using Vertex AI Training with a GPU. The model has 7 billion parameters and does not fit on a single GPU. They need to split the model across multiple GPUs and train with data parallelism. Which strategy should they use?

A. Use Vertex AI's AutoML to automatically distribute the model.

B. Use pipeline parallelism via a custom container with DeepSpeed and data parallelism across workers using PyTorch DDP, configured with Vertex AI distributed training.

C. Use Vertex AI's hyperparameter tuning with multiple trials.

D. Configure a multi-worker mirrored strategy with TensorFlow, setting TF_CONFIG to use all GPUs on each worker.

Answer: B

This combines model and data parallelism, suitable for large models.

Why this answer

Option B is correct because it combines pipeline parallelism (via DeepSpeed) to split the 7B-parameter model across multiple GPUs, with data parallelism (via PyTorch DDP) to replicate the model across workers for training on larger batches. Vertex AI distributed training coordinates the multi-worker setup, making this the only viable strategy for a model that exceeds single-GPU memory while requiring data parallelism.

Exam trap

The trap here is that candidates confuse 'data parallelism' (which replicates the model) with 'model parallelism' (which splits the model), and assume a single strategy like DDP or mirrored strategy suffices, ignoring that the model must first be partitioned across GPUs using pipeline or tensor parallelism before data parallelism can be applied.

How to eliminate wrong answers

Option A is wrong because Vertex AI AutoML is a no-code automated ML service that does not support custom model architectures or manual distribution of large language models; it cannot handle a 7B-parameter model that requires custom parallelism strategies. Option C is wrong because hyperparameter tuning optimizes training hyperparameters (e.g., learning rate) across multiple trials, but does not address the fundamental need to split a model across GPUs or enable data parallelism. Option D is wrong because a multi-worker mirrored strategy with TensorFlow requires the model to fit on a single GPU per worker (it mirrors the entire model), and the 7B-parameter model exceeds that limit; additionally, TF_CONFIG setup does not provide pipeline parallelism to split the model across devices.
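
A rough memory estimate shows why a 7-billion-parameter model does not fit on one GPU during training. The sketch below assumes mixed-precision training with the Adam optimizer and a commonly cited rule of thumb of about 16 bytes per parameter; it ignores activations. The 80 GB GPU size and the per-parameter byte counts are illustrative assumptions, not figures from the question.

```python
import math

PARAMS = 7_000_000_000      # 7B parameters, from the question
GPU_MEMORY_GB = 80          # assumption: a single 80 GB accelerator

# Assumed bytes per parameter for mixed-precision Adam training
BYTES_FP16_WEIGHTS = 2
BYTES_FP16_GRADS = 2
BYTES_FP32_OPTIMIZER = 12   # FP32 master weights + Adam momentum + variance


def training_memory_gb(params: int) -> float:
    """Estimate model-state memory in GB, excluding activations."""
    bytes_per_param = BYTES_FP16_WEIGHTS + BYTES_FP16_GRADS + BYTES_FP32_OPTIMIZER
    return params * bytes_per_param / 1e9


needed_gb = training_memory_gb(PARAMS)
min_gpus = math.ceil(needed_gb / GPU_MEMORY_GB)
print(f"~{needed_gb:.0f} GB of model state, so at least {min_gpus} GPUs before activations")
```

Under these assumptions the model state alone exceeds one device, which is why the model has to be partitioned before data parallelism can replicate it.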

902

MCQ · hard

You are designing a Vertex AI pipeline that includes a container component. The component needs to use a custom container image that is stored in Artifact Registry. How should you specify the container image in the component definition?

A. Use ContainerOp class from kfp.v2.dsl.

B. Use the @dsl.container_component decorator and set the image parameter to the URI.

C. Use a placeholder in the pipeline YAML.

D. Use the @dsl.component decorator and set the base_image parameter.

Answer: B

This is the correct way to define a container component and specify the image URI.

Why this answer

Option B is correct because the `@dsl.container_component` decorator is specifically designed for defining container components in Vertex AI pipelines. The `image` parameter accepts the full Artifact Registry URI (e.g., `us-central1-docker.pkg.dev/my-project/my-repo/my-image:tag`), allowing the pipeline to pull the custom container from Artifact Registry at runtime. This decorator also requires you to define inputs and outputs explicitly, ensuring the component is properly integrated into the pipeline graph.

Exam trap

The trap here is that candidates confuse the `@dsl.component` decorator (for Python functions) with the `@dsl.container_component` decorator (for custom containers), mistakenly thinking that setting `base_image` is how you run a prebuilt custom container as a pipeline step.

How to eliminate wrong answers

Option A is wrong because `ContainerOp` is a legacy class from the older Kubeflow Pipelines SDK (v1), not the recommended approach for Vertex AI pipelines; it is not part of the `@dsl.container_component` pattern and may cause compatibility issues. Option C is wrong because a placeholder in the pipeline YAML is not how the container image is specified; the image URI is provided in the component definition. Option D is wrong because the `@dsl.component` decorator is for Python function-based components; its `base_image` parameter sets the image in which the Python function runs, rather than defining a component around a custom container's own command and arguments.
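
A container component along the lines described above might look like the following sketch. It assumes the KFP SDK v2 is installed; the image URI is the placeholder from the explanation, and the function name, command, arguments, and parameter names are hypothetical.

```python
from kfp import dsl


# Assumption: the command, args, and parameter names are illustrative only;
# a real component would use whatever entrypoint the custom image provides.
@dsl.container_component
def run_custom_container(input_text: str, output_file: dsl.OutputPath(str)):
    """Container component whose image is pulled from Artifact Registry."""
    return dsl.ContainerSpec(
        image="us-central1-docker.pkg.dev/my-project/my-repo/my-image:tag",
        command=["python", "main.py"],
        args=["--input", input_text, "--output", output_file],
    )
```

The function body only returns a `dsl.ContainerSpec`; no Python logic runs inside it, which is the practical difference from a `@dsl.component` function.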

903

MCQ · easy

A data scientist wants to share a trained model with colleagues for evaluation. The model is stored as a Vertex AI Model resource. What is the recommended way to share the model without exposing the underlying project?

A. Share the model ID and grant colleagues the 'aiplatform.models.get' permission.

B. Create a new project and copy the model.

C. Upload the model to a public Cloud Storage bucket.

D. Export the model artifact and email it.

Answer: A

This provides secure, traceable access without exposing the project.

Why this answer

Option A is correct because a Vertex AI Model resource lives in a single Google Cloud project and access to it is controlled through IAM. Granting colleagues a role that includes the `aiplatform.models.get` permission (the IAM name for the permission the option refers to), for example 'roles/aiplatform.user', lets them read the model by its resource name ('projects/{project}/locations/{region}/models/{model}') without copying the artifact or exposing the rest of the project's infrastructure or credentials.

Exam trap

Google Cloud often tests the misconception that sharing a model requires copying or exporting the artifact, when in fact IAM-based access control on the managed resource is the secure and recommended approach.

How to eliminate wrong answers

Option B is wrong because creating a new project and copying the model is unnecessary overhead and still exposes the model artifact to another project, which does not inherently prevent exposure of the original project's identity; it also violates the principle of least privilege by duplicating resources. Option C is wrong because uploading the model to a public Cloud Storage bucket would expose the model artifact to the entire internet, violating security best practices and potentially leaking proprietary data; it also bypasses Vertex AI's access control mechanisms. Option D is wrong because exporting the model artifact and emailing it is insecure, as email is not encrypted at rest by default and exposes the model to unauthorized interception; it also loses the managed model resource's metadata and versioning.

904

MCQ · easy

Refer to the exhibit. A team deploys a model using Cloud Run. They notice that after scaling up, the new instances take about 90 seconds to become ready and serve requests. They want to reduce this startup time. Which configuration change is most likely to help?

A. Reduce the startupProbe initialDelaySeconds to 30

B. Change the container image to use a smaller base image

C. Reduce the memory limit to 4Gi

D. Increase the containerConcurrency to 100

Answer: B

A smaller base image reduces download and extraction time, speeding up startup.

Why this answer

Option B is correct. Using a smaller container image (e.g., a minimal base image) reduces pull and initialization time, directly lowering startup latency. Option A shortens the probe delay, but the instance will not actually be ready any earlier. Option C reduces memory, which could cause out-of-memory failures if the model requires more. Option D increases concurrency but does not affect startup time.

905

MCQ · medium

A retail company wants to generate product recommendations on their website using Google Cloud. They have historical transaction data and need a managed service that provides personalized recommendations like 'frequently bought together'. Which service should they use?

A. Recommendations AI

B. BigQuery ML

C. Vertex AI Prediction

D. AutoML Tables

Answer: A

Why this answer

Recommendations AI offers pre-built retail models including frequently-bought-together. AutoML Tables would require custom training, BigQuery ML is for SQL-based models, and Vertex AI Prediction is for deploying custom models.

906

MCQ · medium

An organisation uses Delta Lake on Dataproc to manage a data lake for ML training. They need ACID transactions for concurrent reads and writes. Which file format does Delta Lake use as the underlying storage?

A. Apache Parquet

B. Apache ORC

C. CSV

D. Apache Avro

Answer: A

Delta Lake stores data in Parquet files with a transaction log for ACID transactions.

Why this answer

Delta Lake uses Parquet as the base file format, adding a transaction log for ACID properties. Avro and ORC are not used by Delta Lake, and CSV does not support ACID.

907

MCQ · medium

An e-commerce company uses a recommendation model deployed on Vertex AI Endpoints. The model's latency increases gradually over two weeks, causing timeouts. The model is served using a custom container. What is the most likely root cause and corrective action?

A. The model is receiving more traffic; scale the number of replicas.

B. The custom container has a memory leak; implement memory monitoring and set container resource limits.

C. The Vertex AI endpoint has changed its URL; update the client application.

D. The model file size has grown due to feature engineering; reduce feature count.

Answer: B

Memory leaks are a common cause of gradual performance degradation in long-running containers.

Why this answer

A gradual increase in latency over two weeks, without a sudden spike, strongly indicates a memory leak in the custom container. As the leak accumulates, the container's garbage collection becomes less effective, leading to increased GC pauses and eventual timeouts. Setting resource limits and monitoring memory usage can prevent the container from exhausting host memory and causing performance degradation.

Exam trap

The trap here is that candidates confuse a gradual latency increase with a traffic scaling issue (Option A), but the slow, steady degradation over weeks is the hallmark of a resource leak, not a demand spike.

How to eliminate wrong answers

Option A is wrong because a gradual latency increase over two weeks is not characteristic of a sudden traffic surge; traffic spikes would cause immediate latency jumps, not a slow creep. Option C is wrong because Vertex AI endpoint URLs are stable and do not change over time; a URL change would cause immediate 404 errors, not gradual latency increases. Option D is wrong because model file size does not change dynamically during serving; feature engineering changes would require a new model deployment, not cause a gradual latency increase in an already deployed model.

908

MCQ · hard

A team is deploying a model that has strict latency requirements: p99 response time under 100 ms. The model is CPU-only and will receive up to 1000 QPS. They want to minimize cost while meeting the SLO. Which machine type and scaling configuration is most appropriate?

A. GPU-enabled machine with min_replicas=1 and max_replicas=2

B. n1-standard-8 with min_replicas=3 and max_replicas=3 (fixed)

C. n1-highmem-2 with min_replicas=2 and max_replicas=10

D. n1-standard-4 with min_replicas=1 and max_replicas=5, CPU utilization target 60%

Answer: D

Correct: n1-standard-4 provides moderate CPU; autoscaling on CPU utilization meets latency and cost goals.

Why this answer

Option D is correct because it uses a CPU-only machine (n1-standard-4) with autoscaling based on CPU utilization target of 60%, which balances cost and performance for a latency-sensitive, CPU-bound inference workload at 1000 QPS. The min_replicas=1 ensures a baseline capacity, while max_replicas=5 allows scaling to handle spikes without over-provisioning, keeping p99 under 100 ms.

Exam trap

Google Cloud often tests the misconception that GPU machines are always faster for ML inference, but for CPU-only models with strict latency SLOs, a properly scaled CPU instance with autoscaling is more cost-effective and meets performance requirements.

How to eliminate wrong answers

Option A is wrong because GPU-enabled machines are unnecessary and cost-prohibitive for a CPU-only model, and the low max_replicas=2 may not handle 1000 QPS within the latency SLO. Option B is wrong because a fixed 3-replica deployment (n1-standard-8) lacks autoscaling, leading to either over-provisioning (waste) or under-provisioning (SLO violations) under variable load, and the machine type is oversized for the workload. Option C is wrong because n1-highmem-2 is memory-optimized, not compute-optimized, and the wide scaling range (2 to 10) with no utilization target can cause thrashing or high cost without guaranteeing latency.
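
Whether a replica range covers the peak load can be checked with simple capacity arithmetic. The sketch below takes the 1000 QPS peak and the 60% utilization target from the question; the per-replica throughput is a hypothetical load-test figure, so the result is only as good as that assumption.

```python
import math

PEAK_QPS = 1000             # from the question
TARGET_UTILIZATION = 0.6    # autoscaling target from option D
QPS_PER_REPLICA = 400       # hypothetical: measured in a load test at 100% CPU


def replicas_needed(peak_qps: float, qps_per_replica: float, target_utilization: float) -> int:
    """Replicas required so each one stays at or below the utilization target."""
    usable_qps = qps_per_replica * target_utilization
    return math.ceil(peak_qps / usable_qps)


peak_replicas = replicas_needed(PEAK_QPS, QPS_PER_REPLICA, TARGET_UTILIZATION)
print(f"Replicas needed at peak: {peak_replicas}")
```

With a different measured throughput the required count changes, which is why max_replicas should come from a load test rather than a guess.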

909

Matching · medium

Match each Google Cloud storage option to its best use case.

Concepts

Matches

Unstructured object storage for any type of data

NoSQL wide-column database for low-latency, high-throughput

Serverless data warehouse for analytics at scale

Relational database for OLTP workloads

NoSQL document database for mobile/web apps

Why these pairings

Correct matches: Cloud Storage for unstructured object data, Bigtable for low-latency, high-throughput wide-column workloads, and BigQuery for serverless data warehousing. Common confusions include mistaking Bigtable for a relational database or BigQuery for a transactional system.

910

MCQ · easy

A team wants to enforce governance and compliance for all ML models across the organisation. They need a centralised repository that tracks model versions, deployment history, and evaluation metrics. Which service should they use?

A. Cloud Storage

B. Vertex AI Feature Store

C. Vertex AI Experiments

D. Vertex AI Model Registry

Answer: D

Model Registry provides centralised model governance with versioning, metrics, and deployment tracking.

Why this answer

Vertex AI Model Registry is the centralised repository for model versioning, lineage, metrics, and deployment management, providing governance and compliance.

911

Multi-Select · hard

You are fine-tuning a large language model (LLM) from Vertex AI Model Garden using a custom dataset. You need to minimize training cost while maintaining reasonable throughput. Which THREE strategies should you combine?

Select 3 answers

A. Use spot VM instances for training

B. Use parameter-efficient fine-tuning (PEFT) such as LoRA

C. Use full fine-tuning of all model parameters

D. Use TPU v4 pods for training

E. Use mixed precision training (FP16)

Answers: A, B, E

Spot VMs are significantly cheaper than regular VMs and are suitable for fault-tolerant fine-tuning jobs.

Why this answer

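Option A is correct because Spot VMs are significantly cheaper than on-demand instances. They can be preempted, but fine-tuning jobs that checkpoint and resume can tolerate that trade-off. Option B is correct because PEFT methods such as LoRA train only a small set of added parameters, cutting memory and compute. Option E is correct because FP16 mixed precision reduces memory use and speeds up training on supported accelerators. Full fine-tuning (Option C) and TPU v4 pods (Option D) raise cost rather than lower it.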

Exam trap

Google Cloud often tests the misconception that higher-performance hardware (like TPU pods) is always the best choice for cost optimization, when in reality, cost-minimization strategies prioritize cheaper compute and efficient training methods over raw throughput.
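
The saving from parameter-efficient fine-tuning such as LoRA can be made concrete with a parameter count. LoRA replaces the update to a d_out × d_in weight matrix with two low-rank factors of rank r, so it trains r × (d_out + d_in) values instead of d_out × d_in. The matrix size and rank below are illustrative assumptions, not values from the question.

```python
D_OUT = 4096   # assumption: output dimension of one projection matrix
D_IN = 4096    # assumption: input dimension of the same matrix
RANK = 8       # assumption: LoRA rank


def full_params(d_out: int, d_in: int) -> int:
    """Trainable values when the whole matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable values in the two low-rank factors (d_out x r and r x d_in)."""
    return rank * (d_out + d_in)


full = full_params(D_OUT, D_IN)
lora = lora_params(D_OUT, D_IN, RANK)
fraction = lora / full
print(f"full: {full:,}  LoRA: {lora:,}  trainable fraction: {fraction:.4%}")
```

For this one matrix, LoRA trains well under 1% of the values that full fine-tuning would update.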

912

Multi-Select · easy

You are building a machine learning pipeline on Google Cloud. You need to perform feature engineering on large datasets stored in BigQuery and store the resulting features in Vertex AI Feature Store for both online and offline use. Which TWO Google Cloud services should you use?

Select 2 answers

A. Cloud Functions

B. Dataflow

C. BigQuery ML

D. Dataproc

E. Cloud Build

Answers: B, D

Dataflow can process large-scale data and integrate with Feature Store.

Why this answer

Dataflow can read from BigQuery, compute features with Apache Beam, and write to Feature Store. Dataproc can do the same with Spark, although Dataflow is serverless and needs no cluster management. Cloud Functions is not suited to large-scale batch processing, Cloud Build is for CI/CD, and BigQuery ML is for in-database ML.

913

MCQ · medium

You need to run batch predictions on a large dataset stored in BigQuery using a Vertex AI model. The dataset contains 10 million rows, and each prediction takes about 100ms. You want to minimize cost and execution time. What should you do?

A. Export the BigQuery data to CSV in GCS, then run a custom Dataflow pipeline to make predictions.

B. Use Vertex AI batch prediction with BigQuery as the source and sink.

C. Use Vertex AI online prediction and send all rows as separate requests.

D. Use a custom container running on Google Kubernetes Engine to perform inference.

Answer: B

Batch prediction natively supports BigQuery, is cost-effective, and scales automatically.

Why this answer

Vertex AI batch prediction natively supports BigQuery as both input and output, eliminating the need for data export or custom pipelines. For 10 million rows at 100ms each, batch prediction processes them in parallel across multiple machines, minimizing execution time while avoiding the per-node costs of online prediction or the overhead of managing Dataflow or GKE clusters.

Exam trap

Google Cloud often tests the distinction between batch and online prediction, trapping candidates who overlook that batch prediction is purpose-built for large-scale, offline inference with native BigQuery integration, while online prediction is for real-time, low-latency use cases.

How to eliminate wrong answers

Option A is wrong because exporting to CSV and using Dataflow adds unnecessary complexity and cost; Vertex AI batch prediction can read directly from BigQuery, avoiding data movement and extra processing steps. Option C is wrong because online prediction is designed for low-latency, real-time requests on small payloads, and sending 10 million separate requests would be slow and expensive because of per-request network overhead and the cost of keeping endpoint nodes running for the whole job. Option D is wrong because running a custom container on GKE requires you to manage infrastructure, scaling, and fault tolerance, which is more costly and complex than using Vertex AI's managed batch prediction service.
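
The case for parallel batch processing follows from the numbers in the question. The sketch below uses the stated 10 million rows and 100 ms per prediction; the worker count is a hypothetical figure, and perfect parallel scaling is assumed, which real jobs will not quite reach.

```python
ROWS = 10_000_000            # from the question
MS_PER_PREDICTION = 100      # from the question
PARALLEL_WORKERS = 100       # hypothetical: concurrent prediction slots across all machines

total_ms = ROWS * MS_PER_PREDICTION
sequential_hours = total_ms / 1000 / 3600

# Assumes work divides evenly with no startup or I/O overhead.
parallel_hours = sequential_hours / PARALLEL_WORKERS

print(f"One row at a time: {sequential_hours:.1f} hours")
print(f"With {PARALLEL_WORKERS} parallel slots: {parallel_hours:.1f} hours")
```

Processed strictly one row at a time, the job would take more than eleven days, so parallelism matters far more than shaving per-request overhead.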

914

Multi-Select · medium

An MLOps engineer is setting up monitoring for a deployed model on Vertex AI Endpoints. Which TWO actions are required to enable Vertex AI Model Monitoring for feature skew and drift? (Choose two.)

Select 2 answers

A. Export ground truth labels to Cloud Storage

B. Enable request/response logging on the Vertex AI Endpoint

C. Enable Vertex AI Pipelines to run scheduled monitoring

D. Create a ModelMonitoringJob with a monitoring configuration

E. Deploy the model with an explanation spec

Answers: B, D

Logging captures the serving data needed for monitoring.

Why this answer

To enable model monitoring, you must enable request/response logging on the endpoint (to capture serving data) and create a monitoring job with the desired configuration.

915

MCQ · easy

A data science team is deploying a large NLP model to Vertex AI for real-time inference. They notice high latency per request. Which action should they take first to reduce latency?

A. Use Cloud Functions for inference.

B. Use model optimization techniques like quantization or pruning.

C. Use Vertex AI Model Optimization to quantize the model and deploy on a smaller machine.

D. Enable autoscaling and set min replicas to 5.

E. Implement batch prediction instead of online prediction.

Answer: C

Quantization reduces model size and latency directly.

Why this answer

Option C is correct because it directly addresses the root cause of high latency in real-time inference: model size and compute requirements. Vertex AI Model Optimization applies quantization or pruning to reduce the model's memory footprint and computational cost, allowing it to run on a smaller, faster machine (e.g., fewer vCPUs or less GPU memory) while maintaining acceptable accuracy. This is the first step recommended by Google Cloud best practices for latency-sensitive deployments, as it reduces per-request processing time without requiring architectural changes.

Exam trap

Google Cloud often tests the misconception that scaling out (autoscaling) or switching to batch processing is the first step to reduce latency, when in fact model optimization and hardware matching are the primary levers for per-request performance in real-time inference.

How to eliminate wrong answers

Option A is wrong because Cloud Functions are stateless, short-lived compute units with a maximum timeout of 9 minutes and limited GPU support, making them unsuitable for hosting large NLP models for real-time inference; they introduce cold-start latency and lack the persistent infrastructure needed for model serving. Option B is wrong because it suggests using model optimization techniques like quantization or pruning but omits the critical step of deploying on a smaller machine; without adjusting the underlying hardware, the latency reduction from optimization alone may be insufficient, and the question asks for the first action to take. Option D is wrong because enabling autoscaling with a minimum of 5 replicas increases resource availability but does not reduce per-request latency; it may even increase cost and complexity without addressing the model's inference speed. Option E is wrong because batch prediction is designed for asynchronous, high-throughput processing of large datasets, not for real-time inference; it introduces higher latency per request due to queuing and batching overhead, making it counterproductive for reducing latency in a real-time scenario.

916

MCQ · hard

A company is using Vertex AI Pipelines with reusable components. They observe that a component that performs hyperparameter tuning is failing intermittently with a 'ResourceExhausted' error. The component is configured with a small custom service account. What is the most likely cause?

A. The component code has a bug causing infinite recursion

B. The KFP executor is not properly configured

C. The service account does not have sufficient quotas or permissions to create the required number of trials or workers

D. The pipeline system memory is insufficient for the component

Answer: C

Hyperparameter tuning often spawns multiple trial jobs; quota limits on Vertex AI training jobs or compute resources can cause this error.

Why this answer

A 'ResourceExhausted' error in Vertex AI Pipelines typically means the component tried to create more resources (e.g., parallel trials or workers for hyperparameter tuning) than the available quota allows. Quotas are enforced on the project and region in which the service account's jobs run, and a narrowly scoped service account may also lack the permissions needed to launch every trial. This is consistent with failures that appear only intermittently, when several trials start at once.

Exam trap

Google Cloud often tests the misconception that 'ResourceExhausted' errors are always due to memory or code bugs, when they usually signal that a quota on hyperparameter tuning or training resources has been reached.

How to eliminate wrong answers

Option A is wrong because infinite recursion would cause a stack overflow or timeout error, not a 'ResourceExhausted' error specific to resource quotas. Option B is wrong because the KFP executor is a generic pipeline runner; its configuration does not directly affect resource creation quotas for hyperparameter tuning jobs. Option D is wrong because pipeline system memory is a cluster-level resource, not the cause of a 'ResourceExhausted' error tied to quotas for creating trials or workers.

917

MCQ · medium

A company uses Vertex AI Pipelines to train and deploy models. They want to automatically generate model documentation that includes model details, intended use, and evaluation results. What should they use?

A. Vertex AI Explanations

B. Vertex AI Metadata

C. Model Cards

D. Vertex AI Model Registry with custom metadata

Answer: C

Model Cards provide automated, standardized documentation.

Why this answer

Model Cards are a standardized format for model documentation, and Vertex AI supports automated generation of model cards.

918

Multi-Select · easy

Refer to the exhibit. A data scientist is evaluating a binary classification model trained with BigQuery ML on an imbalanced dataset. The exhibit shows the output of ML.EVALUATE run on two different thresholds. Which TWO actions should the data scientist take to improve model performance? (Choose two.)

Select 2 answers

A. Add more features from the source data.

B. Use AUC-ROC as the evaluation metric instead of accuracy.

C. Apply SMOTE oversampling in the preprocessing pipeline.

D. Use class weights in the CREATE MODEL statement.

E. Increase the number of training iterations.

Answers: B, D

AUC-ROC is robust to class imbalance and provides a better measure of model discrimination.

Why this answer

Option B is correct because AUC-ROC is insensitive to class imbalance and evaluates the model's ability to rank positive instances higher than negative ones across all thresholds, unlike accuracy, which can be misleading when the majority class dominates. In BigQuery ML, ML.EVALUATE returns metrics like accuracy, precision, recall, and AUC-ROC; for imbalanced datasets, AUC-ROC provides a more reliable measure of discriminative power. Option D is correct because class weighting makes errors on the minority class count more during training; in BigQuery ML this is set in the CREATE MODEL statement (for example with the AUTO_CLASS_WEIGHTS option).

Exam trap

Google Cloud often tests the misconception that adding more data or features is a universal fix for imbalanced datasets, when in fact the core issue requires adjustments to the evaluation metric or the loss function (e.g., class weights) rather than simply increasing data volume or iterations.
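
Both points, that accuracy misleads on imbalanced data and that class weights rebalance the loss, can be seen in a few lines. The label counts below are hypothetical, and the inverse-frequency formula is one common "balanced" heuristic used for illustration; it is not necessarily the exact computation BigQuery ML performs.

```python
from collections import Counter

# Hypothetical imbalanced dataset: 950 negatives, 50 positives
labels = [0] * 950 + [1] * 50

# A useless model that always predicts the majority class
predictions = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
true_positives = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
recall = true_positives / labels.count(1)

# Inverse-frequency class weights: n_samples / (n_classes * class_count)
counts = Counter(labels)
class_weights = {cls: len(labels) / (len(counts) * n) for cls, n in counts.items()}

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
print(f"class weights: {class_weights}")
```

The always-negative model scores high accuracy while catching no positives, and the weights show how much more each minority-class example would count during training.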

919

MCQ · medium

A company uses Cloud Composer to orchestrate a nightly ML workflow that includes running a Vertex AI pipeline, querying BigQuery, and running a Dataflow job. The Airflow DAG must run only if the previous day's Dataflow job succeeded. Which Airflow concept should they use to implement this dependency?

A. Use a BranchPythonOperator to check the status of the Dataflow job before proceeding.

B. Nest the tasks in a SubDAG with a schedule_interval that starts after the expected Dataflow completion time.

C. Set a TriggerRule on the Vertex AI pipeline task to 'all_done' and reference the previous task.

D. Use the bitshift operators (>>) to set the execution order: Dataflow_task >> VertexAI_pipeline.

Answer: D

The >> operator sets a direct dependency: VertexAI_pipeline runs only after Dataflow_task succeeds.

Why this answer

Option D is correct because Airflow's bitshift operators (>>) define task dependencies in a DAG. By setting `Dataflow_task >> VertexAI_pipeline`, the Vertex AI pipeline task will only execute after the Dataflow task has completed successfully. This directly enforces the required dependency without additional logic or branching.

Exam trap

Google Cloud often tests whether candidates understand that Airflow's default task dependency behavior (via bitshift operators) inherently enforces success-based execution, making explicit branching or trigger rule modifications unnecessary for simple sequential dependencies.

How to eliminate wrong answers

Option A is wrong because BranchPythonOperator is used for conditional branching within a DAG, not for enforcing a simple sequential dependency; it would unnecessarily complicate the workflow. Option B is wrong because SubDAGs are used for grouping tasks and do not inherently check the success status of external tasks; using a schedule_interval to start after expected completion time does not guarantee the previous day's Dataflow job succeeded. Option C is wrong because setting a TriggerRule to 'all_done' would cause the Vertex AI pipeline to run regardless of the Dataflow task's success (including failure or skipped states), which does not enforce the required success-only dependency.
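
The difference between the default trigger rule and 'all_done' can be modelled in plain Python. This is a simplified stand-in for Airflow's scheduling logic, not Airflow code: the function and state names are hypothetical and only the two rules discussed above are covered.

```python
FINISHED_STATES = {"success", "failed", "skipped"}


def should_run(upstream_states: list, trigger_rule: str = "all_success") -> bool:
    """Decide whether a downstream task may start, given its upstream task states."""
    if trigger_rule == "all_success":
        # Default behaviour: every upstream task must have succeeded.
        return all(state == "success" for state in upstream_states)
    if trigger_rule == "all_done":
        # Runs once upstream tasks have finished, whatever the outcome.
        return all(state in FINISHED_STATES for state in upstream_states)
    raise ValueError(f"unsupported trigger rule: {trigger_rule}")


# Dataflow_task >> VertexAI_pipeline with the default rule
print(should_run(["success"]))                         # downstream runs
print(should_run(["failed"]))                          # downstream is blocked
print(should_run(["failed"], trigger_rule="all_done"))  # downstream runs anyway
```

Under the default rule a failed Dataflow task blocks the pipeline task, while 'all_done' lets it run regardless, which is why option C does not meet the requirement.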

920

MCQ · medium

A team uses Vertex AI Prediction with a custom container. They want to perform canary deployments by sending 5% of traffic to a new model version. Which method should they use?

A. Create a new endpoint with manual traffic splitting

B. Deploy two separate endpoints and use a load balancer

C. Use Cloud Run for serving with gradual rollout

D. Use the Vertex AI Model Registry and configure traffic splitting on the endpoint

Answer: D

This is correct because you can deploy the new model version to the same endpoint with a small traffic split (e.g., 5%) using the traffic splitting feature.

Why this answer

Option D is correct because Vertex AI endpoints support traffic splitting between deployed models, allowing a controlled canary rollout. Model Registry manages the model versions, while the split itself is configured on the endpoint. Option A is wrong because a new endpoint is unnecessary; the new version can share the existing endpoint. Option B is wrong because two endpoints behind a load balancer rebuild by hand what the endpoint's traffic split already provides. Option C uses Cloud Run, which is not integrated with Vertex AI Prediction.
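
A 5% traffic split can be pictured as weighted random routing between two deployed models. The simulation below is an illustrative model of that behaviour in plain Python, not a call to the Vertex AI SDK; the deployed-model names, request count, and seed are hypothetical.

```python
import random
from collections import Counter

# Hypothetical deployed-model IDs mapped to traffic percentages (must sum to 100)
traffic_split = {"stable-model": 95, "canary-model": 5}


def route(split: dict, rng: random.Random) -> str:
    """Pick a deployed model for one request, weighted by its traffic share."""
    model_ids = list(split)
    weights = list(split.values())
    return rng.choices(model_ids, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the simulation is repeatable
TOTAL_REQUESTS = 10_000
counts = Counter(route(traffic_split, rng) for _ in range(TOTAL_REQUESTS))
canary_share = counts["canary-model"] / TOTAL_REQUESTS

print(f"Canary received {canary_share:.1%} of {TOTAL_REQUESTS} simulated requests")
```

Raising the canary percentage step by step, while watching error rates and latency, is the usual way such a rollout proceeds.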

921

MCQ · medium

An ML engineer is scaling a prototype to production using Vertex AI Pipelines. The pipeline includes data validation, preprocessing, training, and deployment steps. They want to ensure that the pipeline can be reproduced and audited. What is the best practice?

A. Define the pipeline using Kubeflow Pipelines SDK and run it on Vertex AI Pipelines.

B. Use a Docker container with fixed tags and manually record runs.

C. Store all data and models in a single Cloud Storage bucket with no versioning.

D. Pin all library versions in a requirements.txt file.

Answer: A

Vertex AI Pipelines automatically tracks artifacts, parameters, and lineage.

Why this answer

Using a fully managed pipeline service like Vertex AI Pipelines automatically tracks artifacts, parameters, and lineage, ensuring reproducibility and auditability. Option B keeps the environment consistent but relies on manual record-keeping rather than built-in tracking. Option C provides no versioning at all. Option D pins dependencies but does not address pipeline orchestration or lineage.

922

MCQ · easy

A data scientist wants to define a lightweight Python function component in Vertex AI Pipelines using Kubeflow Pipelines SDK v2. Which decorator should be applied to the function to make it a pipeline component?

A. @dsl.pipeline

B. @kfp.v2.components.func_to_component

C. @dsl.component

D. @kfp.dsl.component

Answer: C

Correct: @dsl.component turns a Python function into a pipeline component.

Why this answer

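In KFP SDK v2, the @dsl.component decorator (with `from kfp import dsl`) is used to define a Python function component. @dsl.pipeline is for defining a pipeline that composes multiple components. @kfp.v2.components.func_to_component is not a valid decorator, and @kfp.dsl.component is the same decorator written with its full module path, so the answer key expects the conventional @dsl.component form.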

923

MCQ · hard

Refer to the exhibit. A data scientist trained a BigQuery ML classification model to detect fraudulent transactions. The dataset has 95% non-fraud (class 0) and 5% fraud (class 1). The evaluation metrics show high accuracy (0.91) but low recall (0.60) for fraud detection. Which low-code approach should the data scientist take to improve recall without significantly sacrificing precision?

A. Use the ML.PREDICT function with a lower classification threshold (e.g., 0.3 instead of 0.5) to capture more positive cases.

B. Apply feature selection to reduce the number of features and focus on the most predictive ones.

C. Increase the number of training iterations by setting the MAX_ITERATIONS option to a higher value.

D. Re-train the model using AutoML Tables with class weights to penalize false negatives more heavily.

Answer: A

Lowering the threshold increases recall by classifying more instances as positive.

Why this answer

Option A is correct because lowering the classification threshold in ML.PREDICT (e.g., from 0.5 to 0.3) causes the model to classify more transactions as fraud, directly increasing recall. This is a low-code adjustment that does not require retraining or complex feature engineering, and it allows the data scientist to trade off precision for recall as needed.

Exam trap

Google Cloud often tests the misconception that improving recall always requires retraining or complex model changes, when in fact a simple threshold adjustment in ML.PREDICT is a valid low-code technique to shift the precision-recall balance.

How to eliminate wrong answers

Option B is wrong because feature selection reduces the number of input features, which may improve training speed or reduce overfitting but does not directly increase recall for a specific class; it can even harm recall if important fraud-indicative features are removed. Option C is wrong because increasing MAX_ITERATIONS only affects the convergence of the training algorithm; if the model is already converged, more iterations will not improve recall and may lead to overfitting. Option D is wrong because AutoML Tables is a separate service, not a low-code approach within BigQuery ML; while class weights can help, this option requires moving to a different platform and is not the simplest low-code fix described in the question.
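The threshold trade-off can be illustrated with a minimal Python sketch. This is not BigQuery ML itself: the scores and labels are invented for illustration, and `precision_recall` is a hypothetical helper that applies a cutoff to predicted fraud probabilities.

```python
from typing import List, Tuple


def precision_recall(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Return (precision, recall) for the positive class at a given threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical fraud probabilities and true labels (1 = fraud), made up for illustration.
scores = [0.92, 0.71, 0.45, 0.38, 0.33, 0.28, 0.12, 0.08, 0.05, 0.02]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

for t in (0.5, 0.3):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, lowering the cutoff from 0.5 to 0.3 raises recall while precision drops, which is the trade-off the explanation describes. Real data will behave differently, so the threshold should be chosen from the model's own precision-recall curve.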

924

MCQhard

A large financial company uses a complex ML pipeline to detect fraudulent transactions. The pipeline consists of multiple steps: data ingestion from Pub/Sub, feature engineering using Dataflow, model training with Vertex AI, and deployment to an endpoint. They currently use Cloud Composer to orchestrate the pipeline with separate DAGs for each step. Recently, they have been experiencing failures in the Dataflow job due to schema changes in the incoming transactions, causing the pipeline to stall. The team manually fixes the schema and re-runs the pipeline, which is time-consuming. They want to improve the robustness of the pipeline. The pipeline is run on a schedule but also triggered by the arrival of new data. The team is considering moving to Vertex AI Pipelines to unify the workflow. They also want to automatically detect schema changes and handle them without manual intervention. Which approach should they take?

A.Keep using Cloud Composer but add retries with exponential backoff to the Dataflow task, and set up a Cloud Monitoring alert to notify the team if the task fails repeatedly

B.Migrate to Vertex AI Pipelines and add a pre-processing step that validates incoming data schema against a schema registry; if schema change is detected, the pipeline sends an alert and uses a default schema to continue processing

C.Use Cloud Scheduler to trigger the pipeline more frequently to reduce the impact of failures

D.Create a separate Dataflow pipeline to handle schema detection and run it before the main pipeline; if schema changes, send an email to the team

AnswerB

This provides automated handling of schema changes.

Why this answer

Option B is correct because it directly addresses the need for automated schema change detection and handling within a unified orchestration framework. By migrating to Vertex AI Pipelines, the team gains a managed, end-to-end ML workflow service that can include a pre-processing step to validate incoming data against a schema registry. When a schema change is detected, the pipeline can automatically apply a default schema and continue, eliminating manual intervention and reducing downtime.

Exam trap

The trap here is that candidates often think retries or alerts (Option A) are sufficient for handling failures, but the question explicitly requires automatic handling without manual intervention, which only a schema validation and fallback step can provide.

How to eliminate wrong answers

Option A is wrong because adding retries with exponential backoff does not solve the root cause of schema changes; it only retries the same failing operation, which will continue to fail until the schema is manually fixed, and Cloud Monitoring alerts still require manual intervention. Option C is wrong because increasing the frequency of pipeline runs via Cloud Scheduler does not address schema change failures; it would only cause more frequent failures and waste resources. Option D is wrong because creating a separate Dataflow pipeline for schema detection still requires manual email notification and manual re-run, and it does not integrate automated handling or a unified workflow like Vertex AI Pipelines provides.
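The validate-then-fall-back idea behind option B might look roughly like the following Python sketch. Everything here is an assumption for illustration: the schema, the default values, and the helper names are hypothetical, and a real pipeline step would read the expected schema from a registry rather than a constant.

```python
from typing import Any, Dict, List

# Hypothetical expected schema and defaults; a real step would load these from a schema registry.
EXPECTED_SCHEMA: Dict[str, type] = {"transaction_id": str, "amount": float, "merchant": str}
DEFAULTS: Dict[str, Any] = {"transaction_id": "", "amount": 0.0, "merchant": "unknown"}


def validate_record(record: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of schema problems; an empty list means the record conforms."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


def coerce_to_schema(record: Dict[str, Any], schema: Dict[str, type], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Project a record onto the expected schema, filling gaps and bad types with defaults."""
    return {
        field: record[field] if isinstance(record.get(field), expected_type) else defaults[field]
        for field, expected_type in schema.items()
    }


def preprocess(record: Dict[str, Any], alerts: List[List[str]]) -> Dict[str, Any]:
    """Validate a record; on a schema change, record an alert and continue with the default schema."""
    problems = validate_record(record, EXPECTED_SCHEMA)
    if problems:
        alerts.append(problems)  # stand-in for sending a real alert
        return coerce_to_schema(record, EXPECTED_SCHEMA, DEFAULTS)
    return record


alerts: List[List[str]] = []
changed = {"transaction_id": "t1", "amount": "12.5", "channel": "web"}
print(preprocess(changed, alerts))
print(alerts)
```

Falling back to defaults keeps the pipeline running, but it can also hide data-quality problems, so the alert should still reach the team.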

925

MCQmedium

You have a Vertex AI endpoint serving a model with min replicas=2 and max replicas=10. You notice that during low traffic hours, the endpoint still runs 2 replicas, incurring costs. You want to reduce costs to zero when there is no traffic. What should you do?

A.Change min replicas to 0 and max replicas to 10.

B.Use a custom metric to trigger scaling down to zero.

C.Delete the endpoint when not in use and recreate it on demand.

D.Set max replicas to 0.

AnswerA

This enables scale-to-zero, allowing endpoint to scale down to zero when idle.

Why this answer

Setting min replicas to 0 allows Vertex AI to scale down to zero instances when there is no traffic, eliminating costs during idle periods. The endpoint will automatically scale up from 0 to handle incoming requests, while max replicas=10 ensures it can handle peak load. This is the standard approach for cost optimization in Vertex AI endpoints.

Exam trap

The trap here is that candidates assume min replicas must be at least 1 for the endpoint to be available, but Vertex AI supports scale-to-zero with min replicas=0, which is the correct way to eliminate idle costs.

How to eliminate wrong answers

Option B is wrong because custom metrics can trigger scaling but cannot override the min replicas constraint; with min replicas=2, the endpoint will never scale below 2 replicas regardless of the metric. Option C is wrong because deleting and recreating the endpoint on demand is impractical, introduces latency for cold starts, and violates best practices for production serving; Vertex AI endpoints are designed to be persistent. Option D is wrong because setting max replicas to 0 would prevent the endpoint from serving any traffic, effectively breaking the service, not just reducing costs.

926

MCQhard

An ML team is fine-tuning a large language model using a custom container on Vertex AI. They want to reduce costs by using preemptible (spot) VMs for training. The training job is long-running and uses checkpointing. Which statement is correct regarding spot VM usage?

A.Spot VMs are not available for custom training jobs on Vertex AI

B.Training will automatically resume from the latest checkpoint without any configuration

C.You must enable checkpointing in the training code and use spot VMs by setting the 'spot' field in the machine spec

D.Spot VMs cannot be used with GPU accelerators

AnswerC

This is correct: the code must checkpoint, and the machine spec must indicate spot=true.

Why this answer

Option C is correct because Vertex AI custom training jobs support spot VMs, but you must explicitly enable checkpointing in your training code and set the 'spot' field in the machine spec to true. This ensures that when a preemptible VM is terminated, the training can resume from the latest checkpoint, preventing loss of progress and reducing costs.

Exam trap

The trap here is that candidates assume Vertex AI automatically handles checkpointing and resumption for spot VMs, but in reality, you must explicitly implement both the checkpointing logic and the spot VM configuration.

How to eliminate wrong answers

Option A is wrong because Vertex AI does support spot VMs for custom training jobs, as long as you configure them correctly. Option B is wrong because training does not automatically resume from the latest checkpoint; you must implement checkpointing logic in your training code and configure the job to use spot VMs. Option D is wrong because spot VMs can be used with GPU accelerators on Vertex AI, though you must be aware that preemption may occur more frequently with GPUs.
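The checkpoint-and-resume logic that the training code must supply can be sketched in plain Python. This is a simplified illustration under stated assumptions: the "training step" is a placeholder, the checkpoint is a local JSON file (a real job would typically write to Cloud Storage), and preemption is simulated with an exception.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved training state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def save_checkpoint(path, state):
    """Write to a temp file and rename, so an interruption mid-write cannot corrupt the checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def train(path, total_steps, checkpoint_every, preempt_at=None):
    """Resume from the last checkpoint (if any) and run until total_steps."""
    state = load_checkpoint(path)
    for step in range(state["step"] + 1, total_steps + 1):
        if preempt_at is not None and step == preempt_at:
            raise RuntimeError("simulated preemption")
        state = {"step": step, "loss": 1.0 / step}  # placeholder for a real training step
        if step % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "checkpoint.json")
    try:
        train(ckpt, total_steps=10, checkpoint_every=5, preempt_at=7)
    except RuntimeError:
        resumed_from = load_checkpoint(ckpt)["step"]
    final = train(ckpt, total_steps=10, checkpoint_every=5)

print(f"resumed from step {resumed_from}, finished at step {final['step']}")
```

Work done after the last checkpoint is lost on preemption (steps 6 here), so the checkpoint interval is a trade-off between write overhead and repeated work.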

927

Multi-Selecteasy

A company wants to use pre-built Google Cloud APIs for text analysis. Which TWO APIs can they use? (Choose TWO.)

Select 2 answers

A.Cloud Natural Language API

B.Cloud Translation API

C.Cloud Vision API

D.Video Intelligence API

E.Document AI

AnswersA, B

For text analysis.

Why this answer

The Cloud Natural Language API provides pre-built machine learning models for text analysis tasks such as entity recognition, sentiment analysis, and syntax analysis. The Cloud Translation API can translate text between languages, which is a form of text analysis. Both are pre-built Google Cloud APIs that directly address the company's need for text analysis without requiring custom model training.

Exam trap

The trap here is that candidates may confuse Document AI with a general text analysis API, but Document AI is specifically for document parsing and OCR, not for core NLP tasks like sentiment or entity extraction, which are the focus of the Cloud Natural Language API.

928

Multi-Selecthard

A company is experiencing high prediction costs on Vertex AI Endpoints. They want to monitor and optimize costs. Which THREE actions should they take? (Choose 3)

Select 3 answers

A.Use Cloud Billing reports to track Vertex AI endpoint costs per hour and per request

B.Use Vertex AI Explainability on every prediction

C.Reduce the number of replicas or use autoscaling to minimize idle resources

D.Set up budget alerts in Google Cloud Billing to notify when costs exceed a threshold

E.Enable Vertex AI Model Monitoring to track prediction latency

AnswersA, C, D

Cloud Billing provides cost breakdowns by service and resource.

Why this answer

Cost monitoring involves tracking per-hour and per-request costs, setting budget alerts, and possibly adjusting scaling to reduce unnecessary compute.

929

MCQhard

An MLOps engineer is configuring Vertex AI Model Monitoring for a deployed model. They want to monitor for feature skew between training and serving data, but only for a subset of features. The training data has 100 features, and they want to monitor only the top 10 most important features to reduce cost and noise. How can they achieve this?

A.Set the 'monitoring_interval' to a low value so that only frequent features are monitored

B.Train a new model with only the top 10 features and redeploy it

C.Use the 'feature_names' parameter in the ModelMonitoringObjectConfig to specify which features to monitor

D.Set the 'sampling_rate' to 100% and ignore the rest

AnswerC

The feature_names parameter allows you to select a subset of features for monitoring.

Why this answer

Vertex AI Model Monitoring allows you to specify a list of feature names to monitor via the 'feature_names' attribute in the monitoring configuration. This can be set when creating the monitoring job, targeting only the features of interest.

930

MCQhard

A healthcare startup is building a diagnostic tool that uses a deep learning model to classify medical images. The model is trained on TensorFlow and deployed on Vertex AI Prediction. The startup has strict latency requirements: predictions must return within 200 ms for 95% of requests. Current performance shows p95 latency of 350 ms. The team has already tried using a smaller model, but accuracy dropped below acceptable levels. The traffic pattern is spiky: low load during nights but bursts of 1000 requests per second during business hours. Currently, they use a single n1-highmem-8 VM with a GPU attached. They have a budget for additional resources but need to optimize cost. The model is about 500 MB and requires GPU for inference. Which course of action should they take to meet the latency requirement while managing costs?

A.Upgrade to an n1-highmem-16 VM with a more powerful GPU

B.Switch to batch prediction using Vertex AI Batch Prediction and store results in a database for retrieval

C.Create a Vertex AI Prediction endpoint with an accelerator (GPU) and enable autoscaling (min 1, max 5 nodes)

D.Deploy the model as a Cloud Function using TensorFlow Serving

AnswerC

Autoscaling with GPU provides low latency during bursts and cost efficiency by scaling down during low load.

Why this answer

Option C is correct because it leverages Vertex AI Prediction's autoscaling to handle spiky traffic efficiently, using GPU-accelerated endpoints that can scale from 1 to 5 nodes to meet the 200 ms p95 latency requirement. This approach minimizes cost during low-load periods while providing burst capacity for the 1000 requests per second peak, addressing both the latency and budget constraints without compromising model accuracy.

Exam trap

The trap here is that candidates often choose a single-node upgrade (Option A) thinking more power solves latency, but they overlook the need for horizontal scaling to handle spiky traffic, while Option B seems cost-effective but ignores the real-time requirement, and Option D appears serverless but fails due to GPU and timeout limitations.

How to eliminate wrong answers

Option A is wrong because upgrading to a more powerful VM (n1-highmem-16 with a better GPU) does not solve the spiky traffic pattern; it increases cost during low-load periods and still risks latency spikes during bursts due to a single-node bottleneck. Option B is wrong because batch prediction is asynchronous and not suitable for real-time diagnostic tools requiring sub-200 ms responses; storing results in a database for retrieval introduces additional latency and cannot meet the strict p95 latency requirement. Option D is wrong because Cloud Functions have a maximum timeout of 540 seconds and do not natively support GPU acceleration, making them incapable of running a 500 MB deep learning model with GPU inference within the latency constraint.
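For readers unsure what a "p95 latency" requirement means in practice, here is a small Python sketch that uses the nearest-rank method. The latency samples are invented to match the 350 ms figure in the question, and `percentile` is a hypothetical helper; monitoring tools may use a slightly different interpolation.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [120] * 18 + [350, 400]

p95 = percentile(latencies_ms, 95)
print(f"p95 = {p95} ms, meets 200 ms target: {p95 <= 200}")
```

The median here is well under the target, yet the p95 fails it. Tail latency during bursts is exactly what horizontal autoscaling is meant to address.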

931

Multi-Selectmedium

A company needs to build a custom model to classify images of products into categories. They have a large labeled dataset. They want to use AutoML but are unsure which options support image classification. Which AutoML product supports image classification?

Select 1 answer

A.AutoML Natural Language

B.AutoML Video

C.AutoML Vision

D.AutoML Translation

E.AutoML Tables

AnswersC

Why this answer

AutoML Vision (Option C) is the correct choice because it is specifically designed for image classification tasks, allowing users to train custom models on labeled image datasets without writing code. It supports features like object detection and image segmentation, making it ideal for the described use case.

Exam trap

Google Cloud often tests the distinction between AutoML services by focusing on the data modality (text, image, video, tabular), leading candidates to mistakenly choose AutoML Video for image tasks due to its visual nature, but it is strictly for video sequences, not static images.

932

MCQmedium

A machine learning engineer is training a TensorFlow model on Vertex AI using distributed training with the MultiWorkerMirroredStrategy. The training job uses 4 workers with 4 GPUs each. The engineer notices that the training is not scaling linearly. What is the most likely cause?

A.The model architecture is too simple to benefit from distribution

B.The workers are not using the same version of TensorFlow

C.Communication overhead due to gradient synchronization

D.The GPUs are not configured correctly

AnswerC

MultiWorkerMirroredStrategy synchronizes gradients across workers; network latency can limit scaling.

Why this answer

With MultiWorkerMirroredStrategy, each worker computes gradients independently on its local batch, then all-reduces gradients across workers via collective communication (e.g., NCCL or gRPC). As the number of workers increases, the communication overhead for gradient synchronization grows, often dominating the per-step time and preventing linear scaling. This is the most common bottleneck in distributed TensorFlow training, especially with many workers or small batch sizes per worker.

Exam trap

The trap here is that candidates often assume more workers always means linear speedup, ignoring the gradient-synchronization overhead that is paid on every step and can become the dominant factor in distributed training.

How to eliminate wrong answers

Option A is wrong because even a simple model can suffer from communication overhead if the compute-to-communication ratio is low; the issue is not model simplicity but the cost of synchronizing gradients across workers. Option B is wrong because TensorFlow enforces version consistency across workers in a distributed job; mismatched versions would cause a job failure, not sublinear scaling. Option D is wrong because GPU misconfiguration (e.g., incorrect driver or CUDA version) would typically cause errors or zero utilization, not gradual scaling degradation; the observed symptom of sublinear scaling points to communication, not hardware misconfiguration.
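A rough back-of-the-envelope model shows why synchronization limits scaling. The sketch below assumes a ring all-reduce, in which each worker transfers about 2(N-1)/N times the gradient size, and assumes no overlap between compute and communication. All numbers are hypothetical, and real frameworks overlap the two to some degree.

```python
def allreduce_seconds(num_workers, gradient_bytes, bandwidth_bytes_per_s):
    """Approximate ring all-reduce time: each worker moves 2*(N-1)/N of the gradient size."""
    if num_workers <= 1:
        return 0.0
    return 2 * (num_workers - 1) / num_workers * gradient_bytes / bandwidth_bytes_per_s


def scaling_efficiency(num_workers, compute_seconds, gradient_bytes, bandwidth_bytes_per_s):
    """Fraction of ideal linear speedup, assuming compute and communication do not overlap."""
    comm = allreduce_seconds(num_workers, gradient_bytes, bandwidth_bytes_per_s)
    return compute_seconds / (compute_seconds + comm)


# Hypothetical job: 0.2 s of compute per step, 400 MB of gradients, 10 GB/s links.
COMPUTE_S = 0.2
GRADIENT_BYTES = 400e6
BANDWIDTH = 10e9

for n in (1, 2, 4, 16):
    eff = scaling_efficiency(n, COMPUTE_S, GRADIENT_BYTES, BANDWIDTH)
    print(f"{n:>2} workers: efficiency {eff:.2f}, effective speedup {n * eff:.1f}x")
```

Under these assumptions efficiency falls as workers are added. A larger per-worker batch raises the compute time per step and therefore improves the ratio.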

933

MCQhard

A team is using Vertex AI Pipelines to deploy a model. They have a component that evaluates the model and produces a ClassificationMetrics artifact. The pipeline should deploy the model only if the precision is greater than 0.9. They use dsl.If to check the metric. However, the condition always evaluates to False. What is the most likely cause?

A.The precision value is stored as a float but the condition expects a string.

B.The evaluation component did not output the metric correctly.

C.The ClassificationMetrics artifact is not accessible in the condition context.

D.The dsl.If block is placed incorrectly in the pipeline definition.

AnswerC

Correct: Conditions cannot directly read artifact properties; the metric value must be extracted as a pipeline parameter before the condition.

Why this answer

Option C is correct because in Vertex AI Pipelines, `ClassificationMetrics` artifacts are not directly accessible as primitive values within the `dsl.If` condition context. The `dsl.If` condition can only evaluate pipeline parameters or primitive outputs (like strings, integers, floats) that are explicitly passed as pipeline-level parameters or task outputs. A `ClassificationMetrics` artifact is a complex object that must be parsed or have its specific metric values extracted (e.g., via a custom component or `dsl.Metrics`) before they can be used in a conditional check.

Exam trap

Google Cloud often tests the distinction between artifact types and primitive outputs in pipeline conditions, trapping candidates who assume that any output from a component can be directly used in a conditional statement without considering the data type or how the pipeline runtime resolves it.

How to eliminate wrong answers

Option A is wrong because the condition in `dsl.If` can compare floats directly; the precision value being a float does not cause the condition to always evaluate to False. Option B is wrong because the question states the evaluation component produces a `ClassificationMetrics` artifact, implying the output is correct; the issue is not with the component's output but with how that output is accessed in the condition. Option D is wrong because the placement of the `dsl.If` block in the pipeline definition does not affect its ability to evaluate conditions; the condition fails due to the data type of the input, not its position.

934

MCQhard

A research team is training a very large Transformer model that does not fit into the memory of a single GPU. They have access to multiple GPUs on a single machine and want to split the model layers across GPUs. Which distributed training strategy should they use?

A.MultiWorkerMirroredStrategy

B.Parameter server strategy

C.MirroredStrategy (data parallelism)

D.Pipeline parallelism (model parallelism)

AnswerD

Pipeline parallelism splits the model across devices, allowing large models to be trained.

Why this answer

When a model is too large for one GPU, model parallelism (pipeline parallelism) is required. This splits different layers (or layer groups) across devices. Data parallelism (mirrored strategy) replicates the model, which would still require the full model on each GPU.

Pipeline parallelism is a form of model parallelism where layers are distributed across devices and micro-batches flow through the pipeline.
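The flow of micro-batches through pipeline stages can be visualized with a small scheduling sketch. It models only the forward pass of a simple pipeline in which stage `s` handles micro-batch `t - s` at time step `t`. The function names are hypothetical and no real GPU work is performed.

```python
def forward_schedule(num_stages, num_microbatches):
    """Per time step, the micro-batch index each stage processes (None means idle)."""
    steps = num_stages + num_microbatches - 1
    schedule = []
    for t in range(steps):
        row = []
        for stage in range(num_stages):
            mb = t - stage
            row.append(mb if 0 <= mb < num_microbatches else None)
        schedule.append(row)
    return schedule


def bubble_fraction(schedule):
    """Share of (time step, stage) slots in which a stage sits idle."""
    idle = sum(row.count(None) for row in schedule)
    return idle / (len(schedule) * len(schedule[0]))


# Three stages (e.g. three GPUs holding consecutive layer groups), four micro-batches.
schedule = forward_schedule(num_stages=3, num_microbatches=4)
for t, row in enumerate(schedule):
    print(f"t={t}: " + "  ".join("idle" if mb is None else f"mb{mb}" for mb in row))
print(f"idle fraction: {bubble_fraction(schedule):.2f}")
```

Stages idle while the pipeline fills and drains. Using more micro-batches per step shrinks that idle fraction, which is why pipeline-parallel training splits each batch into micro-batches.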

935

MCQmedium

A company uses Vertex AI Pipelines to train models on a daily schedule. The pipeline includes a component that runs a BigQuery query to extract features. The team wants to ensure that if the BigQuery component fails due to transient network errors, the pipeline automatically retries it. How can they configure retries in Vertex AI Pipelines?

A.Deploy the component as a Cloud Function and configure Cloud Functions retry.

B.Wrap the component in a `dsl.If` conditional that checks for failure and re-submits the component.

C.Use Cloud Composer with a task retry policy in Airflow.

D.Set the `retry` parameter of the component to a positive integer, for example `retry=3`.

AnswerD

The `retry` parameter in the component decorator or constructor enables automatic retries.

Why this answer

Option D is correct because Vertex AI Pipelines natively supports a retry setting on pipeline components. A retry count of 3 instructs the pipeline to automatically retry the component up to three times if it fails due to transient errors, such as network timeouts. In the KFP SDK v2 this is typically expressed on the task instance, for example `task.set_retry(num_retries=3)`. This is the simplest and most direct way to handle retries within the Vertex AI Pipelines orchestration framework.

Exam trap

The trap here is that candidates may confuse Vertex AI Pipelines' native `retry` parameter with external retry mechanisms (Cloud Functions, Airflow) or misuse pipeline control flow constructs like `dsl.If` for retry logic, when the correct approach is a simple parameter on the component definition.

How to eliminate wrong answers

Option A is wrong because deploying the component as a Cloud Function and configuring Cloud Functions retry would move the execution outside of Vertex AI Pipelines, breaking the pipeline's orchestration and monitoring. Option B is wrong because `dsl.If` conditionals are used for conditional execution of components, not for retrying a failed component; they cannot re-submit a component that has already failed. Option C is wrong because Cloud Composer with Airflow is a separate orchestration service that would require migrating the entire pipeline out of Vertex AI Pipelines, adding unnecessary complexity and cost.
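The semantics of "retry up to three times" can be shown with a generic Python sketch. This is not the Vertex AI Pipelines API: `run_with_retries`, `TransientError`, and `flaky_query` are hypothetical names, and the exponential backoff is an assumption about a reasonable retry policy.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as a network timeout."""


def run_with_retries(task, max_retries, base_delay_s=0.0, sleep=time.sleep):
    """Run task(); on TransientError retry up to max_retries times with exponential backoff."""
    attempt = 0
    while True:
        try:
            return task()
        except TransientError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the failure to the orchestrator
            sleep(base_delay_s * (2 ** attempt))
            attempt += 1


calls = {"n": 0}


def flaky_query():
    """Simulated feature-extraction query that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TransientError("simulated network error")
    return "features extracted"


result = run_with_retries(flaky_query, max_retries=3)
print(result, "after", calls["n"], "attempts")
```

Only errors known to be transient should be retried. Retrying a deterministic failure, such as a schema mismatch, only delays the inevitable.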

936

Multi-Selectmedium

Which TWO of the following are benefits of using BigQuery ML for low-code model development?

Select 2 answers

A.Train models directly on data in BigQuery without moving it

B.Automatic feature engineering and hyperparameter tuning

C.Automatic scaling to petabytes of data

D.Built-in model explainability for all model types

E.Support for image classification tasks

AnswersA, C

Data stays in BigQuery, eliminating ETL.

Why this answer

Option A is correct because BigQuery ML allows you to train machine learning models using SQL directly on data stored in BigQuery, eliminating the need to export or move data to a separate environment. This reduces data transfer latency, simplifies security governance, and leverages BigQuery's native storage and compute separation.

Exam trap

Google Cloud often tests the misconception that 'low-code' means 'fully automated' — candidates mistakenly assume BigQuery ML handles feature engineering and hyperparameter tuning automatically, when in fact it only reduces coding effort for model creation, not for data preparation or optimization.

937

MCQhard

An ML team wants to deploy multiple models (e.g., a recommender and a classifier) behind a single Vertex AI endpoint. The models have different resource requirements: the recommender needs GPU, the classifier needs high memory. How should they configure the endpoint?

A.Use Cloud Run for one model and Vertex AI for the other.

B.Use a single machine type that meets the highest requirements.

C.Deploy both models to the same endpoint with different machine types per deployed model.

D.Create separate endpoints for each model.

AnswerC

Vertex AI supports deploying multiple models with independent machine specifications.

Why this answer

Vertex AI allows deploying multiple models on the same endpoint, each with its own machine type and resources. Traffic splitting routes requests to the correct model.

938

MCQeasy

Which API is recommended for high-throughput, low-latency online prediction requests to Vertex AI endpoints?

A.Cloud Functions

B.REST API

C.Cloud Pub/Sub

D.gRPC API

AnswerD

gRPC provides better performance for online prediction due to binary serialization and streaming.

Why this answer

gRPC API is recommended for high-throughput, low-latency online prediction requests to Vertex AI endpoints because it uses HTTP/2 for multiplexed streaming, binary serialization (Protocol Buffers), and supports bidirectional streaming, which reduces latency and improves throughput compared to REST. Vertex AI's prediction service natively supports gRPC for real-time inference, making it the optimal choice for latency-sensitive applications.

Exam trap

Google Cloud often tests the misconception that REST API is the default or only way to interact with cloud services, but the trap here is that for high-throughput, low-latency online predictions, gRPC is explicitly recommended over REST due to its performance advantages with Protocol Buffers and HTTP/2.

How to eliminate wrong answers

Option A is wrong because Cloud Functions is a serverless compute service for event-driven code, not an API for making prediction requests; it can invoke Vertex AI endpoints via REST or gRPC but is not itself an API protocol. Option B is wrong because REST API uses HTTP/1.1 with JSON serialization, which introduces higher latency and larger payload sizes compared to gRPC's binary Protocol Buffers, making it suboptimal for high-throughput, low-latency scenarios. Option C is wrong because Cloud Pub/Sub is a message queue for asynchronous, decoupled messaging, not designed for synchronous, low-latency online predictions; it adds queuing delay and is intended for batch or event-driven workflows.
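The payload-size argument can be made tangible with a standard-library sketch. Packing values as 32-bit floats with `struct` is only a stand-in for binary serialization: it is not Protocol Buffers, it ignores framing overhead, and it reduces precision. The feature vector is invented for illustration.

```python
import json
import struct

# Hypothetical feature vector with 1000 numeric features.
features = [0.1 * i for i in range(1000)]

# Text encoding, as in a JSON request body.
json_payload = json.dumps({"instances": [features]}).encode("utf-8")

# Binary encoding: 4 bytes per value as little-endian float32 (stand-in for a binary wire format).
binary_payload = struct.pack(f"<{len(features)}f", *features)

print(f"JSON payload:   {len(json_payload)} bytes")
print(f"binary payload: {len(binary_payload)} bytes")
```

Smaller payloads mean less time spent serializing and transferring each request, which matters most at high request rates.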

939

Multi-Selectmedium

A model serving team is experiencing high latency in production. Which TWO actions should they take to diagnose the root cause? (Choose 2.)

Select 2 answers

A.Convert the model to a different framework that is faster.

B.Enable Cloud Trace to analyze request latency across services.

C.Check the endpoint's autoscaling metrics and cold start frequency.

D.Increase the number of replicas to reduce load per replica.

E.Set the logging verbosity to DEBUG in the container.

AnswersB, C

Cloud Trace provides detailed latency breakdowns.

Why this answer

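Options B and C are correct: Cloud Trace breaks request latency down across services, and autoscaling metrics reveal whether cold starts or saturated replicas are responsible. Option D is wrong because increasing replicas may mask the issue but does not diagnose it. Option A is wrong because converting the model to a different framework is a remediation, not a diagnostic step, and may not address the latency.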

Option E is wrong because log level changes do not provide granular latency analysis.

940

Multi-Selectmedium

You are designing a distributed training job for a PyTorch model on Vertex AI using multiple machines with GPUs. Which TWO configurations are required to enable data parallelism with PyTorch DDP? (Choose 2.)

Select 2 answers

A.Use the command 'torch.distributed.launch' to start each worker.

B.Set environment variables MASTER_ADDR, MASTER_PORT, WORLD_SIZE, and RANK in each container.

C.Enable Vertex Explainable AI during training.

D.Set environment variable TF_CONFIG for each replica.

E.Specify a custom service account with access to Cloud TPU.

AnswersA, B

torch.distributed.launch (or torchrun) handles spawning processes with correct environment variables.

Why this answer

PyTorch DDP requires the master address and port (MASTER_ADDR, MASTER_PORT) to form the communication group, plus WORLD_SIZE and RANK to identify each process within it. TF_CONFIG is the TensorFlow convention and is not read by PyTorch DDP. NCCL is the usual communication backend for GPU training.
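A training script can check these variables before initializing the process group. The sketch below uses only the standard library. `read_rendezvous_config` is a hypothetical helper and the example values are invented. In a real script, `torch.distributed.init_process_group(backend="nccl")` would read the same variables through its default `env://` initialization.

```python
import os
from typing import Mapping, NamedTuple


class RendezvousConfig(NamedTuple):
    master_addr: str
    master_port: int
    world_size: int
    rank: int


REQUIRED_VARS = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK")


def read_rendezvous_config(env: Mapping[str, str] = os.environ) -> RendezvousConfig:
    """Parse and sanity-check the environment variables that DDP rendezvous relies on."""
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise KeyError(f"missing DDP environment variables: {', '.join(missing)}")
    config = RendezvousConfig(
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
        world_size=int(env["WORLD_SIZE"]),
        rank=int(env["RANK"]),
    )
    if not 0 <= config.rank < config.world_size:
        raise ValueError(f"RANK {config.rank} is outside [0, {config.world_size})")
    return config


# Hypothetical values for the fourth process in an 8-process job.
example_env = {"MASTER_ADDR": "10.0.0.2", "MASTER_PORT": "29500", "WORLD_SIZE": "8", "RANK": "3"}
config = read_rendezvous_config(example_env)
print(config)
```

Failing fast on a missing or inconsistent variable gives a clearer error than a rendezvous that hangs until it times out.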

941

Multi-Selecthard

A pipeline includes a component that produces a model artifact. The team wants to automatically detect skew between the training data distribution and the serving data distribution. Which three best practices should they implement? (Choose three.)

Select 3 answers

A.Compare statistics using a dedicated component and alert on threshold exceedance

B.Use in-memory data passing for efficiency

C.Compute serving data statistics using a component

D.Disable caching to ensure fresh statistics

E.Pass training data statistics as a Dataset artifact

AnswersA, C, E

A comparison component can detect skew and trigger alerts.

Why this answer

To detect skew, one should pass training and serving data statistics as artifacts, compare them using a statistics comparison component, and set up an alert if skew exceeds a threshold. Using GCS URIs for passing data is a general best practice for idempotency.
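A comparison step of this kind could be sketched as follows for a categorical feature. The L-infinity distance used here is one common choice of skew metric, and the threshold, data, and function names are all assumptions for illustration. In a pipeline, each function would sit in its own component and exchange the statistics as artifacts.

```python
from collections import Counter


def distribution(values):
    """Relative frequency of each category in a non-empty list of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(p, q):
    """Largest absolute difference in probability over all categories seen in either distribution."""
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def detect_skew(training_values, serving_values, threshold):
    """Return (distance, alert) where alert is True if the distance exceeds the threshold."""
    distance = l_infinity_distance(distribution(training_values), distribution(serving_values))
    return distance, distance > threshold


# Hypothetical payment-method feature at training time vs. serving time.
training = ["card"] * 8 + ["wire"] * 2
serving = ["card"] * 5 + ["wire"] * 4 + ["crypto"] * 1

distance, alert = detect_skew(training, serving, threshold=0.25)
print(f"distance={distance:.2f}, alert={alert}")
```

Numeric features need a different treatment, for example binning values into a histogram first, and thresholds are usually tuned per feature to avoid noisy alerts.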

942

MCQeasy

A company wants to log all prediction requests and responses from a Vertex AI Endpoint to BigQuery for auditing and debugging. How can they achieve this?

A.Export endpoint logs from Cloud Logging to Cloud Storage and then load into BigQuery manually.

B.Use a Cloud Function to intercept predictions and write to BigQuery.

C.Vertex AI endpoints do not support request/response logging.

D.Enable request/response logging on the endpoint and create a BigQuery sink for the log.

AnswerD

Correct: endpoint logging captures data, and a sink routes to BigQuery.

Why this answer

Vertex AI endpoints can be configured to enable request/response logging. The logs can be sent to a BigQuery table via a log sink.

943

MCQeasy

An ML engineer has deployed a model on Vertex AI Endpoints and wants to detect when the serving data distribution differs from the training data distribution. Which monitoring feature should they enable?

A.Prediction drift monitoring

B.Feature drift monitoring

C.Model quality monitoring

D.Feature skew monitoring

AnswerD

Correct: Feature skew compares training vs serving distributions.

Why this answer

Feature skew monitoring compares the training data distribution (stored in a baseline) with the serving data distribution to detect skew. Feature drift tracks changes over time in serving data only.

944

MCQmedium

You have a Vertex AI pipeline that trains a model and outputs a Model artifact. You want to register this model in the Vertex AI Model Registry. Which pre-built Google Cloud Pipeline Components component should you use?

A.VertexEndpointDeployOp

B.CreateModelVersionsOp

C.ModelRegisterOp

D.VertexModelUploadOp

AnswerD

This component uploads a model to the Vertex AI Model Registry.

Why this answer

The correct component is VertexModelUploadOp because it is specifically designed to upload a trained model artifact to the Vertex AI Model Registry, creating a new model version or a new model if one does not exist. This component takes the model artifact from a pipeline step and registers it, making it available for deployment or version management.

Exam trap

Google Cloud often tests the distinction between model registration and deployment, so candidates mistakenly choose VertexEndpointDeployOp thinking it registers the model, when in fact it only deploys an already-registered model to an endpoint.

How to eliminate wrong answers

Option A is wrong because VertexEndpointDeployOp is used to deploy a model to an endpoint, not to register a model in the Model Registry. Option B is wrong because CreateModelVersionsOp is not a pre-built Google Cloud Pipeline Components component; the correct component for creating model versions is VertexModelUploadOp. Option C is wrong because ModelRegisterOp does not exist as a pre-built component in the Google Cloud Pipeline Components suite.

945

MCQeasy

A machine learning engineer needs to schedule a Vertex AI pipeline to run daily at midnight. Which approach should they use?

A.Use the Vertex AI Pipelines console to set a cron schedule directly on the pipeline.

B.Create a Cloud Build trigger that runs the pipeline on a schedule.

C.Use Cloud Tasks to create a recurring task that invokes the pipeline.

D.Create a Cloud Function that calls the Vertex AI API, triggered by a Pub/Sub message from Cloud Scheduler.

AnswerD

This is the recommended pattern: Cloud Scheduler publishes to Pub/Sub, which triggers a Cloud Function that starts the pipeline.

Why this answer

Option D is correct because Vertex AI Pipelines does not natively support cron scheduling. The recommended pattern is to use Cloud Scheduler to publish a message to a Pub/Sub topic at the desired time, which then triggers a Cloud Function that calls the Vertex AI API to create and run the pipeline. This decoupled architecture ensures reliable scheduling and allows for custom logic before invocation.

Exam trap

The trap here is that candidates assume Vertex AI Pipelines has built-in scheduling like Airflow, but the exam tests knowledge of the correct Google Cloud integration pattern using Cloud Scheduler, Pub/Sub, and Cloud Functions.

How to eliminate wrong answers

Option A is wrong because Vertex AI Pipelines console does not provide a direct cron scheduling interface; you must use an external scheduler. Option B is wrong because Cloud Build triggers are designed for CI/CD events (e.g., code pushes) and are not intended for recurring pipeline execution; they lack the precise time-based scheduling needed for daily runs. Option C is wrong because Cloud Tasks is built for single or delayed task execution, not recurring schedules; it would require additional orchestration to mimic a cron job, making it less suitable than Cloud Scheduler.
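Cloud Scheduler expresses "daily at midnight" in unix-cron format. The sketch below shows that expression and what it means in terms of the next firing time; the helper names are hypothetical and nothing here calls a Google Cloud API.

```python
from datetime import datetime, timedelta

# unix-cron fields: minute hour day-of-month month day-of-week
DAILY_MIDNIGHT_CRON = "0 0 * * *"

def parse_cron(expr: str) -> dict:
    """Split a five-field cron expression into named fields (no validation of values)."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    names = ("minute", "hour", "day_of_month", "month", "day_of_week")
    return dict(zip(names, fields))

def next_daily_midnight(now: datetime) -> datetime:
    """Return the next 00:00 strictly after `now`, i.e. when "0 0 * * *" fires next."""
    midnight_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight_today + timedelta(days=1)
```

In the pattern described above, this schedule would be attached to the Cloud Scheduler job, and the job's target would be the Pub/Sub topic that triggers the function.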

946

MCQmedium

A model deployed on Vertex AI Prediction is returning high latency for real-time requests. The model is a small TensorFlow model. Which troubleshooting step should the team take first?

A.Retrain the model with a larger batch size

B.Check if the machine type is too small and enable autoscaling

C.Use a custom container with optimized runtime

D.Enable Cloud Armor to reduce traffic

AnswerB

Low latency often requires adequate resources.

Why this answer

Option B is correct because high latency for real-time predictions from a small TensorFlow model often indicates that the serving infrastructure is under-provisioned. Checking the machine type and enabling autoscaling directly addresses whether the instance is too small to handle the request volume, which is the most common first step in diagnosing latency issues on Vertex AI Prediction.

Exam trap

Google Cloud often tests the principle of 'start with the simplest infrastructure fix before optimizing the model or container,' so candidates mistakenly jump to retraining or custom containers without first checking if the instance type and scaling settings are appropriate.

How to eliminate wrong answers

Option A is wrong because retraining with a larger batch size affects training throughput, not inference latency for real-time requests; inference batch size is set at serving time, not during training. Option C is wrong because using a custom container with an optimized runtime is a more advanced optimization step that should be considered only after verifying that the base infrastructure (machine type and scaling) is adequate. Option D is wrong because Cloud Armor is a security service for DDoS protection and traffic filtering, not a tool for reducing latency caused by insufficient compute resources.

947

MCQmedium

A financial services company uses a custom container to serve a fraud detection model on Vertex AI Endpoints. The model requires a feature store lookup for each prediction. Recently, the feature store (Cloud Bigtable) experienced a brief outage, causing some predictions to fail. After the outage resolved, the endpoint's CPU utilization dropped significantly, and prediction latency improved. However, the model's false positive rate increased sharply. The ML engineer suspects the model is using stale features because the feature store outage caused missing lookups. Cloud Monitoring for the endpoint shows no errors after the outage, but the number of feature store read requests per prediction decreased by 30%. Which metric should the engineer use to confirm the hypothesis of stale features?

A.Monitor the prediction request latency to see if it remains low.

B.Use Vertex AI Model Monitoring to compare the prediction distribution before and after the outage; significant drift indicates stale features.

C.Verify the feature store's read throughput and latency metrics to ensure it is healthy.

D.Check the error rate for the endpoint; if no errors, then features were retrieved correctly.

AnswerB

Drift detection directly reveals changes in model behavior due to input changes.

Why this answer

Option B is correct because Vertex AI Model Monitoring can detect prediction distribution drift, which directly indicates that the model is receiving different input features than expected. A significant drift after the outage, combined with the 30% drop in feature store read requests, confirms that stale or default features were substituted for missing lookups, causing the false positive rate to spike.

Exam trap

The trap here is that candidates assume no errors means no problem, but the question explicitly describes a silent failure where the model uses stale features without raising any error, so metrics like latency or error rate are irrelevant for detecting feature staleness.

How to eliminate wrong answers

Option A is wrong because low prediction latency does not confirm stale features; it only indicates that the endpoint is processing requests faster, which could be due to fewer feature store reads (as observed) but does not prove that the features used are stale. Option C is wrong because verifying the feature store's health metrics (read throughput, latency) only confirms that Bigtable is operational now, not whether the model used stale features during the outage or after. Option D is wrong because the absence of endpoint errors does not guarantee correct feature retrieval; the model can silently use default or cached values without raising errors, which is exactly what happened here.

948

MCQhard

A data science team has trained a TensorFlow model on-premises using a large dataset. When they try to deploy the model to Vertex AI for online predictions, the deployed model fails to start with a ‘MemoryError’. The model artifact is 2 GB, and the machine type is n1-standard-4 (15 GB RAM). What is the most likely cause?

A.The model is stored in a regional bucket and the Vertex AI endpoint is in a different region.

B.The machine type does not support TensorFlow models larger than 1 GB.

C.The model is too large for the machine's memory, causing an out-of-memory (OOM) error during loading.

D.The model file is corrupted or missing dependencies, causing a crash.

AnswerC

The 2 GB model may require more than 15 GB RAM during loading due to overhead and intermediate structures.

Why this answer

Option C is correct because the model artifact is 2 GB, and loading it into memory on an n1-standard-4 machine (15 GB RAM) can still cause a MemoryError. TensorFlow models often require additional memory for graph construction, intermediate tensors, and framework overhead, which can easily exceed the available RAM, especially when the model is loaded entirely into memory before serving.

Exam trap

Google Cloud often tests the misconception that a model artifact smaller than the machine's total RAM cannot cause an OOM error. The trap here is that TensorFlow's memory footprint during loading and serving is significantly larger than the artifact size due to framework overhead and graph construction.

How to eliminate wrong answers

Option A is wrong because a regional bucket mismatch would cause a permission or access error, not a MemoryError; Vertex AI can access models from any regional bucket as long as the service account has proper permissions. Option B is wrong because there is no inherent machine type limitation that restricts TensorFlow models to 1 GB; the n1-standard-4 can handle larger models if sufficient memory is available. Option D is wrong because a corrupted file or missing dependencies would typically result in an ImportError or a crash with a different error message, not a MemoryError.

949

Multi-Selectmedium

A data science team uses Vertex AI Model Monitoring to detect data quality issues in a production model. Which TWO metrics should they enable to identify problems with missing values in predictions? (Select TWO.)

Select 2 answers

A.Feature value distribution skew (distance metrics).

B.Training-serving skew detection for all features.

C.Total count of missing values across all features.

D.Prediction confidence score.

E.Missing value ratio per feature.

AnswersA, E

Can detect shifts due to missing values being treated differently.

Why this answer

Option A is correct because Vertex AI Model Monitoring's feature value distribution skew detection uses distance metrics (e.g., Jensen-Shannon divergence, L-infinity) to compare the distribution of feature values in the serving data against the training data. A sudden increase in missing values in a feature will shift its distribution, triggering a skew alert. This allows the team to detect missing value problems indirectly by monitoring distributional drift. Option E is correct because the missing value ratio per feature measures the problem directly and shows which feature is affected.

Exam trap

Google Cloud often tests the distinction between aggregate metrics (like total count) and per-feature metrics (like ratio), and candidates mistakenly select 'total count of missing values across all features' because they think it directly addresses missing values, but Vertex AI Model Monitoring only supports per-feature missing value ratios.
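To see why a per-feature ratio is more useful than one aggregate count, consider the following sketch. It is plain Python over made-up rows, not the Model Monitoring implementation, and it treats both `None` values and absent keys as missing (an assumption for illustration).

```python
def missing_ratio_per_feature(rows, feature_names):
    """Return {feature: fraction of rows where the feature is None or absent}."""
    total = len(rows)
    ratios = {}
    for name in feature_names:
        missing = sum(1 for row in rows if row.get(name) is None)
        ratios[name] = missing / total if total else 0.0
    return ratios

# Hypothetical serving rows for two features.
rows = [
    {"amount": 12.5, "country": "DE"},
    {"amount": None, "country": "FR"},
    {"amount": None, "country": None},
    {"amount": 7.0, "country": "US"},
]

ratios = missing_ratio_per_feature(rows, ["amount", "country"])
```

The aggregate count across all features would be 3, which says nothing about where the problem is; the per-feature ratios show that `amount` is missing in half of the rows.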

950

MCQmedium

An ML engineer notices that predictions are taking longer than expected under moderate traffic. Reviewing the endpoint configuration, what is the most likely cause of the high latency?

A.Container logging is disabled, slowing down request processing.

B.The accelerator count is 0, meaning no GPU is used.

C.The machine type n1-standard-4 is underpowered for the model's compute needs.

D.Automatic scaling is set with a maxReplicaCount of 10, which creates overhead.

AnswerB

BERT models are computationally intensive and benefit greatly from GPU acceleration; without it, inference is CPU-bound and slow.

Why this answer

When the accelerator count is set to 0, the endpoint runs inference on the CPU only, which is significantly slower than GPU-accelerated inference for deep learning models. This is the most direct cause of high latency under moderate traffic, as the model's compute demands exceed CPU throughput.

Exam trap

Google Cloud often tests the misconception that CPU machine type is the primary cause of latency, when in fact the accelerator count being zero is the more direct and common misconfiguration for deep learning models.

How to eliminate wrong answers

Option A is wrong because disabling container logging reduces I/O overhead and actually speeds up request processing, not slows it down. Option C is wrong because n1-standard-4 (4 vCPUs, 15 GB RAM) is a general-purpose machine type that is generally sufficient for moderate traffic; the primary bottleneck is the lack of GPU acceleration, not an underpowered CPU. Option D is wrong because a maxReplicaCount of 10 does not create overhead; automatic scaling with a higher maxReplicaCount allows more instances to handle load, reducing latency under traffic.

951

MCQhard

An ML pipeline runs on Vertex AI and includes a component that uses a third-party library not available in the default Python environment. The team wants to avoid building a custom container image. Which approach should they use?

A.Install the library using pip in the pipeline definition

B.Use a container component with a pre-built image

C.Use the packages_to_install parameter in @dsl.component

D.Add the library to the Vertex AI custom training image

AnswerC

This parameter allows specifying extra packages to install in the component's execution environment.

Why this answer

Option C is correct because the `packages_to_install` parameter in the `@dsl.component` decorator allows you to specify a list of third-party Python packages (e.g., via pip) that will be installed at runtime in the component's execution environment, without needing to build a custom container image. This is the recommended approach in Vertex AI Pipelines when you need to use a library not present in the default Python environment, as it avoids the overhead of custom container creation while ensuring the dependency is available for that specific component.

Exam trap

The trap here is that candidates often confuse the `packages_to_install` parameter with a generic pip install command in the pipeline definition, or they assume that using a pre-built container image does not count as building a custom container, when in fact any container image that includes the library must be custom-built or selected from a registry, which still requires image management overhead.

How to eliminate wrong answers

Option A is wrong because `pip install` in the pipeline definition (e.g., in a Python function or YAML) is not a supported mechanism in Vertex AI Pipelines; the pipeline definition itself does not execute shell commands, and dependencies must be declared via the component decorator. Option B is wrong because using a container component with a pre-built image still requires building a custom container image (even if it's pre-built, you must create or select one that includes the library), which contradicts the requirement to avoid building a custom container image. Option D is wrong because adding the library to the Vertex AI custom training image involves creating a custom container image for training, which is a separate process from pipeline components and also requires building a custom image, violating the constraint.

952

MCQeasy

A company needs to extract text from scanned invoices and parse key fields like invoice number and total amount. Which Document AI processor should they use?

A.OCR Processor

B.Contract Parser

C.Form Parser

D.Invoice Parser

AnswerD

Why this answer

The Invoice Parser is specialised for parsing invoice documents. OCR Processor extracts text only, Form Parser extracts form fields, and Contract Parser is for legal contracts.

953

MCQhard

You have a Vertex AI endpoint serving a model for real-time predictions. The endpoint is configured with minReplicaCount=2 and maxReplicaCount=10. Over the past week, you notice that the actual number of replicas rarely exceeds 2, but the average CPU utilization is around 85%. You want to reduce costs without impacting performance. What should you do?

A.Increase minReplicaCount to 5.

B.Decrease minReplicaCount to 1.

C.Increase maxReplicaCount to 20.

D.Decrease the CPU utilization target to 50%

AnswerB

Since the number of replicas rarely exceeds 2, lowering min to 1 reduces the baseline cost, and the autoscaler can still scale up if needed.

Why this answer

Option B is correct because decreasing minReplicaCount to 1 allows the endpoint to scale down to a single replica when traffic is low, reducing compute costs. Since the actual replica count rarely exceeds 2, the current minReplicaCount=2 forces at least two replicas to run continuously, even when one would suffice. With average CPU utilization at 85%, the model is already efficiently handling load, so scaling down to one replica will not impact performance while saving costs.

Exam trap

The trap here is that candidates often assume increasing minReplicaCount or maxReplicaCount improves performance, but the question focuses on cost reduction without impacting performance, and the key insight is that the current minReplicaCount is unnecessarily high given the actual scaling behavior.

How to eliminate wrong answers

Option A is wrong because increasing minReplicaCount to 5 would force at least 5 replicas to run at all times, increasing costs without any performance benefit since the actual replica count rarely exceeds 2. Option C is wrong because increasing maxReplicaCount to 20 does not address the cost issue; the endpoint rarely scales beyond 2 replicas, so a higher maximum has no effect on current spending. Option D is wrong because decreasing the CPU utilization target to 50% would cause the autoscaler to add more replicas prematurely, increasing costs and potentially causing unnecessary scaling events, while the current 85% utilization indicates efficient resource usage.
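The effect of lowering the utilization target in Option D can be sketched with a simple proportional scaling rule. This is an assumption borrowed from the way the Kubernetes Horizontal Pod Autoscaler is documented, not a confirmed description of the Vertex AI autoscaler, and the function name is hypothetical.

```python
import math

def desired_replicas(current, observed_util, target_util, min_replicas, max_replicas):
    """Proportional rule: scale so that per-replica utilization approaches the target.

    Assumed formula (HPA-style), clamped to the configured replica bounds.
    """
    raw = math.ceil(current * observed_util / target_util)
    return max(min_replicas, min(max_replicas, raw))

# Two replicas observed at 85% CPU, bounds 2..10 as in the question.
at_target_85 = desired_replicas(2, 85, 85, 2, 10)   # stays at 2
at_target_50 = desired_replicas(2, 85, 50, 2, 10)   # scales out to 4
```

Under this assumed rule, dropping the target from 85% to 50% doubles the replica count for the same load, which is why Option D raises cost rather than lowering it.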

954

MCQmedium

An organization wants to deploy a TensorFlow model on edge devices such as smartphones and IoT devices for offline inference. Which format should they export the model to?

A.ONNX format

B.TensorFlow Lite (TFLite)

C.SavedModel format

D.HDF5 format

AnswerB

TFLite is the standard format for deploying models on mobile, embedded, and IoT devices.

Why this answer

TensorFlow Lite is designed for on-device inference on mobile and edge devices, with reduced model size and optimized performance.

955

MCQhard

A company has deployed a machine learning model that uses a large input tensor. They notice that the prediction latency varies significantly between requests of the same size. Cloud Monitoring shows that the serving endpoint's CPU utilization is consistently below 50%, but memory utilization fluctuates between 70% and 95%. What is the most likely cause?

A.The model is performing garbage collection cycles

B.The model is using excessive memory due to a memory leak

C.The prediction latency is being affected by CPU throttling

D.The model is hitting a cold start due to autoscaling

AnswerA

Garbage collection pauses can cause latency spikes without high CPU usage, as memory utilization fluctuates during GC.

Why this answer

The correct answer is A because the described symptoms—low CPU utilization (below 50%) and high, fluctuating memory utilization (70%–95%) with variable latency—are classic indicators of garbage collection (GC) pauses in a managed runtime like Python or Java. When the model processes large input tensors, it allocates significant memory; as memory pressure builds, the garbage collector runs more frequently, causing stop-the-world pauses that increase latency unpredictably, even though CPU is not fully utilized.

Exam trap

Google Cloud often tests the misconception that high memory utilization always indicates a memory leak, but the key differentiator is the pattern of fluctuation versus monotonic increase, and the fact that GC pauses cause latency spikes without high CPU usage.

How to eliminate wrong answers

Option B is wrong because a memory leak would cause memory utilization to steadily increase over time (monotonically) rather than fluctuate between 70% and 95%, and it would eventually lead to an out-of-memory crash, not just variable latency. Option C is wrong because CPU throttling (e.g., due to thermal limits or cloud provider CPU credits exhaustion) would manifest as sustained high CPU utilization or a hard cap on CPU speed, not consistently below 50% utilization. Option D is wrong because cold starts due to autoscaling occur when new instances are spun up to handle increased load, which would show a correlation with request volume spikes and initial high latency on the first request, not persistent latency variation across all requests of the same size.

956

MCQeasy

You need to deploy a model to a Vertex AI endpoint that can scale down to zero when there are no requests to minimize costs. Which feature should you enable?

A.Deploy the model to a Compute Engine instance and use instance groups.

B.Use a custom metric for autoscaling

C.Enable autoscaling with minReplicaCount=0

D.Set maxReplicaCount to 0

AnswerC

minReplicaCount=0 allows the endpoint to scale to zero when idle.

Why this answer

Option C is correct because Vertex AI endpoints support autoscaling with a `minReplicaCount` of 0, which allows the endpoint to scale down to zero instances when there are no incoming requests, thereby minimizing costs. This feature is specifically designed for serverless model serving, where the endpoint automatically scales up from zero when traffic arrives and scales down to zero during idle periods.

Exam trap

The trap here is that candidates confuse `minReplicaCount=0` with `maxReplicaCount=0`, thinking that setting the maximum to zero will scale down to zero, but in reality, `maxReplicaCount=0` disables the endpoint entirely, while `minReplicaCount=0` is the correct parameter to allow scaling to zero instances.

How to eliminate wrong answers

Option A is wrong because deploying to a Compute Engine instance with instance groups does not natively support scaling down to zero; instance groups require at least one running instance, and you would still incur costs for the underlying VMs even if they are idle. Option B is wrong because custom metrics for autoscaling can help scale based on custom signals, but they do not enable scaling to zero replicas unless the underlying autoscaler supports a `minReplicaCount` of 0, which is not a feature of custom metrics alone. Option D is wrong because setting `maxReplicaCount` to 0 would prevent any replicas from being deployed, making the endpoint unable to serve any requests; `maxReplicaCount` controls the upper limit, not the lower limit for scaling down.

957

MCQeasy

A data science team wants to share engineered features across multiple projects while ensuring low-latency serving for online predictions. Which Google Cloud service should they use to store and serve these features?

A.Vertex AI Model Registry

B.Cloud Storage

C.BigQuery

D.Vertex AI Feature Store

AnswerD

Vertex AI Feature Store provides feature management with online store for low-latency serving and offline store for training.

Why this answer

Vertex AI Feature Store is purpose-built for managing and sharing ML features, with an online store for low-latency serving. BigQuery is for analytics, Cloud Storage for objects, and Vertex AI Model Registry for models.

958

MCQeasy

For a low-latency real-time serving requirement, which type of Vertex AI Endpoint is appropriate?

A.Regional endpoint

B.Public endpoint

C.Private endpoint with VPC network

D.Global endpoint

AnswerA

Regional endpoints are deployed in a specific region, allowing proximity to clients for low latency.

Why this answer

Option A is correct because a regional endpoint can be deployed in the same region as the clients, which minimizes network latency. Option C (private endpoint with VPC network) addresses security, not necessarily latency. Option B (public endpoint) adds internet latency.

Option D (global endpoint) is optimized for multi-region traffic but may add slight overhead.

959

MCQmedium

You need to run a distributed training job on Vertex AI using TensorFlow with MirroredStrategy on a single machine with 4 GPUs. Which training configuration should you use?

A.Use MirroredStrategy with a single workerPoolSpec containing a machine_type with 4 GPUs

B.Use MultiWorkerMirroredStrategy with multiple workerPools

C.Use MirroredStrategy with two workerPoolSpecs, each with 2 GPUs

D.Use ParameterServerStrategy with a chief and a parameter server

AnswerA

MirroredStrategy handles intra-machine GPU parallelism. Single worker pool with multiple GPUs is correct.

Why this answer

For single-machine, multi-GPU training, TensorFlow's MirroredStrategy is appropriate. The job should define a single workerPoolSpec with one replica whose machine type has all four GPUs attached.
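A minimal sketch of what such a configuration might look like as a Python structure follows. The machine type, accelerator type, and image URI are illustrative placeholders, not values from the question, and the `total_gpus` helper is hypothetical.

```python
# One worker pool, one replica, four GPUs on that single machine.
worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "n1-standard-16",        # illustrative choice
            "accelerator_type": "NVIDIA_TESLA_T4",   # illustrative choice
            "accelerator_count": 4,
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": "gcr.io/my-project/trainer:latest",  # hypothetical image
        },
    }
]

def total_gpus(specs):
    """Sum GPUs across all pools: replicas times accelerators per replica."""
    return sum(
        spec["replica_count"] * spec["machine_spec"].get("accelerator_count", 0)
        for spec in specs
    )
```

Option C would instead produce two pools with two GPUs each, which places the GPUs on separate machines where MirroredStrategy cannot reach them all.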

960

MCQmedium

A team uses Vertex AI Feature Store with an online store. They need low-latency serving for millions of features with high write throughput. Which online store type should they choose?

A.Cloud SQL online store

B.Optimized online store

C.Firestore online store

D.Bigtable online store

AnswerD

Bigtable provides low latency and high throughput, ideal for this scenario.

Why this answer

Bigtable online store is optimized for high throughput and low latency, suitable for large-scale online serving.

961

Multi-Selectmedium

A company uses Vertex AI Model Monitoring. Which two configuration options can be set to reduce false positive drift alerts?

Select 2 answers

A.Use a sample percentage of predictions

B.Set a shorter alerting window

C.Increase the drift threshold

D.Decrease the drift threshold

E.Enable feature attribution monitoring

AnswersA, C

Sampling reduces the volume of data compared, potentially reducing noise-induced false alarms.

Why this answer

Option A is correct because using a sample percentage of predictions reduces the volume of data analyzed for drift, which lowers the chance of detecting statistically insignificant fluctuations that could trigger false positive alerts. This is a common technique to filter out noise in high-throughput production systems.

Exam trap

Google Cloud often tests the misconception that increasing sensitivity (lowering thresholds or shortening windows) reduces false positives, when in fact the opposite is true—these actions increase alert volume and false positives.

962

Matchingmedium

Match each MLOps practice to its description.

Concepts

Matches

Continuous integration and deployment for ML pipelines

Track and manage different model iterations

Monitor for changes in data or model performance over time

Schedule or trigger model retraining based on conditions

Compare model versions in production with traffic splitting

Why these pairings

MLOps practices ensure reliable and maintainable ML systems. CI/CD automates deployment, Model Monitoring tracks performance, and Data Versioning manages dataset changes. Common confusions arise when mixing these responsibilities.

963

MCQmedium

You need to query a Vertex AI Vector Search index for nearest neighbours. The index is deployed on an endpoint. Which API method should you use to perform the query?

A.projects.locations.indexEndpoints.findNeighbors

B.projects.locations.indexes.match

C.projects.locations.indexes.query

D.projects.locations.endpoints.predict

AnswerA

Correct. The findNeighbors method is used to query a deployed index endpoint.

Why this answer

The correct API method to query a deployed Vertex AI Vector Search index for nearest neighbors is `projects.locations.indexEndpoints.findNeighbors`. This method is specifically designed for vector similarity search against an index endpoint, returning the nearest neighbors for a given query vector. The other options either target the wrong resource (indexes instead of indexEndpoints) or use methods intended for different purposes like model prediction.

Exam trap

Google Cloud often tests the distinction between model prediction endpoints and vector search endpoints, so the trap here is confusing the `predict` method (for model inference) with the `findNeighbors` method (for vector similarity search), leading candidates to incorrectly select option D.

How to eliminate wrong answers

Option B is wrong because `projects.locations.indexes.match` is not a valid API method; the correct method for matching against an index is `findNeighbors` on the index endpoint. Option C is wrong because `projects.locations.indexes.query` does not exist; the query operation for vector search is performed via the index endpoint, not directly on the index resource. Option D is wrong because `projects.locations.endpoints.predict` is used for online prediction from a deployed model, not for querying a vector search index.
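For orientation, a request body for `findNeighbors` might be assembled as below. The field names reflect my understanding of the REST request shape and should be checked against the API reference before use; the helper name, the deployed index ID, and the query vector are hypothetical.

```python
import json

def build_find_neighbors_body(deployed_index_id, query_vector, neighbor_count=10):
    """Build a JSON-serializable body for an indexEndpoints.findNeighbors call.

    Field names are assumed from the REST API; verify them before relying on this.
    """
    return {
        "deployedIndexId": deployed_index_id,
        "queries": [
            {
                "datapoint": {"featureVector": list(query_vector)},
                "neighborCount": neighbor_count,
            }
        ],
    }

body = build_find_neighbors_body("my_deployed_index", [0.1, 0.2, 0.3], neighbor_count=5)
payload = json.dumps(body)
```

The body is sent to the index endpoint resource, not to the index itself, which is the distinction options B and C get wrong.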

964

Multi-Selectmedium

An organization wants to implement central governance for ML models across teams. Which TWO services should they use together to achieve model versioning, lineage, and deployment management? (Select 2)

Select 2 answers

A.Vertex AI Feature Store

B.Vertex AI Model Registry

C.Vertex AI Metadata

D.Vertex AI Experiments

E.Cloud Data Catalog

AnswersB, C

Handles model versioning, aliases, and deployment.

Why this answer

Vertex AI Model Registry manages model versions and aliases; Vertex AI Metadata tracks lineage.

965

MCQhard

A team is building a CI/CD pipeline for an ML model. They want to automatically trigger a Vertex AI pipeline for retraining whenever new training data arrives in a Cloud Storage bucket, but only if a specific Pub/Sub notification is published by a data ingestion process. Which approach meets these requirements with minimal operational overhead?

A.Use Cloud Scheduler to run a job every hour that checks for new files in Cloud Storage and starts the pipeline if new files exist.

B.Configure a Cloud Build trigger that listens to the Pub/Sub topic and executes a build step that submits the pipeline run.

C.Use Eventarc to route the Pub/Sub notification to a Cloud Function that calls the Vertex AI pipeline creation API.

D.Create a Dataflow streaming pipeline that reads from Pub/Sub and triggers the Vertex AI pipeline via a custom sink.

AnswerC

Eventarc provides a serverless event-driven integration; Cloud Function handles the trigger with minimal overhead.

Why this answer

Option C is correct because Eventarc can directly listen to a Pub/Sub topic and route matching messages to a Cloud Function, which then calls the Vertex AI pipeline creation API. This serverless approach triggers the pipeline only when the specific Pub/Sub notification is published, meeting the requirement with zero infrastructure to manage and no polling overhead.

Exam trap

The trap here is that candidates may over-engineer the solution by choosing Dataflow (Option D) because it sounds 'streaming' and 'real-time', but the simplest serverless event-driven approach (Eventarc + Cloud Function) meets the requirement with minimal operational overhead.

How to eliminate wrong answers

Option A is wrong because Cloud Scheduler polling every hour introduces latency (up to 1 hour) and does not respond to the Pub/Sub notification; it also requires managing a scheduled job and checking for new files, which adds operational overhead and may miss the specific trigger condition. Option B is wrong because Cloud Build is designed for building and deploying artifacts in response to events such as source code changes; even when a trigger is configured on a Pub/Sub topic, each notification starts a full build whose only job is to submit the pipeline run, which adds build configuration and startup overhead compared with a lightweight function. Option D is wrong because a Dataflow streaming pipeline is overkill for this simple event-driven trigger; it introduces a persistent streaming job with associated cost and complexity, whereas a lightweight Cloud Function is sufficient and more cost-effective.
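The function in Option C mostly has to decode the Pub/Sub message and decide whether it is the notification that should start retraining. The sketch below covers only that part, under stated assumptions: the event is a dict shaped like a Pub/Sub message (base64 `data` plus `attributes`), the `source` attribute and its `data-ingestion` value are invented for illustration, and the actual Vertex AI call is left out.

```python
import base64
import json

def should_trigger_retraining(event: dict, expected_source: str = "data-ingestion") -> bool:
    """Accept only messages whose (hypothetical) `source` attribute matches."""
    attributes = event.get("attributes") or {}
    return attributes.get("source") == expected_source

def extract_pipeline_parameters(event: dict) -> dict:
    """Decode the base64 JSON payload carried in the message `data` field."""
    payload = base64.b64decode(event.get("data", "")).decode("utf-8")
    return json.loads(payload) if payload else {}

# Hypothetical message as the ingestion process might publish it.
message = {
    "data": base64.b64encode(
        json.dumps({"gcs_uri": "gs://example-bucket/new-data/"}).encode("utf-8")
    ).decode("ascii"),
    "attributes": {"source": "data-ingestion"},
}

if should_trigger_retraining(message):
    parameters = extract_pipeline_parameters(message)
    # A real function would submit the pipeline run here with `parameters`.
```

Keeping the filter and the decoding as pure functions makes the trigger logic easy to unit test without any cloud resources.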

966

MCQeasy

A media company wants to transcribe audio files from customer support calls into text for analysis. The audio is in English with clear speech and no background noise. They want a quick solution with no ML model training. Which Google Cloud service should they use?

A.Translation API to translate the audio

B.AutoML NLP to train a transcription model

C.Vertex AI Workbench to train a custom speech recognition model

D.Speech-to-Text API with the latest_long model

AnswerD

Why this answer

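Option D is correct because Speech-to-Text is a pre-trained API that transcribes audio to text with no model training, and the latest_long model targets long-form audio such as recorded calls. Option B is wrong because AutoML NLP handles text tasks such as classification, not transcription.

Option A is wrong because the Translation API translates text between languages and does not transcribe audio. Option C is wrong because training a custom speech model in Vertex AI Workbench contradicts the no-training requirement.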

967

Multi-Selecthard

Which TWO actions are recommended to detect and mitigate data drift in a production ML system on Vertex AI?

Select 2 answers

A.Deploy multiple models and use an ensemble to average predictions

B.Manually review model predictions daily

C.Automatically retrain the model when drift exceeds thresholds

D.Set up Vertex AI Model Monitoring to alert on feature distribution changes

E.Monitor prediction errors and flag when confidence is low

AnswersC, D

Automated retraining mitigates drift.

Why this answer

Option C is correct because a retraining pipeline (for example, one built on Vertex AI Pipelines) can be triggered automatically when data drift exceeds a predefined threshold, so the model adapts to distribution changes without manual intervention. Option D is correct because Vertex AI Model Monitoring continuously tracks feature distribution statistics (Jensen-Shannon divergence for numerical features, L-infinity distance for categorical features) and sends alerts when drift is detected, enabling proactive mitigation.
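To make the drift check concrete, here is a small sketch of a Jensen-Shannon divergence computed over two histograms of the same feature, followed by a threshold test. It is an illustration only, not Model Monitoring's implementation: the function names, the bucket counts, and the 0.3 threshold are assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def _normalize(counts: Sequence[float]) -> List[float]:
    """Turn raw bucket counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one observation")
    return [c / total for c in counts]


def _kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(baseline_counts: Sequence[float], current_counts: Sequence[float]) -> float:
    """Jensen-Shannon divergence between two histograms with identical buckets.

    With log base 2 the result lies between 0 (identical) and 1 (disjoint).
    """
    if len(baseline_counts) != len(current_counts):
        raise ValueError("histograms must use the same buckets")
    p = _normalize(baseline_counts)
    q = _normalize(current_counts)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def drift_exceeds_threshold(score: float, threshold: float = 0.3) -> bool:
    """Decide whether to raise an alert or start retraining (threshold is illustrative)."""
    return score > threshold
```

The split between the two correct options maps onto the two halves of the sketch: computing the score is detection, and acting on `drift_exceeds_threshold` is mitigation.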

Exam trap

Google Cloud often tests the distinction between drift detection (monitoring input distributions) and model performance monitoring (tracking prediction errors or confidence), leading candidates to confuse E with a valid drift mitigation technique.

968

MCQmedium

A data science team is building a feature engineering pipeline that processes large-scale data from BigQuery daily. They need to compute aggregate features and store the results in Vertex AI Feature Store for both online serving and offline training. Which Google Cloud service is best suited for this batch computation?

A.Cloud Composer

B.Dataproc

C.Cloud Functions

D.Dataflow

AnswerD

Dataflow (Apache Beam) is the correct choice for scalable batch processing and integrates with Feature Store.

Why this answer

Dataflow is ideal for batch processing large datasets from BigQuery with Apache Beam, and the pipeline can write its results to Feature Store through the Feature Store API. Cloud Functions is event-driven and not built for heavy batch workloads.

Dataproc targets Spark and Hadoop workloads rather than Beam pipelines. Cloud Composer is an orchestrator that schedules jobs; it does not execute the computation itself.

969

MCQmedium

An ML engineer is using Vertex AI Pipelines and wants to reuse a trained model across multiple pipeline runs without retraining each time. Which artifact management strategy should be used?

A.Store the model in BigQuery as a ML model

B.Use Cloud Functions to cache the model

C.Save the model to a Cloud Storage bucket and reference by path

D.Use Vertex AI ML Metadata to track and retrieve model artifacts

AnswerD

ML Metadata provides lineage and artifact tracking, enabling efficient reuse across pipelines.

Why this answer

Vertex AI ML Metadata is the correct artifact management strategy because it is purpose-built for tracking and retrieving model artifacts across pipeline runs. It stores metadata about models, datasets, and other artifacts in a lineage graph, enabling you to query and reuse a specific model version without retraining. This integrates natively with Vertex AI Pipelines, allowing you to pass model artifacts between components and retrieve them by ID or custom properties.

Exam trap

Google Cloud often tests the misconception that simply saving a model to Cloud Storage (Option C) is sufficient for artifact management, but the trap is that it ignores the need for metadata tracking, version lineage, and automated retrieval—features that Vertex AI ML Metadata provides as a managed service.

How to eliminate wrong answers

Option A is wrong because BigQuery is a data warehouse for structured data, not an artifact store for ML models; storing a model in BigQuery as an ML model (e.g., CREATE MODEL) is for in-database inference, not for retrieving a trained model artifact across pipelines. Option B is wrong because Cloud Functions are event-driven compute services, not a caching mechanism for model artifacts; they lack persistent storage and artifact versioning, and using them to cache models would be inefficient and unscalable. Option C is wrong because while saving a model to Cloud Storage and referencing by path is a common pattern, it is not a managed artifact management strategy—it lacks metadata tracking, version lineage, and automatic retrieval capabilities that Vertex AI ML Metadata provides, making it error-prone for reuse across multiple pipeline runs.

970

Multi-Selectmedium

You are using tf.Transform to preprocess data for a TensorFlow model. You want to ensure that the same transformations applied during training are also applied during serving. Which THREE components are necessary to achieve this?

Select 3 answers

A.Use the tf.Transform analyze_and_transform function on the training data

B.Use TensorFlow Serving with the exported SavedModel

C.Store raw data in BigQuery for serving

D.Save the transform function and load it in the serving input function

E.Duplicate the preprocessing code in the serving application

AnswersA, B, D

The analyze-and-transform step computes statistics and applies transformations, producing a transform graph.

Why this answer

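Option A is correct because the tf.Transform analyze-and-transform step (`AnalyzeAndTransformDataset` in the Beam API) computes the full-pass statistics (e.g., mean, variance, vocabulary) needed for consistent preprocessing and applies the transformation to the training data. It produces a transform graph that captures the exact operations, so the same transformation logic is available for both training and serving. Options B and D complete the picture: the saved transform function is loaded in the serving input function, and the exported SavedModel, served by TensorFlow Serving, then applies that identical graph to raw requests.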
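The underlying idea can be shown without TensorFlow at all. The sketch below is a plain-Python analogy, not the tf.Transform API: `analyze`, `transform`, `save_transform`, and `load_transform` are hypothetical names, and z-score scaling stands in for whatever preprocessing the real pipeline defines. Statistics are computed once over the training data, frozen into an artifact, and reloaded at serving time rather than recomputed or reimplemented.

```python
import json
import statistics
from typing import Dict, Sequence


def analyze(values: Sequence[float]) -> Dict[str, float]:
    """Full pass over the training data: compute the statistics to freeze."""
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}


def transform(value: float, stats: Dict[str, float]) -> float:
    """Apply z-score scaling using frozen statistics (never recomputed at serving)."""
    std = stats["std"] or 1.0  # guard against a constant feature
    return (value - stats["mean"]) / std


def save_transform(stats: Dict[str, float]) -> str:
    """Serialize the statistics; stands in for exporting the transform function."""
    return json.dumps(stats)


def load_transform(artifact: str) -> Dict[str, float]:
    """Restore the statistics in the serving path."""
    return json.loads(artifact)


# Training side: analyze once, transform the training data, export the artifact.
training_values = [10.0, 20.0, 30.0]
train_stats = analyze(training_values)
artifact = save_transform(train_stats)

# Serving side: load the artifact and apply the identical transformation.
serving_stats = load_transform(artifact)
scaled_request = transform(20.0, serving_stats)
```

Duplicating the preprocessing code in the serving application (Option E) would amount to calling `analyze` again on different data or rewriting `transform` by hand, which is exactly where training and serving can drift apart.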

Exam trap

Google Cloud often tests the misconception that duplicating preprocessing code (Option E) is acceptable, but the trap is that this leads to silent training-serving skew and inconsistent model behavior between training and serving.

971

Multi-Selecteasy

A data analyst wants to use BigQuery ML to train a linear regression model (LINEAR_REG) to predict house prices. They have a table with features like square footage, number of bedrooms, and location. Which TWO statements about the training process are correct?

Select 2 answers

A.The analyst must call ML.TRAIN after CREATE MODEL to start training

B.The trained model is stored in Cloud Storage

C.The model must be exported to Vertex AI for prediction

D.The model is automatically evaluated on a held-out test set if data splitting is enabled

E.Training is performed using the CREATE MODEL statement

AnswersD, E

By default, BigQuery ML splits data into training and evaluation sets.

Why this answer

Option D is correct because when data splitting is enabled in BigQuery ML, the `CREATE MODEL` statement automatically reserves a portion of the input data as a held-out evaluation set. After training completes, BigQuery ML evaluates the model on this set and reports metrics like mean absolute error and R², without requiring any manual split or separate evaluation step. Option E is correct because that same `CREATE MODEL` statement is what runs training; there is no separate training command.

Exam trap

Google Cloud often tests the misconception that BigQuery ML requires an explicit training command (like `ML.TRAIN`) or that models are stored in Cloud Storage by default, when in fact training is fully encapsulated in `CREATE MODEL` and models reside in BigQuery's internal storage.
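As a rough illustration of how little is involved, the sketch below assembles the two SQL statements as Python strings. The helper names, the dataset, table, and column names are hypothetical; the statements themselves use the `CREATE MODEL` options `model_type`, `input_label_cols`, and `data_split_method`, plus `ML.EVALUATE`. Running them would require a BigQuery client, which is deliberately left out.

```python
from typing import Sequence


def build_create_model_sql(
    model: str,
    source_table: str,
    label_col: str,
    feature_cols: Sequence[str],
) -> str:
    """Build a CREATE MODEL statement; executing it is what trains the model."""
    features = ",\n  ".join(feature_cols)
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'LINEAR_REG',\n"
        f"  input_label_cols = ['{label_col}'],\n"
        "  data_split_method = 'AUTO_SPLIT'\n"
        ") AS\n"
        f"SELECT\n  {features},\n  {label_col}\n"
        f"FROM `{source_table}`"
    )


def build_evaluate_sql(model: str) -> str:
    """Build the query that reads back the model's evaluation metrics."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{model}`)"


# Hypothetical names, for illustration only.
create_sql = build_create_model_sql(
    model="my_dataset.house_price_model",
    source_table="my_dataset.houses",
    label_col="price",
    feature_cols=["square_feet", "bedrooms", "location"],
)
evaluate_sql = build_evaluate_sql("my_dataset.house_price_model")
```

Note that nothing resembling `ML.TRAIN` appears, and the model is addressed by a BigQuery dataset path rather than a Cloud Storage URI.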

972

MCQhard

A team is using Vertex AI Explainability with a deployed model. They need to generate explanations for image classification predictions. Which explanation method should they configure in the ExplanationSpec?

A.XRAI

B.SHAP with KernelExplainer

C.Sampled Shapley

D.Integrated Gradients

AnswerA

XRAI is the method designed for image models in Vertex AI Explainability.

Why this answer

XRAI (eXplanation with Ranked Area Integrals) is specifically designed for image models to highlight regions that contribute to the prediction.
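For orientation, an ExplanationSpec selecting XRAI might look roughly like the dictionary built below. Treat it as a sketch: the helper name, the tensor names in the usage line, and the step count of 50 are assumptions, and the camelCase field names follow the REST form of ExplanationSpec as commonly documented, so they should be checked against the current API reference before use.

```python
import json
from typing import Any, Dict


def build_xrai_explanation_spec(
    input_tensor_name: str,
    output_tensor_name: str,
    step_count: int = 50,  # illustrative value, not a recommendation
) -> Dict[str, Any]:
    """Assemble an ExplanationSpec-shaped dict that selects XRAI for an image input."""
    if step_count < 1:
        raise ValueError("step_count must be a positive integer")
    return {
        # The attribution method is chosen under "parameters".
        "parameters": {"xraiAttribution": {"stepCount": step_count}},
        # The metadata tells the service which tensor carries the image.
        "metadata": {
            "inputs": {
                "image": {
                    "inputTensorName": input_tensor_name,
                    "modality": "image",
                }
            },
            "outputs": {"scores": {"outputTensorName": output_tensor_name}},
        },
    }


# Hypothetical tensor names, for illustration only.
spec = build_xrai_explanation_spec("input_image", "class_scores")
spec_json = json.dumps(spec, indent=2)
```

Only one attribution method is set under `parameters`, which is why the question asks for a single method to configure.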

973

MCQhard

A data engineer is troubleshooting a Vertex AI Endpoint that serves a large BERT model. After deployment, many prediction requests fail with 'Out of Memory' errors. The machine type is n1-standard-8 (30 GB memory) with no accelerator. Which action will most likely resolve the issue?

A.Change the machine type to n1-highmem-16 (104 GB memory).

B.Use batch prediction instead of online prediction.

C.Add a GPU accelerator (e.g., NVIDIA T4) to offload computation.

D.Quantize the model from FP32 to INT8.

AnswerA

Increasing memory directly resolves out-of-memory errors.

Why this answer

Option A is correct because n1-standard-8 has only 30 GB of memory, which may be insufficient for a large BERT model: the weights alone may be only around 1.5 GB, but intermediate tensors during inference can push total usage past 30 GB. Upgrading to a high-memory machine type provides more memory. Option C is wrong because adding a GPU does not increase system memory.

Option D is wrong because quantization shrinks the model but does not necessarily prevent memory spikes during inference. Option B is wrong because batch prediction does not serve real-time requests, and OOM errors might still occur.
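A quick back-of-the-envelope calculation shows why the weights alone are not the problem. The sketch below is an estimate under stated assumptions: the parameter count of roughly 340 million is an approximate figure for a BERT-large-sized model, and the 40 GB peak working set is a made-up number standing in for activations, batching buffers, and framework overhead, which depend entirely on the workload.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str = "fp32") -> float:
    """Memory taken by the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_in_memory(peak_working_set_gb: float, machine_memory_gb: float) -> bool:
    """Check whether an estimated peak working set fits on a machine."""
    return peak_working_set_gb <= machine_memory_gb


# Approximate parameter count (assumption).
params = 340_000_000
fp32_weights = weight_memory_gb(params, "fp32")  # about 1.36 GB
int8_weights = weight_memory_gb(params, "int8")  # about 0.34 GB

# Hypothetical peak working set during inference.
peak_gb = 40.0
fits_standard_8 = fits_in_memory(peak_gb, 30.0)   # n1-standard-8
fits_highmem_16 = fits_in_memory(peak_gb, 104.0)  # n1-highmem-16
```

Under these assumptions, quantizing to INT8 saves about 1 GB of weights, which barely moves a 40 GB peak, while the larger machine type clears it with room to spare.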

974

MCQmedium

A data scientist deployed a classification model on Vertex AI Endpoints. After a week, the model's accuracy drops significantly from 92% to 78%. The data scientist suspects training-serving skew. What is the first step to confirm this?

A.Look for data leakage in the training pipeline

B.Compare feature distributions between training and serving data using Vertex AI Model Monitoring

C.Examine the feature importance of the model

D.Check the prediction confidence over time

AnswerB

Model Monitoring can detect skew by comparing distributions.

Why this answer

Option B is correct because Vertex AI Model Monitoring provides a built-in capability to automatically detect training-serving skew by comparing feature distributions between the training data and the live serving data. This is the most direct and efficient first step to confirm whether the accuracy drop is due to a shift in the input data distribution, which is the hallmark of training-serving skew. The data scientist can set up monitoring jobs that compute statistical distance metrics (e.g., Jensen-Shannon divergence) and alert when significant deviations occur.

Exam trap

Google Cloud often tests the distinction between diagnosing the root cause of a performance drop versus investigating a specific type of issue; the trap here is that candidates may jump to data leakage (Option A) because it sounds similar to skew, but leakage is a pre-deployment problem, not a post-deployment distribution shift.

How to eliminate wrong answers

Option A is wrong because looking for data leakage in the training pipeline addresses a different problem—where the model inadvertently uses information from the future or target during training—not a post-deployment distribution shift between training and serving data. Option C is wrong because examining feature importance helps understand which features drive predictions but does not directly compare training and serving distributions to confirm skew. Option D is wrong because checking prediction confidence over time can indicate model uncertainty but does not isolate whether the cause is a change in input data distribution versus model drift or other issues.
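For a categorical feature, the comparison in Option B can be as simple as the largest gap between category frequencies in the two datasets (an L-infinity distance). The sketch below is illustrative rather than Model Monitoring's code: the function names, the sample values, and the 0.3 alert threshold are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def category_frequencies(values: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each category in a sample."""
    if not values:
        raise ValueError("sample must not be empty")
    counts = Counter(values)
    return {category: n / len(values) for category, n in counts.items()}


def l_infinity_distance(
    training_values: Sequence[Hashable],
    serving_values: Sequence[Hashable],
) -> float:
    """Largest absolute difference in category frequency between two samples."""
    p = category_frequencies(training_values)
    q = category_frequencies(serving_values)
    categories = set(p) | set(q)  # include categories seen on only one side
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical feature values: device type at training time vs. in live traffic.
training_sample = ["mobile"] * 8 + ["desktop"] * 2
serving_sample = ["mobile"] * 5 + ["desktop"] * 5

skew_score = l_infinity_distance(training_sample, serving_sample)
skew_suspected = skew_score >= 0.3  # illustrative threshold
```

A category that appears only in serving traffic counts with a training frequency of zero, so previously unseen values show up as skew instead of being silently ignored.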

975

Multi-Selecthard

A team is monitoring a batch prediction job on Vertex AI. Which two metrics should they monitor to ensure the job completes successfully without errors?

Select 2 answers

A.Data size of input

B.Prediction requests per second

C.Job failure rate

D.Model endpoint latency

E.Number of preempted workers

AnswersC, E

Failure rate directly indicates job success.

Why this answer

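Option C is correct because the job failure rate directly indicates whether the batch prediction job is completing successfully or encountering errors. Monitoring this metric allows the team to detect and respond to failures in the prediction pipeline. Option E is correct because preempted workers in a distributed batch job can stall or fail parts of the run, so a rising preemption count is an early warning that the job may not finish cleanly.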

Exam trap

Google Cloud often tests the distinction between batch and online prediction metrics, and the trap here is that candidates mistakenly apply online serving metrics (like latency or requests per second) to batch jobs, or overlook worker preemption as a critical failure indicator in distributed batch processing.

Google Professional Machine Learning Engineer PMLE Questions 901–975 | Page 13/14 | Courseiva