Knowledge + Practice

Google Professional Machine Learning Engineer (PMLE) — Questions 676–750

676

MCQ · hard

A company uses Vertex AI for online predictions with a large ensemble model that requires GPU acceleration. They want to reduce inference latency by batching multiple requests into a single GPU inference call. What should they configure?

A. Use Vertex AI Model Optimization for automatic compilation.

B. Deploy the model with NVIDIA Triton Inference Server configured for dynamic batching.

C. Increase the number of GPU replicas to handle higher concurrency.

D. Enable model quantization using TensorRT.

Answer: B

Correct: Triton supports dynamic batching to improve GPU utilization and reduce per-request latency.

Why this answer

NVIDIA Triton Inference Server supports dynamic batching, which automatically groups multiple inference requests into a single GPU call. This reduces overhead and improves GPU utilization, directly addressing the need to lower latency for online predictions with a large ensemble model on Vertex AI.
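To make the batching idea concrete, here is a minimal pure-Python sketch of how a dynamic batcher might group requests. It is not Triton's implementation; the function name `form_batches` and the parameters `max_batch_size` and `max_queue_delay_ms` are illustrative assumptions.

```python
from typing import List


def form_batches(
    arrivals_ms: List[float], max_batch_size: int, max_queue_delay_ms: float
) -> List[List[int]]:
    """Group request indices into batches (hypothetical helper).

    A batch is closed when it is full, or when a new request arrives
    after the oldest queued request has waited longer than the delay.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    for i, t in enumerate(arrivals_ms):
        # Flush the open batch if its oldest request has waited too long.
        if current and t - arrivals_ms[current[0]] > max_queue_delay_ms:
            batches.append(current)
            current = []
        current.append(i)
        # Flush immediately once the batch is full.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


arrivals = [0.0, 1.0, 2.0, 9.0, 9.5, 30.0]  # request arrival times in ms
batches = form_batches(arrivals, max_batch_size=4, max_queue_delay_ms=5.0)
print(batches)  # [[0, 1, 2], [3, 4], [5]]
```

Six requests become three GPU calls in this toy run. The queue delay is the trade-off: each request may wait briefly so that fewer, larger inference calls are issued.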

Exam trap

Google Cloud often tests the distinction between model-level optimizations (quantization, compilation) and runtime optimizations (batching), leading candidates to confuse techniques that improve single-request speed with those that improve throughput via request aggregation.

How to eliminate wrong answers

Option A is wrong because Vertex AI Model Optimization for automatic compilation focuses on model-level optimizations like pruning or quantization, not on batching runtime requests. Option C is wrong because increasing GPU replicas improves concurrency but does not batch requests into a single inference call; it spreads requests across more GPUs rather than grouping them. Option D is wrong because model quantization using TensorRT reduces model size and speeds up computation per request, but it does not implement request batching at the inference server level.

677

MCQ · hard

A company uses Vertex AI Pipelines with Kubeflow DSL for hyperparameter tuning. They notice that some trials fail due to OOM errors. How should they configure the pipeline to automatically handle this?

A. Use a larger machine type for the whole pipeline

B. Use Cloud Composer to catch failures and resubmit

C. Reduce the number of trials

D. Add a retry policy to the hyperparameter tuning step with backoff

E. Increase the memory for all trials in the pipeline definition

Answer: D

Retries failed trials automatically.

Why this answer

Option D is correct because Vertex AI Pipelines supports retry policies on individual pipeline steps, including hyperparameter tuning jobs. By adding a retry policy with exponential backoff, the pipeline can automatically re-run failed trials caused by transient OOM errors without manual intervention, while avoiding immediate retries that could overload resources.
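The retry-with-backoff behaviour can be sketched in plain Python. This is an illustration of the policy only, not the Kubeflow Pipelines API; `backoff_delays`, `run_with_retries`, and the default values are hypothetical, and `MemoryError` stands in for a transient OOM failure.

```python
import time
from typing import Callable, List


def backoff_delays(
    num_retries: int,
    base_seconds: float = 30.0,
    factor: float = 2.0,
    max_seconds: float = 600.0,
) -> List[float]:
    """Exponentially growing wait times, capped at max_seconds."""
    return [min(base_seconds * factor**i, max_seconds) for i in range(num_retries)]


def run_with_retries(
    step: Callable[[], str],
    delays: List[float],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run step, waiting and retrying after each simulated OOM failure."""
    for delay in [*delays, None]:
        try:
            return step()
        except MemoryError:
            if delay is None:  # retries exhausted: surface the failure
                raise
            sleep(delay)
    raise RuntimeError("unreachable")


attempts = {"n": 0}


def flaky_trial() -> str:
    """Fails twice with a simulated transient OOM, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise MemoryError("simulated transient OOM")
    return "ok"


waits: List[float] = []  # record waits instead of actually sleeping
result = run_with_retries(flaky_trial, backoff_delays(3), sleep=waits.append)
print(result, waits)  # ok [30.0, 60.0]
```

Note that retries only help when the failure is transient. A trial whose hyperparameters always exceed memory will fail on every attempt and exhaust the retry budget.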

Exam trap

Google Cloud often tests the misconception that retry policies are only for network requests or that OOM errors require permanent resource increases, when in fact transient OOMs in ML pipelines can be handled gracefully with step-level retries and backoff.

How to eliminate wrong answers

Option A is wrong because using a larger machine type for the whole pipeline is inefficient and costly; it does not target only the failing trials and may not resolve OOM errors if the issue is specific to certain hyperparameter configurations. Option B is wrong because Cloud Composer is an orchestration service for Apache Airflow workflows, not designed to catch and resubmit individual Vertex AI pipeline step failures; it adds unnecessary complexity and latency. Option C is wrong because reducing the number of trials limits the search space and may prevent finding the optimal hyperparameters, without addressing the root cause of OOM errors.

Option E is wrong because increasing memory for all trials in the pipeline definition is a blunt approach that wastes resources on trials that do not need extra memory, and it does not handle transient failures that may occur even with sufficient memory.

678

Multi-Select · hard

A machine learning team is deploying a model for real-time predictions using Vertex AI. They need to ensure that the deployment follows best practices for collaboration and governance. Which TWO actions should they take?

Select 2 answers

A. Use a continuous integration/continuous deployment (CI/CD) pipeline to deploy model versions.

B. Store all model artifacts in a local file system to reduce latency.

C. Enable model monitoring to detect data drift and performance degradation.

D. Manually configure autoscaling parameters for the endpoint.

E. Allow any team member to deploy directly to production without review.

Answers: A, C

CI/CD ensures consistent, repeatable deployments.

Why this answer

Option A is correct because using a CI/CD pipeline for deploying model versions ensures automated, repeatable, and auditable deployments, which is a best practice for collaboration and governance. This approach enforces version control, testing, and approval gates, reducing the risk of errors and enabling rollback if needed.

Exam trap

Google Cloud often tests the misconception that local storage or manual configuration is acceptable for governance, when in fact centralized artifact storage and automated scaling are required for collaboration and reliability.

679

MCQ · medium

An MLOps team is implementing a CI/CD pipeline for a TensorFlow model on Vertex AI. The model training job takes 2 hours and produces a SavedModel. The team wants to automatically trigger a new pipeline run whenever a change is pushed to the 'main' branch of their source repository. The pipeline should include training, evaluation, and if metrics exceed a threshold, deploy the model to a Vertex AI endpoint. Which trigger configuration should they use?

A. Use Eventarc to listen for Cloud Source Repository push events and invoke a Cloud Run service that starts the pipeline.

B. Use an Artifact Registry trigger to detect new model images and then start the pipeline.

C. Set up a Cloud Scheduler job that runs every 2 hours and triggers a Vertex AI Pipeline run.

D. Configure a Cloud Build trigger that watches the 'main' branch of Cloud Source Repositories; in the build config, use steps to run the pipeline via the Vertex AI API.

Answer: D

Cloud Build triggers are designed for source code events and can orchestrate ML pipelines.

Why this answer

Option D is correct because Cloud Build triggers can be configured to watch a specific branch (e.g., 'main') in Cloud Source Repositories and automatically execute a build configuration. Within that build config, you can use the `gcloud` or `curl` steps to invoke the Vertex AI Pipeline API, which starts the training, evaluation, and conditional deployment workflow. This directly matches the requirement for a branch-based push trigger that orchestrates the full ML pipeline.

Exam trap

Google Cloud often tests the distinction between event-driven triggers (Cloud Build for source code changes) and artifact-based triggers (Artifact Registry for new images), leading candidates to confuse the two when the requirement is to start a pipeline from a code push.

How to eliminate wrong answers

Option A is wrong because Eventarc is designed for event-driven, asynchronous invocations (e.g., from Cloud Storage or Pub/Sub), but it does not natively integrate with Cloud Source Repositories push events; Cloud Build triggers are the correct service for repository push events. Option B is wrong because an Artifact Registry trigger would fire only after a new model image is pushed, but the requirement is to trigger on a source code change (push to 'main'), not on a new artifact. Option C is wrong because a Cloud Scheduler job running every 2 hours is a time-based schedule, not a push-triggered event; it would not respond to code changes and would run even when no changes occur, wasting resources.

680

MCQ · easy

An ML team wants to use Vertex AI Hyperparameter Tuning to tune a custom training job. They have a budget of 50 trials and want to use an algorithm that balances exploration and exploitation. Which algorithm should they choose?

A. Random search

B. Grid search

C. Bayesian optimization (Vizier default)

D. Manual search

Answer: C

Bayesian optimization is designed to balance exploration and exploitation.

Why this answer

Bayesian optimization (the default algorithm in Vertex AI Vizier) is the correct choice because it explicitly balances exploration and exploitation by building a probabilistic model of the objective function and using an acquisition function to select the next hyperparameter configuration. With a budget of 50 trials, this algorithm efficiently converges to optimal regions while still exploring uncertain areas, making it ideal for tuning custom training jobs where each trial is computationally expensive.
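The role of the acquisition function can be illustrated with a small sketch. It assumes a surrogate model has already produced a predicted mean and uncertainty for each candidate configuration, and it uses an upper-confidence-bound rule as one common choice; this is not a claim about which acquisition function Vizier uses. The name `pick_next` and the numbers are hypothetical, and the objective is treated as something to maximize.

```python
from typing import Sequence


def pick_next(means: Sequence[float], stds: Sequence[float], kappa: float = 2.0) -> int:
    """Return the index of the candidate with the highest upper confidence bound.

    kappa controls the balance: 0 means pure exploitation of the predicted
    mean; larger values favour candidates with high uncertainty.
    """
    scores = [m + kappa * s for m, s in zip(means, stds)]
    return max(range(len(scores)), key=scores.__getitem__)


# Candidate 0 is well explored and good; candidate 2 is uncertain.
means = [0.80, 0.60, 0.70]
stds = [0.01, 0.02, 0.10]

exploit_choice = pick_next(means, stds, kappa=0.0)
balanced_choice = pick_next(means, stds, kappa=2.0)
print(exploit_choice, balanced_choice)  # 0 2
```

Random search has no equivalent of this step: it never looks at `means` or `stds`, which is why it cannot exploit earlier results.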

Exam trap

Google Cloud often tests the misconception that random search is the best default for balancing exploration and exploitation, but the trap here is that random search lacks any exploitation mechanism, making Bayesian optimization the correct choice for efficient tuning within a constrained budget.

How to eliminate wrong answers

Option A is wrong because random search does not balance exploration and exploitation; it samples hyperparameters uniformly at random without using past trial results to guide future selections, which wastes budget on suboptimal regions. Option B is wrong because grid search exhaustively evaluates a fixed set of hyperparameter combinations, which is computationally inefficient for a budget of 50 trials and does not incorporate any exploitation mechanism. Option D is wrong because manual search relies on human intuition and ad-hoc adjustments, which is not an automated algorithm and cannot systematically balance exploration and exploitation within a defined trial budget.

681

MCQ · medium

A team uses Vertex AI Feature Store to serve features for online predictions. They notice that the online serving latency is high for certain features. The features are stored in a BigQuery source with high cardinality. What is the best practice to reduce latency?

A. Use batch prediction instead of online prediction.

B. Move the features to Cloud Storage and read them directly.

C. Increase the number of nodes in the feature store cluster.

D. Use feature store caching with a larger cache size.

Answer: D

Caching frequently accessed features reduces BigQuery calls and latency.

Why this answer

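Option D is correct because caching frequently accessed features reduces repeated reads from BigQuery. Option C (adding nodes) might help but does not directly address high cardinality. Option B (reading from Cloud Storage directly) would not integrate with Feature Store. Option A (batch prediction) is a workaround that gives up online serving, not a best practice.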

682

MCQ · medium

A data scientist wants to deploy a model trained with PyTorch to a Vertex AI endpoint for online predictions. What is the recommended approach?

A. Package the model in a custom container with a web server (e.g., FastAPI) and deploy to Vertex AI.

B. Use Vertex AI's pre-built PyTorch container and upload the state dictionary.

C. Export the PyTorch model to a SavedModel format and deploy using Vertex AI's pre-built TensorFlow container.

D. Convert the model to TensorFlow.js and deploy to Cloud Functions.

Answer: A

Correct: Custom container allows full control over PyTorch serving environment.

Why this answer

Option A is correct because Vertex AI requires a custom container for PyTorch models, as it does not provide a pre-built PyTorch serving container. The recommended approach is to package the trained PyTorch model with a web server like FastAPI (or Flask) that loads the model and exposes an HTTP endpoint for predictions. This container is then deployed to Vertex AI for online predictions, giving full control over the inference environment and dependencies.
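As a rough sketch of what the serving code inside such a container might look like, the example below uses only the Python standard library in place of FastAPI so that it stays self-contained. It assumes the container reads the `AIP_HTTP_PORT`, `AIP_HEALTH_ROUTE`, and `AIP_PREDICT_ROUTE` environment variables and exchanges `instances`/`predictions` JSON; `fake_model`, `predict`, and `serve` are hypothetical names, and `fake_model` stands in for a loaded PyTorch model.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")


def fake_model(instance):
    """Placeholder for a real model's forward pass (hypothetical)."""
    return sum(instance)


def predict(body: bytes) -> bytes:
    """Turn a {"instances": [...]} request into a {"predictions": [...]} reply."""
    instances = json.loads(body)["instances"]
    return json.dumps({"predictions": [fake_model(x) for x in instances]}).encode()


class Handler(BaseHTTPRequestHandler):
    def _send(self, code: int, payload: bytes) -> None:
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        # Health check: 200 on the health route, 404 elsewhere.
        self._send(200 if self.path == HEALTH_ROUTE else 404, b"{}")

    def do_POST(self):
        if self.path != PREDICT_ROUTE:
            return self._send(404, b"{}")
        length = int(self.headers.get("Content-Length", 0))
        self._send(200, predict(self.rfile.read(length)))


def serve() -> None:
    """Start the server; call this from the container entrypoint."""
    port = int(os.environ.get("AIP_HTTP_PORT", "8080"))
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()


print(predict(b'{"instances": [[1, 2], [3, 4]]}'))
```

In a real container the model would be loaded once at startup rather than per request, and a production web framework would normally replace `http.server`.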

Exam trap

Google Cloud often tests the misconception that Vertex AI provides pre-built containers for all major frameworks, but in reality, only TensorFlow, scikit-learn, and XGBoost have official pre-built containers; PyTorch requires a custom container.

How to eliminate wrong answers

Option B is wrong because Vertex AI does not offer a pre-built PyTorch container; the pre-built containers are for TensorFlow, scikit-learn, and XGBoost only. Uploading a state dictionary alone would not create a runnable serving environment. Option C is wrong because exporting a PyTorch model to SavedModel format (TensorFlow's format) and using a TensorFlow container is not a recommended or straightforward path; it requires conversion via ONNX or other tools, introduces potential incompatibilities, and is not the standard deployment method for PyTorch.

Option D is wrong because converting to TensorFlow.js and deploying to Cloud Functions is intended for client-side or lightweight serverless inference, not for production-grade online predictions with GPU support or complex model serving requirements; Cloud Functions also have request timeout and memory limitations unsuitable for many PyTorch models.

683

MCQ · easy

You want to use a pre-trained model from TensorFlow Hub for image classification, but you need to adapt it to classify your own custom categories with a small dataset. Which Vertex AI approach is most appropriate?

A. Write a custom training script that loads the pre-trained model and fine-tunes it on your dataset

B. Deploy the pre-trained model as-is via Vertex AI JumpStart

C. Build a custom container with the pre-trained model and deploy to Vertex AI Endpoints

D. Use Vertex AI AutoML for image classification

Answer: A

Fine-tuning a pre-trained model is the standard transfer learning approach, efficient with small data.

Why this answer

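Transfer learning fine-tunes a pre-trained model on a new, small dataset, so a custom training script that loads the TensorFlow Hub model is the right fit. Option B deploys the model as-is, which does not adapt it to your custom categories. Option C is overkill: a custom container serves the pre-trained model but does not fine-tune it. Option D, AutoML, requires no code but may not be suitable if you want to control which pre-trained model is used.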

684

MCQ · medium

You want to reduce training costs by using preemptible VMs on Vertex AI for a fault-tolerant distributed training job that uses checkpointing. Which machine type should you choose in the worker pool configuration?

A. Use spot VMs by setting 'spot' to true in the machine spec

B. Use custom machine types with preemptible flag

C. Use standard VMs and rely on Vertex AI auto-restart

D. Use TPU VMs because they are cheaper

Answer: A

Spot VMs are discounted preemptible instances, suitable for fault-tolerant jobs.

Why this answer

Vertex AI supports spot VMs (formerly preemptible) for training. By setting 'spot' to true, you use preemptible instances at a discount. The training must handle preemption via checkpointing.
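The checkpointing requirement is what makes spot capacity usable. Below is a toy sketch of a resumable training loop; the JSON checkpoint format, the `train` function, and the `preempt_at` parameter (which simulates the VM being reclaimed) are all illustrative assumptions rather than a real training framework.

```python
import json
import os
import tempfile


def train(total_steps: int, ckpt_path: str, preempt_at: int = -1) -> int:
    """Run a toy loop, resuming from ckpt_path if it exists.

    Returns the last completed step.
    """
    step = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            step = json.load(f)["step"]  # resume where the last run stopped
    while step < total_steps:
        if step == preempt_at:
            return step  # simulate the spot VM being reclaimed
        step += 1  # stand-in for one real training step
        with open(ckpt_path, "w") as f:
            json.dump({"step": step}, f)  # persist progress after each step
    return step


ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
first = train(10, ckpt, preempt_at=4)  # interrupted after 4 completed steps
second = train(10, ckpt)  # restarts and continues from step 4
print(first, second)  # 4 10
```

In practice the checkpoint would live in durable storage such as a Cloud Storage bucket, since a local disk does not survive preemption.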

685

MCQ · hard

A company wants to use ML to predict customer churn. They have user activity logs in Cloud Storage, account data in BigQuery, and want an automated pipeline. Which pipeline architecture on Google Cloud should they use?

A. Load both data sources into AutoML Tables and train directly

B. Export logs from Cloud Storage to Cloud Dataproc for preprocessing, then train

C. Use Cloud Functions to preprocess data, then train on AI Platform

D. Use BigQuery to join logs and account data, train on Vertex AI, deploy to an endpoint

Answer: D

Seamless integration: BigQuery queries external tables, Vertex AI trains from BigQuery, endpoint serves.

Why this answer

Option D is correct because it leverages BigQuery's ability to join structured account data with semi-structured logs (via federated queries or external tables), then uses Vertex AI for end-to-end ML training and deployment. This architecture minimizes data movement, keeps the pipeline serverless, and directly addresses the requirement for an automated pipeline with both data sources.

Exam trap

Google Cloud often tests the misconception that AutoML Tables can handle multi-source data natively, when in fact it requires a single pre-joined dataset, and that Cloud Functions are suitable for heavy preprocessing workloads despite their strict resource limits.

How to eliminate wrong answers

Option A is wrong because AutoML Tables requires data to be in a single table format (CSV/JSON) and cannot directly ingest data from two separate sources without prior joining; it also lacks native pipeline automation. Option B is wrong because Cloud Dataproc (managed Spark/Hadoop) is overkill for simple preprocessing and introduces unnecessary cluster management overhead; BigQuery can perform the join and preprocessing more efficiently without spinning up ephemeral clusters. Option C is wrong because Cloud Functions have a 9-minute timeout and 2GB memory limit, making them unsuitable for preprocessing large-scale log data; Vertex AI is the correct training platform, but the preprocessing should be done in BigQuery, not Cloud Functions.

686

MCQ · hard

An ML engineer is optimizing a large model for deployment on Vertex AI with GPU acceleration. They want to reduce model size and improve inference latency without significant accuracy loss. Which tool should they use?

A. Use gcloud CLI to prune the model.

B. Use Cloud TPU for faster inference.

C. Use Vertex AI Model Optimization with TensorRT.

D. Use TensorFlow.js converter to optimize the model for web.

Answer: C

Vertex AI Model Optimization uses TensorRT to quantize and compile models for NVIDIA GPUs, reducing latency and model size.

Why this answer

Option C is correct because Vertex AI Model Optimization with TensorRT is specifically designed to reduce model size and improve inference latency on NVIDIA GPUs by applying techniques like quantization, pruning, and graph optimizations. TensorRT optimizes the model for the target GPU architecture, enabling faster inference with minimal accuracy loss, which directly addresses the engineer's goals.

Exam trap

Google Cloud often tests the misconception that any optimization tool (like gcloud CLI or TensorFlow.js) can perform model pruning or latency reduction, when in fact each tool has a specific domain: Vertex AI Model Optimization with TensorRT is the only option that directly targets GPU-accelerated inference optimization on Vertex AI.

How to eliminate wrong answers

Option A is wrong because the gcloud CLI is a command-line tool for managing Google Cloud resources, not for model pruning; pruning requires specialized frameworks like TensorFlow Model Optimization Toolkit or TensorRT. Option B is wrong because Cloud TPU is a hardware accelerator for training and inference, but it does not reduce model size or optimize for GPU acceleration; it is a different hardware type and not a tool for model optimization. Option D is wrong because TensorFlow.js converter is used to convert models for web browser deployment, not for optimizing inference latency on GPU-accelerated Vertex AI deployments; it targets client-side JavaScript execution, not server-side GPU optimization.

687

MCQ · hard

A retail company has been using Vertex AI AutoML to predict store-level demand for each product. They have a pipeline that runs nightly: data is extracted from BigQuery, preprocessed via Dataflow, and then used to train a new AutoML model each night. The model is deployed to a Vertex AI Endpoint for real-time inference. After two months, they notice that predictions for a new product category (recently launched) are consistently inaccurate, with predicted sales far exceeding actuals. They suspect data drift due to the new category. The data scientist has limited coding skills and wants a low-code solution. Which course of action should they take to improve predictions for the new category?

A. Add the product category as a feature in the AutoML dataset and retrain the model with the updated dataset

B. Retrain the model using only data from the new product category to specialize the model for that category

C. Use Vertex AI custom training with a Python script to fine-tune the model on the new category data

D. Remove the new product category from the training data because it causes bias, and rely on the pre-trained model's general pattern

Answer: A

Allows model to learn category-specific demand patterns.

Why this answer

Adding the product category as a feature in the AutoML dataset allows the model to learn the distinct demand patterns of the new category directly from the data. Vertex AI AutoML automatically handles feature engineering and can adjust its predictions based on this categorical input, addressing the data drift without requiring custom code. This low-code approach leverages AutoML's built-in ability to incorporate new features and retrain with minimal manual intervention.

Exam trap

Google Cloud often tests the misconception that specialized models (Option B) or custom training (Option C) are necessary for new data patterns, when in fact AutoML's feature-based retraining is the simplest low-code solution that leverages the model's existing architecture.

How to eliminate wrong answers

Option B is wrong because retraining only on the new category data would discard the valuable historical patterns from other categories, leading to overfitting and poor generalization for the new category. Option C is wrong because it requires custom Python scripting and custom training, which contradicts the low-code requirement and the data scientist's limited coding skills. Option D is wrong because removing the new category from training data would prevent the model from learning its specific patterns, causing the model to continue making inaccurate predictions based on the old distribution.

688

MCQ · hard

After setting up model monitoring on Vertex AI for a classification model, the engineer sees a high number of anomaly alerts for the "age" feature. Upon investigation, the age distribution in recent predictions is similar to training data. What might be the cause?

A. The feature importance of age has changed

B. The monitoring baseline was incorrectly set

C. The monitoring threshold for age is too low

D. The model is overfitting to age

Answer: C

A low threshold triggers alerts for small, insignificant deviations.

Why this answer

Option C is correct because the high number of anomaly alerts despite the age distribution being similar to training data indicates that the monitoring threshold for the 'age' feature is set too low. In Vertex AI Model Monitoring, anomaly detection compares recent prediction distributions against a baseline using a statistical distance (for example, Jensen-Shannon divergence for numerical features) and raises an alert when that distance exceeds the configured threshold. If the threshold is too sensitive, even minor, insignificant deviations can trigger alerts, leading to false positives even when the distribution is essentially unchanged.
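The effect of an overly low threshold can be shown with a small sketch. It assumes the monitor scores drift with Jensen-Shannon divergence over binned feature values; the histograms and both threshold values are made up for illustration.

```python
import math
from typing import Sequence


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


baseline = [0.25, 0.25, 0.25, 0.25]  # training-time age histogram (hypothetical bins)
recent = [0.23, 0.27, 0.25, 0.25]  # serving-time histogram, nearly identical

score = js_divergence(baseline, recent)
alert_with_low_threshold = score > 0.0001  # overly sensitive setting
alert_with_sane_threshold = score > 0.05  # more tolerant setting
print(alert_with_low_threshold, alert_with_sane_threshold)  # True False
```

The two histograms differ by only two percentage points in two bins, yet the low threshold still fires. Raising the threshold for that feature removes the false positive without touching the baseline.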

Exam trap

The trap here is that candidates confuse 'anomaly alerts' with 'model performance degradation' or 'data drift,' but the question specifically states the distribution is similar, so the root cause is a misconfigured sensitivity threshold, not a genuine distribution shift.

How to eliminate wrong answers

Option A is wrong because feature importance measures the contribution of a feature to model predictions, not the distribution of the feature values themselves; a change in feature importance would not directly cause distribution-based anomaly alerts. Option B is wrong because if the monitoring baseline were incorrectly set (e.g., using a non-representative sample), the age distribution in recent predictions would likely differ from the training data, but the question states the distribution is similar, so the baseline is not the issue. Option D is wrong because overfitting to age would manifest as poor generalization on unseen data, not as anomaly alerts on the feature distribution; overfitting does not inherently trigger monitoring alerts unless the distribution shifts.

689

Multi-Select · medium

You are setting up a hyperparameter tuning job on Vertex AI for a large neural network. The objective is to minimize validation loss. You want to explore the hyperparameter space efficiently with a limited budget of 100 trials. Which THREE settings should you configure in the study?

Select 3 answers

A. Enable early stopping

B. Algorithm: BAYESIAN_OPTIMIZATION

C. Parallel trial execution count: 10

D. Algorithm: GRID_SEARCH

E. Disable early stopping

Answers: A, B, C

Early stopping (e.g., via median stopping) terminates underperforming trials early.

Why this answer

Option A is correct because enabling early stopping in Vertex AI hyperparameter tuning terminates poorly performing trials early, saving the trial budget for more promising hyperparameter configurations. This is critical when the objective is to minimize validation loss with a limited budget of 100 trials, as it prevents wasting resources on suboptimal runs and allows the search to focus on the most promising regions of the hyperparameter space.
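A median stopping rule, one common form of early stopping, can be sketched as follows. This is an illustrative version for a loss that is minimized; the function name `should_stop` and the sample loss curves are hypothetical, and the rule actually applied by a tuning service may differ.

```python
from statistics import median
from typing import Dict, List


def should_stop(trial_losses: List[float], completed: Dict[str, List[float]]) -> bool:
    """Stop a trial whose best loss so far is worse than the median of the
    other trials' running-average loss at the same step."""
    step = len(trial_losses)
    peers = [sum(h[:step]) / step for h in completed.values() if len(h) >= step]
    if not peers:
        return False  # nothing to compare against yet
    return min(trial_losses) > median(peers)


completed = {
    "t1": [1.0, 0.8, 0.6],
    "t2": [0.9, 0.7, 0.5],
    "t3": [1.2, 1.0, 0.9],
}

weak = should_stop([1.5, 1.3], completed)  # clearly behind its peers
strong = should_stop([0.95, 0.75], completed)  # competitive, keep running
print(weak, strong)  # True False
```

Stopping the weak trial after two steps frees its remaining compute for configurations that still have a chance of beating the current best.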

Exam trap

Google Cloud often tests the misconception that grid search is suitable for large hyperparameter spaces with limited budgets, when in fact it is computationally prohibitive and should be replaced by Bayesian optimization or random search for efficiency.

690

MCQ · medium

A team has deployed a model on Vertex AI and wants to cache frequent identical prediction requests to improve latency and reduce cost. Which Google Cloud service should they use?

A. Cloud Bigtable

B. Cloud CDN

C. Cloud Memorystore

D. Cloud SQL

Answer: C

Memorystore provides a managed Redis or Memcached instance for caching.

Why this answer

Cloud Memorystore (Redis) can be used as a cache. The application hashes the request and checks the cache before calling the model.
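The hash-then-check pattern described above might look like the sketch below. A plain dictionary stands in for the Redis instance so the example runs anywhere; `CachedPredictor` and its attributes are hypothetical names, and the lambda is a placeholder for the call to the deployed model.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Optional


class CachedPredictor:
    """Wraps a prediction function with a request-keyed cache."""

    def __init__(self, predict_fn: Callable[[Any], Any], store: Optional[Dict[str, Any]] = None):
        self.predict_fn = predict_fn
        self.store = store if store is not None else {}  # dict stands in for Redis
        self.model_calls = 0

    def _key(self, instance: Any) -> str:
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(instance, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def predict(self, instance: Any) -> Any:
        key = self._key(instance)
        if key in self.store:
            return self.store[key]  # cache hit: skip the model call
        self.model_calls += 1
        result = self.predict_fn(instance)
        self.store[key] = result
        return result


predictor = CachedPredictor(lambda x: x["a"] + x["b"])
first = predictor.predict({"a": 1, "b": 2})
second = predictor.predict({"b": 2, "a": 1})  # same request, different key order
print(first, second, predictor.model_calls)  # 3 3 1
```

With a real cache, entries would usually carry an expiry time and be invalidated when a new model version is deployed, so stale predictions are not served.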

691

MCQ · hard

A large e-commerce company deploys multiple ML models on Vertex AI Endpoints. They use Vertex AI Model Registry to manage model versions. Recently, a team accidentally deployed an unvalidated model to production, causing a service outage. They want to implement a governance process where models must pass certain validation checks before deployment. The validation includes unit tests, fairness checks, and performance benchmarks. They use CI/CD pipelines (Cloud Build). They also need to allow manual approval for critical models. Which combination of Vertex AI features and Cloud Build steps would enforce the required governance?

A. Use Vertex AI Experiments to log validation results and require manual checks before deployment.

B. Set up Cloud Armor to block deployment of unvalidated models.

C. Implement Cloud Build triggers that run validation steps, then use Vertex AI Model Registry 'state' to mark models as 'validated' before allowing deployment to endpoints.

D. Use Vertex AI Continuous Monitoring to automatically detect issues and roll back deployments.

Answer: C

This enforces a gate where only models with appropriate state can be deployed.

Why this answer

Option C is correct because it combines Cloud Build triggers to run validation steps (unit tests, fairness checks, performance benchmarks) and uses Vertex AI Model Registry's 'state' field to mark models as 'validated' only after passing those checks. This state then acts as a gate in the deployment pipeline, ensuring that only validated models can be deployed to Vertex AI Endpoints. The manual approval for critical models can be integrated as a Cloud Build approval step before the state is set to 'validated'.
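The gate logic itself is simple and can be sketched independently of any particular service. In the example below, `Candidate`, `can_deploy`, and the check names are hypothetical; in a real pipeline the check results would come from the build steps and the approval flag from a manual approval step.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Candidate:
    """A model version waiting for deployment (illustrative structure)."""

    name: str
    checks: Dict[str, bool]  # e.g. unit tests, fairness, benchmarks
    critical: bool = False  # critical models also need human sign-off
    approved: bool = False


def can_deploy(model: Candidate) -> bool:
    """Allow deployment only if every check passed and, for critical
    models, a manual approval has been recorded."""
    if not model.checks or not all(model.checks.values()):
        return False
    return model.approved if model.critical else True


passing = {"unit_tests": True, "fairness": True, "benchmarks": True}

print(can_deploy(Candidate("recs-v2", passing)))  # True
print(can_deploy(Candidate("fraud-v7", passing, critical=True)))  # False
print(can_deploy(Candidate("fraud-v7", passing, critical=True, approved=True)))  # True
```

An empty `checks` dictionary is treated as a failure on purpose: a model with no recorded validation should never pass the gate by default.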

Exam trap

The trap here is confusing reactive monitoring (Continuous Monitoring) or unrelated security services (Cloud Armor) with proactive deployment governance, while overlooking that Vertex AI Model Registry's state field is the correct mechanism to enforce pre-deployment validation gates.

How to eliminate wrong answers

Option A is wrong because Vertex AI Experiments is designed for tracking and comparing ML experiments, not for enforcing deployment governance or blocking deployments; it cannot prevent an unvalidated model from being deployed. Option B is wrong because Cloud Armor is a web application firewall for protecting against DDoS and OWASP attacks, not a service for validating or blocking ML model deployments. Option D is wrong because Vertex AI Continuous Monitoring detects prediction drift and data quality issues after deployment, but it does not prevent the initial deployment of an unvalidated model; it is a reactive, not proactive, governance tool.

692

MCQ · medium

You are deploying a pre-trained BERT model for inference on edge devices. The model must be under 500 MB and inference latency under 50 ms. Which approach should you take?

A. Use a larger model like BERT-Large and deploy on GPU

B. Apply post-training INT8 quantization using TensorFlow Lite

C. Prune 50% of the model weights and fine-tune

D. Use knowledge distillation to train a smaller student model from scratch

Answer: B

Post-training INT8 quantization reduces model size by ~4x and speeds up inference, often within the target latency. It is the simplest and most effective first step.

Why this answer

Option B is correct because post-training INT8 quantization reduces model size by approximately 75% (from ~440 MB to ~110 MB for BERT-Base) and accelerates inference on edge devices via integer arithmetic, easily meeting the 500 MB and 50 ms constraints. TensorFlow Lite provides hardware-optimized kernels for ARM CPUs and NPUs, making it ideal for edge deployment without requiring retraining.
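The size figures follow from simple arithmetic on bytes per weight. The sketch below assumes roughly 110 million parameters for BERT-Base and ignores file-format overhead; `model_size_mb` is a hypothetical helper.

```python
def model_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Approximate weight storage in megabytes (10^6 bytes)."""
    return num_params * bytes_per_param / 1e6


BERT_BASE_PARAMS = 110_000_000  # approximate parameter count (assumption)

fp32_mb = model_size_mb(BERT_BASE_PARAMS, 4)  # 32-bit floats: 4 bytes each
int8_mb = model_size_mb(BERT_BASE_PARAMS, 1)  # 8-bit integers: 1 byte each
reduction = 1 - int8_mb / fp32_mb

print(fp32_mb, int8_mb, reduction)  # 440.0 110.0 0.75
```

Both figures sit under the 500 MB limit, so size alone does not force quantization here; the latency target is what makes the INT8 path attractive.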

Exam trap

Google Cloud often tests the misconception that pruning or distillation are the only ways to reduce model size, ignoring that quantization directly addresses both size and latency without retraining, which is the fastest path for a pre-trained model.

How to eliminate wrong answers

Option A is wrong because BERT-Large is ~1.3 GB, far exceeding the 500 MB limit, and GPU deployment is not feasible on most edge devices due to power and thermal constraints. Option C is wrong because pruning 50% of weights without fine-tuning would cause catastrophic accuracy loss, and fine-tuning requires the original training data and compute, which may not be available; even with fine-tuning, pruned models often need specialized hardware for speedup. Option D is wrong because knowledge distillation requires training a smaller student model from scratch, which demands significant compute, time, and access to the teacher model's logits, making it impractical for a quick deployment scenario where a pre-trained BERT model is already available.

693

Multi-Select · medium

You are using tf.Transform to preprocess data at scale. Which TWO services are required to run tf.Transform on Google Cloud? (Choose 2)

Select 2 answers

A. Cloud Functions

B. Dataflow

C. Cloud Storage

D. Vertex AI Training

E. BigQuery

Answers: B, C

Runs the Beam pipeline.

Why this answer

tf.Transform requires Apache Beam for execution, which on GCP is typically run on Dataflow. The processed data and transform artifacts are stored in Cloud Storage.

694

MCQ · hard

A company has a CI/CD pipeline that retrains a model every time new training data is available. They want to automatically deploy the new model to production only if it passes a set of evaluation tests on a staging environment. Which approach best implements this?

A. Implement a two-stage pipeline: train and deploy to staging, run evaluation tests, and if passed, deploy to production using conditional logic.

B. Use Cloud Build to trigger a training job and then a separate deployment job without evaluation.

C. Use a single Vertex AI pipeline that trains and deploys to staging, then manually promote.

D. Train and deploy directly to production in one pipeline.

Answer: A

This implements an automated evaluation gate.

Why this answer

A continuous delivery pipeline with evaluation gates uses separate stages: staging deployment → evaluation → promotion to production.
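
The gate described above can be sketched in plain Python. This is only an illustration of the control flow, not a Vertex AI or Cloud Build API: the stage names, the `evaluate_gate` and `run_release` helpers, and the metric names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    passed: bool
    failures: list = field(default_factory=list)


def evaluate_gate(metrics, thresholds):
    """Pass only if every required metric meets or exceeds its minimum."""
    failures = [
        name
        for name, minimum in thresholds.items()
        # A metric that was never reported counts as a failure.
        if metrics.get(name, float("-inf")) < minimum
    ]
    return GateResult(passed=not failures, failures=failures)


def run_release(metrics, thresholds):
    """Staging deployment -> evaluation -> conditional promotion."""
    stages = ["deploy_to_staging", "run_evaluation"]
    if evaluate_gate(metrics, thresholds).passed:
        stages.append("deploy_to_production")
    return stages
```

In a real pipeline the `if` would be expressed with the orchestrator's own conditional construct, so that the production deployment step is skipped rather than failed when the gate does not pass.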

695

MCQhard

Your team is serving a large language model on Vertex AI using a custom container. The endpoint experiences intermittent 502 errors during traffic spikes. The autoscaling configuration uses a CPU utilization target of 60% and the model is deployed on n1-standard-4 instances. The model requires significant memory. Which combination of changes is most likely to resolve the issue?

A.Increase the target CPU utilization to 90% to allow more requests per instance.

B.Switch to a machine type with more memory, e.g., n1-highmem-8, and increase min_replica_count.

C.Enable canary traffic splitting to reduce load on the main endpoint.

D.Reduce the model batch size from 32 to 1 to lower memory per request.

AnswerB

High memory instances reduce memory contention, and more replicas absorb traffic spikes.

Why this answer

The 502 errors likely indicate the instances are overwhelmed or timing out. Increasing the machine type to a high-memory instance reduces memory pressure, and adding more replicas through a lower target scaling metric or higher min replicas provides capacity. Tuning batch size helps but is secondary.

GPU may not help if the issue is memory.

696

MCQeasy

A data scientist wants to use tf.Transform for preprocessing a large dataset stored in BigQuery before training a TensorFlow model. The preprocessing should be consistent during training and serving. What is the correct way to use tf.Transform in this scenario?

A.Write preprocessing logic in Python and reuse the same code in training and serving

B.Use BigQuery's built-in ML.TRANSFORM function for consistency

C.Use tf.Transform to define a preprocessing_fn and apply it to the dataset, then export the transform graph for serving

D.Use a Lambda layer in Keras for preprocessing

AnswerC

This is the standard workflow: define function, compute on full data, export graph.

Why this answer

Option C is correct because tf.Transform is specifically designed to handle full-pass preprocessing (e.g., computing min/max, vocabularies) that requires seeing the entire dataset. By defining a `preprocessing_fn` and applying it with `AnalyzeAndTransformDataset` from `tensorflow_transform.beam`, the team produces a transform graph that is exported as a SavedModel and can be loaded during serving to ensure identical preprocessing logic. This guarantees consistency between training and inference, which is critical for production ML pipelines.

Exam trap

Google Cloud often tests the misconception that any preprocessing code can be reused as-is between training and serving, or that BigQuery's ML.TRANSFORM is equivalent to tf.Transform. The key point is that tf.Transform is the only option listed that exports a portable, consistent transform graph for TensorFlow serving.

How to eliminate wrong answers

Option A is wrong because writing raw Python preprocessing code cannot guarantee consistency across training and serving environments, especially when the code relies on global statistics (e.g., mean, standard deviation) that must be computed from the full dataset and reused identically at serving time. Option B is wrong because BigQuery's ML.TRANSFORM is a BigQuery ML feature that only works within BigQuery's ML pipeline and cannot export a portable transform graph for use with TensorFlow serving outside BigQuery. Option D is wrong because a Lambda layer in Keras performs per-batch or per-sample operations and cannot compute dataset-level statistics (e.g., min, max, vocabulary) that require a full pass over the data, leading to inconsistent preprocessing between training and serving.
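
To make the "full pass, then reuse" idea concrete, here is a minimal plain-Python sketch of the analyze/transform split. It is not the tf.Transform API: `analyze` and `transform` are hypothetical helpers that mimic what a min-max scaler such as `tft.scale_to_0_1` does conceptually, and the sample values are made up.

```python
def analyze(values):
    """Full pass over the training data to collect dataset-level statistics."""
    return {"min": min(values), "max": max(values)}


def transform(value, stats):
    """Scale one value to [0, 1] using saved statistics (never recomputed)."""
    span = stats["max"] - stats["min"]
    if span == 0:
        return 0.0
    return (value - stats["min"]) / span


training_values = [10.0, 20.0, 30.0, 50.0]

# Analyze once on the full training set; this plays the role of the exported graph.
stats = analyze(training_values)

# Training and serving both apply the same saved statistics.
train_scaled = [transform(v, stats) for v in training_values]
serving_scaled = transform(30.0, stats)
```

If serving recomputed `min` and `max` from a single request or a small batch, the same raw value would map to a different scaled value, which is the training-serving skew that the exported transform graph avoids.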

697

MCQeasy

A developer creates a Cloud Build trigger that runs a training pipeline whenever code is pushed to the main branch of the repository. The trigger is configured to use a source archive stored in Cloud Storage. After pushing code to main, the build fails with the error shown. What is the most likely cause of this failure?

A.The build configuration file is missing from the source archive.

B.The included files filter 'train/**' excludes all files outside the train directory, causing the build to have no source.

C.The source archive is not being updated when code is pushed, so the trigger tries to fetch an old or nonexistent object.

D.The service account does not have storage.objectViewer permission on the bucket.

AnswerC

The trigger points to a static archive; pushing new code does not update the archive, leading to missing source.

Why this answer

Option C is correct because the trigger is configured to use a source archive stored in Cloud Storage. When code is pushed to the main branch, the trigger attempts to fetch the archive from the specified Cloud Storage location. If the archive is not updated (e.g., via a separate upload or a Cloud Function that rebuilds the archive on push), the trigger will either fetch an old version or fail if the object does not exist.

The error indicates that the build cannot proceed because the source archive is stale or missing, not because of a missing config file or permission issue.

Exam trap

The trap here is that candidates assume the included files filter (Option B) causes the failure, but the error is about the source archive itself being outdated or missing, not about which files are included within it.

How to eliminate wrong answers

Option A is wrong because the error message does not indicate a missing build configuration file; a missing cloudbuild.yaml would produce a specific 'build configuration file not found' error, not a generic fetch failure. Option B is wrong because the included files filter 'train/**' only restricts which files are included in the build context, but it does not cause the source archive itself to be missing or stale; the error is about fetching the archive, not about empty source. Option D is wrong because if the service account lacked storage.objectViewer permission on the bucket, the error would be a 403 Forbidden or access denied, not a generic build failure related to source archive retrieval.

698

MCQmedium

A company deploys a model on Vertex AI Endpoint and expects high traffic spikes during promotional events. The current configuration uses manual scaling with 2 replicas. Which autoscaling configuration should they use to handle spikes while minimizing cost during normal traffic?

A.Keep manual scaling but increase replicas to 10.

B.Set min_replica_count=2 and max_replica_count=10 with no scaling metric.

C.Enable basic scaling with target_cpu_utilization=0.6 and set min_replica_count=2, max_replica_count=10.

D.Use custom metric scaling with a Cloud Monitoring metric for prediction latency.

AnswerC

Basic scaling adjusts replicas based on CPU load.

Why this answer

Option C is correct because basic scaling with a target metric (here, CPU utilization at 60%) automatically adjusts the replica count between the minimum of 2 and the maximum of 10, reducing cost during normal traffic and scaling up during spikes. Option A is wrong because manual scaling cannot adapt to load, so 10 fixed replicas are paid for even when traffic is low. Option B is wrong because, as stated, it defines no scaling metric to drive the replica count.

Option D is wrong because custom metric scaling is possible but basic scaling is simpler and sufficient for CPU-bound models.
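
As a rough illustration of target-utilization scaling, the sketch below applies the proportional rule used by autoscalers such as the Kubernetes Horizontal Pod Autoscaler and clamps the result to the configured range. Treat it as an assumption for intuition only: it is not Vertex AI's documented algorithm, and `desired_replicas` is a hypothetical helper. Utilization is given as integer percentages to avoid floating-point rounding at the boundaries.

```python
def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas, max_replicas):
    """Proportional scaling: ceil(current * utilization / target), then clamp."""
    if target_cpu_pct <= 0:
        raise ValueError("target_cpu_pct must be positive")
    # Integer ceiling division of (current_replicas * current_cpu_pct) by the target.
    raw = -(-(current_replicas * current_cpu_pct) // target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))
```

With a 60% target and a range of 2 to 10, two replicas running at 95% CPU would grow to four, while four replicas idling at 15% would shrink back to the minimum of two.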

699

MCQhard

A machine learning engineer is building a Vertex AI pipeline that uses a pre-built AutoML Tables component to train a classification model. The pipeline also includes a conditional step that deploys the model to an endpoint only if the evaluation metrics exceed a threshold. Which KFP feature should be used to implement the conditional deployment?

A.dsl.ParallelFor

B.dsl.ExitHandler

C.dsl.Condition

D.dsl.Collected

AnswerC

Correct: dsl.Condition (or dsl.If) provides conditional execution of tasks based on conditions.

Why this answer

The `dsl.Condition` feature from KFP (Kubeflow Pipelines) is specifically designed to conditionally execute pipeline steps based on the output of a previous component. In this scenario, the AutoML Tables component produces evaluation metrics; `dsl.Condition` allows the pipeline to check whether those metrics exceed a threshold and, if true, run the deployment step. This is the correct, native KFP construct for implementing branching logic within a pipeline.

Exam trap

The trap here is that candidates often confuse `dsl.Condition` with `dsl.ExitHandler` because both involve decision-making, but `ExitHandler` is only for post-exit cleanup, not for branching based on step outputs.

How to eliminate wrong answers

Option A is wrong because `dsl.ParallelFor` is used for iterating over a collection of items and executing steps in parallel, not for conditional branching based on a single metric threshold. Option B is wrong because `dsl.ExitHandler` is a mechanism to run a cleanup or notification step when a pipeline exits (successfully or with failure), not for conditionally deploying a model based on evaluation results. Option D is wrong because `dsl.Collected` is a function used to gather outputs from parallel iterations (e.g., from `dsl.ParallelFor`) into a single list, not a control flow construct for conditional execution.

700

Multi-Selecteasy

A team has deployed a model on Vertex AI Prediction and wants to monitor for data drift. Which TWO metrics should they use to detect drift in numerical features?

Select 2 answers

A.Pearson correlation coefficient

B.Jensen-Shannon divergence (JSD)

C.Chi-squared statistic

D.Population Stability Index (PSI)

E.Kolmogorov-Smirnov (KS) statistic

AnswersB, E

JSD measures similarity between two probability distributions and works for numerical features after binning.

Why this answer

Jensen-Shannon divergence (JSD) is a symmetric, bounded (0 to 1) measure of the difference between two probability distributions, making it ideal for detecting drift in numerical features by comparing the training distribution to the serving distribution. It is a smoothed and normalized version of Kullback-Leibler divergence, and Vertex AI Prediction's Model Monitoring natively supports JSD for numerical feature drift detection.

Exam trap

Google Cloud often tests the misconception that Pearson correlation or Chi-squared are appropriate for numerical drift, when in fact Pearson measures correlation between two variables and Chi-squared is for categorical data, leading candidates to overlook the correct distribution-comparison metrics like JSD and KS.
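
For intuition, both distribution-comparison metrics can be computed in a few lines of standard-library Python. This is a sketch under stated assumptions rather than what Vertex AI runs internally: the helper names and bin edges are hypothetical, JSD uses log base 2 so it is bounded by 0 and 1, and the KS statistic is the largest gap between the two empirical CDFs.

```python
import math
from bisect import bisect_right


def histogram(values, edges):
    """Normalized bin frequencies; out-of-range values are clipped to the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = min(max(bisect_right(edges, v) - 1, 0), len(counts) - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, 0 for identical, 1 for disjoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max distance between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    points = sorted(set(a_sorted) | set(b_sorted))
    return max(
        abs(bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted))
        for x in points
    )
```

JSD needs the numerical feature binned into a shared set of edges for training and serving data, whereas the KS statistic works directly on the raw samples.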

701

MCQmedium

A team has a Vertex AI pipeline that includes a container component for data preprocessing. The team notices that the component is re-executed every time the pipeline runs, even when the inputs and code haven't changed. They want to leverage pipeline caching to avoid redundant executions. What should they do to enable caching for this component?

A.Set the 'caching' flag to 'True' in the pipeline definition using 'pipeline.caching = True'.

B.Set the environment variable 'ENABLE_CACHE' to 'true' on the pipeline run request.

C.Re-compile the pipeline with the '--enable-cache' flag.

D.Ensure that the component does not have 'dsl.cache_options(enable_cache=False)' set.

AnswerD

Caching is enabled by default; if someone explicitly disabled it, removing that line will re-enable caching.

Why this answer

Option D is correct because Vertex AI pipeline caching is enabled by default for all components unless explicitly disabled using `dsl.cache_options(enable_cache=False)`. The component re-executing every time indicates that caching was likely disabled on that specific component. Removing or ensuring this setting is not present will allow the pipeline to reuse cached outputs when inputs and code have not changed.

Exam trap

The trap here is that candidates assume caching must be explicitly enabled (like in some other systems), but Vertex AI caches by default, so the issue is usually that caching was explicitly disabled on the component.

How to eliminate wrong answers

Option A is wrong because Vertex AI pipelines do not have a global `pipeline.caching` attribute; in the KFP SDK, caching is set per task (for example, by calling `set_caching_options` on the task) or for the whole run when the job is submitted, not through an attribute on the pipeline object. Option B is wrong because there is no environment variable `ENABLE_CACHE` for Vertex AI pipeline runs; caching is configured in the pipeline definition, not via runtime environment variables. Option C is wrong because Vertex AI pipelines do not use a `--enable-cache` compilation flag; caching is a runtime feature controlled by component-level settings, not a compile-time option.

702

Multi-Selectmedium

A team is responsible for monitoring the health of a Vertex AI pipeline that runs daily. Which THREE resources should they use to gain visibility into pipeline performance and failures? (Choose 3.)

Select 3 answers

A.Cloud Trace for analyzing distributed execution

B.Cloud Composer for tracking DAGs

C.Vertex AI Experiments for comparing pipeline runs

D.Cloud Monitoring for metrics and alerts on pipeline runs

E.Cloud Logging for viewing pipeline step logs

AnswersC, D, E

Vertex AI Experiments tracks pipeline runs and allows comparison of metrics across runs over time.

Why this answer

Vertex AI Experiments (Option C) is correct because it provides a systematic way to log, compare, and analyze pipeline runs, including metrics, parameters, and artifacts. This allows the team to track performance trends across daily runs, identify regressions, and correlate failures with specific run configurations, which is essential for monitoring pipeline health over time.

Exam trap

Google Cloud often tests the distinction between monitoring (observing run-level metrics and logs) and tracing (analyzing request-level latency), leading candidates to incorrectly select Cloud Trace for pipeline health visibility when it is actually intended for distributed request tracing.

703

MCQmedium

A data science team uses a shared Cloud Storage bucket to store training datasets. They notice that some team members accidentally overwrite existing datasets, causing issues with reproducibility. Which approach best prevents accidental overwrites while maintaining collaboration?

A.Use a single shared service account with strict IAM roles that allow only append operations.

B.Require team members to manually rename files before uploading.

C.Set bucket permissions to read-only for all team members except the data owner.

D.Enable object versioning on the bucket and use lifecycle rules to manage versions.

AnswerD

Versioning allows recovery of previous versions if overwritten.

Why this answer

Option D is correct because enabling object versioning on a Cloud Storage bucket preserves all versions of an object, so even if a team member overwrites a dataset, the previous version remains accessible. This maintains collaboration (anyone can upload) while preventing permanent data loss. Lifecycle rules can then be used to manage storage costs by automatically deleting old versions after a specified period.

Exam trap

The trap here is that candidates may think IAM roles or permissions are the only way to control data integrity, overlooking that object versioning provides a safety net without blocking collaboration.

How to eliminate wrong answers

Option A is wrong because Cloud Storage does not support 'append-only' IAM roles; objects are immutable and must be rewritten entirely, so this approach would not prevent overwrites and would break normal upload workflows. Option B is wrong because relying on manual renaming is error-prone and does not enforce any technical control, so accidental overwrites can still occur. Option C is wrong because making the bucket read-only for most team members prevents them from uploading new datasets at all, which destroys collaboration and is overly restrictive.

704

MCQmedium

A company wants to run batch predictions on millions of records stored in BigQuery. They need to preprocess the data (e.g., feature engineering) before feeding it to the model. Which approach is most scalable and cost-effective?

A.Use a large DataProc cluster to preprocess and run batch predictions.

B.Preprocess inline in the batch prediction job using a custom container.

C.Use a custom Python script on a Compute Engine instance.

D.Preprocess with Cloud Dataflow, output to Cloud Storage, then submit a Vertex AI batch prediction job.

AnswerD

Dataflow provides scalable preprocessing, and Vertex AI batch prediction reads from Cloud Storage.

Why this answer

Option D is the most scalable and cost-effective because Cloud Dataflow (Apache Beam) provides serverless, auto-scaling preprocessing that handles large volumes of data efficiently, and Vertex AI batch predictions natively read from Cloud Storage, avoiding the need to manage infrastructure. This decouples preprocessing from prediction, allowing each to scale independently and minimizing costs by using ephemeral, pay-per-use resources.

Exam trap

Google Cloud often tests the misconception that a single large cluster (Dataproc) or a single VM is sufficient for batch processing, when in fact serverless, auto-scaling services like Dataflow are more appropriate for large-scale, ephemeral preprocessing tasks.

How to eliminate wrong answers

Option A is wrong because Dataproc clusters require manual sizing, incur idle costs, and add operational overhead for a simple preprocessing task; it is overkill and less cost-effective than serverless options. Option B is wrong because preprocessing inline in a custom container for batch prediction tightly couples preprocessing with prediction, preventing independent scaling and making it harder to handle large-scale data transformations efficiently. Option C is wrong because a single Compute Engine instance cannot scale horizontally to process millions of records in a reasonable time, and managing failover, retries, and parallelization would require custom code, making it neither scalable nor cost-effective.

705

MCQeasy

You are responsible for maintaining an ML pipeline that runs daily on Vertex AI Pipelines. The pipeline preprocesses data, trains a model, and deploys it to an endpoint. Recently, the pipeline has been failing at the deployment step because the endpoint already exists and the deploy step tries to create a new endpoint instead of updating the existing one. The pipeline code is written using the Kubeflow Pipelines SDK. You need to modify the pipeline to resolve this issue with minimal changes. What should you do?

A.Change the pipeline to use a Cloud Function that triggers the deployment independently, bypassing Vertex AI Pipelines.

B.In the deployment component, add a check to verify if the endpoint exists, and if so, call the update endpoint method instead of create.

C.Set the deploy component's retry policy to infinite so it eventually succeeds.

D.Manually delete the existing endpoint before each pipeline run.

AnswerB

This directly fixes the deployment logic to handle existing endpoints.

Why this answer

Option B is correct because it addresses the root cause: the deployment component should check whether the endpoint exists and update it instead of creating a new one. Option A is wrong because using a Cloud Function bypasses the pipeline orchestration and adds unnecessary complexity. Option C is wrong because retrying will not fix the fundamental issue of trying to create an existing endpoint.

Option D is wrong because manual deletion defeats automation and is not a robust solution.

706

MCQhard

A travel booking company has a real-time recommendation system that suggests hotels and flights to users. The model is served using TensorFlow Serving on a Google Kubernetes Engine (GKE) cluster with auto-scaling enabled. The cluster uses n1-standard-4 machine types. The team has set up Cloud Monitoring dashboards and alerts. Last week, during a major holiday promotion, the team noticed that the model's inference latency P99 increased from 150 ms to 450 ms over a 30-minute period, while the request throughput increased from 500 to 1,200 requests per second. CPU utilization across the cluster rose to 95%, but memory utilization remained at 60%. The model version and the serving infrastructure configuration have not changed since the last deployment. Which action should the team take to mitigate the latency issue?

A.Implement a feature engineering pipeline that compresses the input features to reduce data size and inference time.

B.Deploy a newer version of the model that uses a more efficient architecture to reduce computational complexity.

C.Increase the number of TensorFlow Serving instances by reducing the CPU request per pod in GKE to allow more pods per node.

D.Add more nodes to the GKE cluster to increase the total CPU resources available for serving.

AnswerD

Adding nodes increases compute capacity, allowing more parallel inference and reducing latency under high load.

Why this answer

The latency spike is caused by CPU saturation (95% utilization) under increased load (500 to 1,200 RPS). Adding more nodes to the GKE cluster directly increases the total CPU resources available, allowing the existing TensorFlow Serving pods to handle the higher throughput without contention. This is the most immediate and infrastructure-appropriate fix because the model version and serving configuration have not changed, ruling out model-level or code-level optimizations.

Exam trap

Google Cloud often tests the misconception that reducing per-pod CPU requests (Option C) is a valid scaling strategy, but in reality this increases overcommitment and can worsen latency under high load, whereas adding nodes (Option D) provides dedicated resources without contention.

How to eliminate wrong answers

Option A is wrong because compressing input features may reduce data size but does not address the root cause of CPU saturation; inference latency is dominated by model computation, not I/O, and the feature engineering pipeline is not part of the serving infrastructure. Option B is wrong because deploying a newer, more efficient model is a long-term optimization, not an immediate mitigation; the question states the model version has not changed and the issue is purely resource contention under load. Option C is wrong because reducing the CPU request per pod would allow more pods per node, but this would increase CPU overcommitment and worsen contention on already saturated nodes, potentially causing further latency degradation or pod evictions.

707

MCQeasy

An MLOps engineer needs to collect ground truth labels for a deployed classification model to compare predictions against actuals. Where should the engineer store the ground truth data to enable Vertex AI model quality monitoring?

A.BigQuery

B.Firestore

C.Cloud Spanner

D.Cloud Storage

AnswerA

Vertex AI Model Monitoring uses ground truth labels stored in BigQuery to compute quality metrics.

Why this answer

Vertex AI Model Monitoring expects ground truth data to be uploaded to BigQuery tables, which can then be used to compute confusion matrices and other quality metrics over time.

708

MCQmedium

A data science team is collaborating on a project to build a churn prediction model. They use Vertex AI Workbench instances for development. Each data scientist has their own instance with a persistent disk. They share code via a GitHub repository. They want to ensure that the model training is reproducible across different team members' environments. Currently, they manually install Python packages in their instances, and they have noticed that the model metrics differ slightly between runs on different instances. Which of the following is the best action to ensure reproducibility?

A.Standardize the instance machine type and ensure all have the same number of CPUs.

B.Use Cloud Functions to run the training code instead.

C.Use Vertex AI Experiments with a fixed environment by specifying a prebuilt container.

D.Create a custom Docker image with all dependencies and use it in Vertex AI Training jobs.

E.Ask all team members to use the same Python virtual environment and install packages from a requirements.txt file.

AnswerC

Experiments track parameters and metrics while ensuring a consistent environment.

Why this answer

Option C is correct because Vertex AI Experiments with a prebuilt container ensures a fixed, reproducible environment by pinning the exact OS, Python version, and all dependencies. This eliminates the variability introduced by manual package installations and differing instance configurations, directly addressing the team's issue of inconsistent model metrics across runs.

Exam trap

Google Cloud often tests the distinction between environment reproducibility (which requires fixed software stacks) and hardware consistency (which is less critical for deterministic training), leading candidates to mistakenly choose hardware standardization (Option A) or manual dependency management (Option E).

How to eliminate wrong answers

Option A is wrong because standardizing machine type and CPU count does not control for differences in Python package versions or system libraries, which are the primary cause of metric discrepancies. Option B is wrong because Cloud Functions are designed for event-driven, stateless workloads and are not suitable for long-running model training jobs; they also do not inherently enforce a fixed environment. Option D is wrong because while a custom Docker image is a valid approach, it is not the best action here because Vertex AI Experiments with a prebuilt container provides a simpler, managed solution that automatically tracks experiments and environments without requiring the team to build and maintain custom images.

Option E is wrong because manually using a requirements.txt file and virtual environments is error-prone and does not guarantee identical system-level dependencies or Python interpreter versions across different instances, leading to subtle reproducibility issues.

709

MCQmedium

A machine learning team deploys a PyTorch model for online prediction on Vertex AI using a custom container. They notice that the first few requests after scaling up experience high latency. What is the most likely cause and how should they mitigate it?

A.The endpoint is not configured for autoscaling; enable min_replica=0 to allow scale-to-zero.

B.The model file is corrupted; re-upload to Vertex AI Model Registry.

C.The container has a slow initialization; set initialDelaySeconds in the health probe to give more time before considering the pod ready.

D.Use a smaller machine type (n1-standard-2) to reduce startup overhead.

AnswerC

Giving the container more time to load the model reduces premature traffic and latency.

Why this answer

Option C is correct because the high latency on the first few requests after scaling up is a classic symptom of a slow container initialization. By setting `initialDelaySeconds` in the health probe, you allow the container more time to start up and become ready before it receives traffic, preventing premature routing that causes timeouts or retries. This is a common tuning parameter for custom containers on Vertex AI, where model loading or dependency initialization can take several seconds.

Exam trap

The trap here is that candidates confuse slow initialization with autoscaling misconfiguration, assuming that scale-to-zero or smaller machines would fix the latency, when in fact the root cause is the readiness probe timing.

How to eliminate wrong answers

Option A is wrong because the problem occurs after scaling up, not from a cold start with zero replicas; setting min_replica=0 would actually worsen latency by requiring full cold starts. Option B is wrong because a corrupted model file would cause persistent prediction failures or errors, not just high latency on the first few requests after scaling. Option D is wrong because using a smaller machine type (n1-standard-2) would increase startup overhead and latency, not reduce it, as it provides fewer CPU and memory resources for initialization.

710

MCQmedium

A company uses Vertex AI Model Monitoring to detect data drift. They have a model that predicts house prices. Which dataset should they compare against the training data to detect drift?

A.The entire historical prediction data

B.A random sample of recent predictions

C.The latest batch of predictions

D.The validation data used during training

AnswerC

Comparing the latest serving data distribution to training data detects drift.

Why this answer

Option C is correct because Vertex AI Model Monitoring compares the training data (serving as the baseline) against the latest batch of predictions to detect data drift. This batch represents the most recent inference requests, allowing the monitoring service to compute statistical distribution differences (e.g., Jensen-Shannon divergence) and trigger alerts when drift exceeds a configured threshold. Using the latest batch ensures timely detection of shifts in the production data distribution.

Exam trap

Google Cloud often tests the distinction between 'recent predictions' and 'latest batch' — the trap is that candidates confuse a random sample (which is statistically valid for inference but not for drift detection) with the complete batch that Vertex AI requires for accurate distribution comparison.

How to eliminate wrong answers

Option A is wrong because using the entire historical prediction data would dilute recent drift signals with older, potentially stale distributions, making it harder to detect current drift and violating Vertex AI's requirement for a sliding window of recent predictions. Option B is wrong because a random sample of recent predictions lacks the systematic coverage of the full production traffic; Vertex AI Model Monitoring expects a complete batch to accurately compute per-feature drift metrics, and random sampling can miss localized drift patterns. Option D is wrong because validation data is a static holdout set from training time, not production data; comparing against it would measure generalization error, not data drift, and Vertex AI Model Monitoring is designed to compare against training data, not validation data.

711

MCQhard

A company uses Vertex AI Pipelines to orchestrate their ML training workflow. The pipeline includes a BigQuery ML training step, a model evaluation step, and a deployment step to Vertex AI Endpoints. The engineer notices that the pipeline fails intermittently due to a quota exceeded error on Vertex AI Endpoints during model deployment. What is the best long-term solution to prevent this failure?

A.Run the pipeline steps sequentially with longer wait times.

B.Add retry logic with exponential backoff to the deployment step in the pipeline.

C.Switch to deploying models using a custom container on Compute Engine.

D.Request a permanent quota increase for Vertex AI Endpoints.

AnswerB

Handles transient quota errors gracefully without manual intervention.

Why this answer

Option B is correct because implementing retry logic with exponential backoff is a resilient pattern for transient quota errors. Option D is wrong because increasing quota requires a support ticket and may not be granted immediately. Option C is wrong because using a custom container does not address quota limits.

Option A is wrong because sequential execution does not prevent quota errors.
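
A minimal sketch of retry with exponential backoff, using only the standard library. `QuotaExceededError` and `retry_with_backoff` are hypothetical names standing in for whatever exception and wrapper the deployment step actually uses; the delays and the 10% jitter are illustrative defaults, not recommended values.

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for a transient quota error from the deploy call."""


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call operation(); on a quota error wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the pipeline
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter so parallel runs do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injected so the wait can be replaced in tests. Only the quota error is retried, so other failures still stop the step immediately.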

712

MCQhard

A company uses Vertex AI Endpoints for model serving and wants to implement A/B testing between model versions. They need to gradually shift traffic from the old to the new version while monitoring performance. Which Vertex AI feature allows this with minimal operational overhead?

A.Using a custom load balancer with weighted backend services

B.Model Deployments with traffic splitting

C.Vertex AI Experiments for tracking

D.Cloud Run revisions with traffic migration

AnswerB

Why this answer

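Option B is correct because Vertex AI Endpoints allow deploying multiple model versions to the same endpoint and setting a traffic split percentage that can be gradually adjusted. Option A is possible but adds operational overhead, because the team would have to build and maintain the load balancer themselves. Option C is wrong because Vertex AI Experiments tracks and compares runs; it does not serve traffic.

Option D is wrong because Cloud Run revisions are not a Vertex AI feature.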
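
A traffic split on an endpoint is expressed as a mapping from deployed model ID to an integer percentage, with the percentages totalling 100. The sketch below only generates such mappings for a staged rollout; `rollout_schedule`, the stage percentages, and the model IDs are hypothetical, and applying each mapping to the endpoint is left to whichever client the team uses.

```python
def rollout_schedule(old_id, new_id, stages=(10, 25, 50, 100)):
    """Return one traffic-split mapping per rollout stage.

    Each mapping assigns integer percentages to the two deployed model IDs
    and always sums to 100.
    """
    schedule = []
    for new_share in stages:
        if not 0 <= new_share <= 100:
            raise ValueError("traffic percentage must be between 0 and 100")
        schedule.append({old_id: 100 - new_share, new_id: new_share})
    return schedule


# Hypothetical deployed model IDs, for illustration only.
for split in rollout_schedule("old-model-id", "new-model-id"):
    print(split)
```

Between stages the team would check the monitored metrics and either advance to the next mapping or revert to the previous one.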

713

Multi-Selecthard

A company has a prototype ML model that predicts equipment failure. They want to deploy it to production using Vertex AI. The model must be retrained weekly with new data. They also need to monitor for data drift and model performance. Which THREE components should they include in their MLOps pipeline? (Choose 3)

Select 3 answers

A.A scheduled training pipeline that retrains the model weekly.

B.A manual QA step where data scientists approve each deployment.

C.A manual review of new data before it is used for training.

D.An automated trigger that redeploys the model when performance drops below a threshold.

E.A monitoring system that checks for data drift and triggers alerts.

AnswersA, D, E

Scheduled retraining is essential for keeping the model up-to-date.

Why this answer

Option A is correct because the requirement specifies weekly retraining, which is best implemented as a scheduled training pipeline in Vertex AI, using Cloud Scheduler or a recurring Vertex AI Pipelines run. This automates the retraining process without manual intervention, ensuring the model stays current with new data.

Exam trap

Google Cloud often tests the distinction between necessary manual oversight and fully automated MLOps practices, leading candidates to overestimate the need for human approval steps in a production pipeline that demands speed and scalability.

714

MCQeasy

You want to use Vertex AI Vizier for hyperparameter tuning. You have 2 categorical parameters and 3 continuous parameters. Which algorithm is best suited for this mixed parameter space?

A.Evolutionary algorithm

B.Random search

C.Bayesian optimization

D.Grid search

AnswerC

Handles mixed types and is sample-efficient.

Why this answer

Bayesian optimization can handle mixed parameter types. Grid search cannot handle continuous parameters efficiently; random search can, but Bayesian is more sample-efficient.

715

MCQhard

A team is scaling their prototype inference model to handle high-throughput requests with low latency. They use a custom container on Vertex AI Prediction. They notice that latency spikes occur under heavy load. What is the most effective strategy?

A.Enable auto-scaling with a higher minimum number of replicas.

B.Optimize model serving with batching and model warm-up.

C.Use a larger machine type with more CPUs.

D.Use a GPU-based machine.

AnswerB

Batching reduces overhead per request; warm-up avoids cold start.

Why this answer

Option B is correct because optimizing model serving with batching and model warm-up reduces per-request overhead and ensures consistent latency. Option C is wrong because adding CPUs may not help if the bottleneck is model inference computation. Option A is wrong because auto-scaling doesn't reduce latency spikes; it adds replicas over time.

Option D is wrong because a GPU may help but not specifically for latency spikes due to load variation.
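
The two ideas in this answer, batching and warm-up, can be illustrated with a small in-process sketch. It is not a serving framework: `BatchedServer`, `make_batches`, and the `predict_batch` callable are hypothetical names, and a real custom container would add queuing and a time limit for filling each batch.

```python
def make_batches(requests, max_batch_size):
    """Group pending requests into batches of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [requests[i:i + max_batch_size]
            for i in range(0, len(requests), max_batch_size)]


class BatchedServer:
    """Wraps a batch-capable predict function with batching and warm-up."""

    def __init__(self, predict_batch, max_batch_size=8, warmup_input=None):
        self.predict_batch = predict_batch
        self.max_batch_size = max_batch_size
        self.model_calls = 0
        if warmup_input is not None:
            # Warm-up: run one dummy batch at startup so lazy initialisation
            # happens before real traffic arrives.
            self._run([warmup_input])

    def _run(self, batch):
        self.model_calls += 1
        return self.predict_batch(batch)

    def handle(self, pending):
        """Serve all pending requests using as few model calls as possible."""
        results = []
        for batch in make_batches(pending, self.max_batch_size):
            results.extend(self._run(batch))
        return results
```

With `max_batch_size=4`, ten pending requests cost three model calls instead of ten, which is where the per-request overhead saving comes from.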

716

Multi-Selecthard

A team is serving a large language model (LLM) on Vertex AI using a custom container. They want to reduce tail latency. Which THREE strategies should they consider?

Select 3 answers

A.Increase the number of replicas.

B.Use dynamic batching to combine requests.

C.Implement response caching for common queries.

D.Quantize the model to INT8 to reduce computation.

E.Upgrade to a more powerful GPU type.

AnswersB, C, D

Improves GPU utilization and reduces per-request latency.

Why this answer

Dynamic batching (B) reduces tail latency by grouping multiple inference requests into a single batch, which improves GPU utilization and amortizes overhead across requests. This is particularly effective for LLMs because it allows the model to process more tokens per forward pass, reducing the per-request latency variance that contributes to tail latency.

Exam trap

The trap here is that candidates confuse scaling strategies (like increasing replicas or upgrading hardware) with latency-optimization techniques, failing to recognize that tail latency is primarily reduced by batching and caching, not by adding more compute resources.

717

MCQmedium

An engineer needs to compile a Kubeflow Pipeline defined in Python to a JSON format that can be run on Vertex AI Pipelines. Which command should they use?

A.kfp.compiler.Compiler().compile(pipeline_func, 'pipeline.json')

B.gcloud ai pipelines compile command.

C.kfp.Client().upload_pipeline()

D.dsl.pipeline decorator automatically compiles at runtime.

AnswerA

This is the correct command to compile a pipeline function to JSON.

Why this answer

Option A is correct because the Kubeflow Pipelines SDK provides the `kfp.compiler.Compiler().compile()` method to convert a Python-based pipeline function into a JSON or YAML format that is compatible with Vertex AI Pipelines. This JSON representation defines the pipeline's components, dependencies, and execution graph, enabling it to be submitted to Vertex AI for orchestration. The `compile()` method is the standard way to produce a portable pipeline specification from Python code.

Exam trap

The trap here is that candidates may confuse the compilation step with pipeline submission or runtime execution, mistakenly thinking that the `dsl.pipeline` decorator or `gcloud` commands handle compilation automatically, when in fact the KFP SDK's `Compiler().compile()` is the explicit and required method for generating the JSON pipeline definition.

How to eliminate wrong answers

Option B is wrong because `gcloud ai pipelines compile` is not a valid gcloud command; compilation is done with the KFP SDK, and the compiled file is then submitted separately, for example as a `PipelineJob` through the Vertex AI SDK. Option C is wrong because `kfp.Client().upload_pipeline()` uploads a compiled pipeline package to a Kubeflow Pipelines instance, but it does not perform the compilation step itself; the pipeline must already be compiled into a JSON or YAML file before uploading. Option D is wrong because the `dsl.pipeline` decorator defines the pipeline structure and components but does not automatically compile it at runtime; explicit invocation of `Compiler().compile()` is required to generate the JSON artifact.

718

MCQmedium

A data science team wants to build a machine learning pipeline on Vertex AI Pipelines that preprocesses data, trains a model, and evaluates it. They need to ensure that components can be reused across multiple pipelines and that outputs from one component can be passed as inputs to another. Which approach should they take?

A.Write each component as a Cloud Composer DAG task using Python operators and manage dependencies via Airflow.

B.Use Vertex AI pre-built components exclusively and chain them using the Vertex AI SDK without a pipeline definition.

C.Define each step as a separate Cloud Build step and chain them via build triggers.

D.Use Kubeflow Pipelines SDK v2 to create Python function components decorated with @dsl.component and compose them into a pipeline using @dsl.pipeline.

AnswerD

This is the standard approach for reusable, composable ML pipeline components on Vertex AI Pipelines.

Why this answer

Option D is correct because Kubeflow Pipelines SDK v2 with @dsl.component and @dsl.pipeline decorators is the native way to define reusable, composable components in Vertex AI Pipelines. This approach allows each component to be a self-contained Python function that can be independently versioned and reused across multiple pipelines, with outputs automatically serialized and passed as inputs to downstream components via the pipeline graph.

Exam trap

Google Cloud often tests the misconception that any orchestration tool (Airflow, Cloud Build) can substitute for a purpose-built ML pipeline framework, but the key differentiator is Vertex AI Pipelines' native support for reusable components with typed artifact passing and managed execution.

How to eliminate wrong answers

Option A is wrong because Cloud Composer (Airflow) is a workflow orchestrator for general DAGs, not a purpose-built ML pipeline framework; it lacks native support for Vertex AI Pipelines' artifact tracking, component reuse, and ML-specific I/O handling. Option B is wrong because Vertex AI pre-built components cannot be chained without a pipeline definition; the Vertex AI SDK requires a pipeline specification (e.g., via Kubeflow Pipelines) to define the execution graph and pass outputs between steps. Option C is wrong because Cloud Build is a CI/CD service for building and testing code, not for orchestrating ML pipelines; it does not provide managed artifact passing, caching, or the runtime environment needed for ML training and evaluation steps.

719

Matchingmedium

Match each feature engineering technique to its description.

Concepts

Matches

Convert categorical variable into binary columns

Combine two or more features to capture interactions

Normalize numeric features to a standard range

Group continuous values into discrete intervals

Weight term frequency by inverse document frequency

Why these pairings

Feature engineering techniques transform raw data into better representations. Common techniques include Polynomial Features (polynomial expansion), One-Hot Encoding (dummy variables), and Feature Scaling (standardization). Distractors confuse methods like Label Encoding or Interaction Features.
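
Four of the descriptions above can be made concrete with plain Python. This is a hedged sketch for study purposes: the function names and sample values are illustrative, and production pipelines would normally use a library transform instead.

```python
import bisect


def one_hot(value, vocabulary):
    """Convert a categorical value into binary columns, one per vocabulary entry."""
    return [1 if value == entry else 0 for entry in vocabulary]


def feature_cross(a, b):
    """Combine two categorical features into one feature that captures their interaction."""
    return f"{a}_x_{b}"


def min_max_scale(values):
    """Normalize numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bucketize(value, boundaries):
    """Return the index of the interval a continuous value falls into.

    boundaries must be sorted in ascending order.
    """
    return bisect.bisect_right(boundaries, value)
```

For example, `bucketize(25, [18, 30, 65])` places an age of 25 in the second interval (index 1), and `one_hot("red", ["red", "green", "blue"])` yields `[1, 0, 0]`.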

720

MCQhard

A data engineering team wants to orchestrate an ML pipeline that includes data preprocessing in Dataflow, AutoML training, and model deployment. They want to minimize operational overhead. Which approach is best?

A.Use Cloud Composer with Apache Airflow DAG

B.Use AI Platform Training with script

C.Use Cloud Scheduler to trigger Cloud Functions

D.Use Vertex AI Pipelines with custom components

AnswerD

Correct: Purpose-built for ML workflows, minimal overhead.

Why this answer

Vertex AI Pipelines with custom components is the best choice because it provides a fully managed, serverless orchestration service that natively integrates with Dataflow, AutoML, and model deployment. This minimizes operational overhead by eliminating the need to manage infrastructure, handle retries, or maintain a separate orchestration server, while offering built-in artifact tracking and pipeline caching.

Exam trap

The trap here is that candidates often confuse 'orchestration' with 'scheduling' and pick Cloud Scheduler, failing to recognize that a multi-step ML pipeline requires workflow orchestration with dependencies and error handling, not just a time-based trigger.

How to eliminate wrong answers

Option A is wrong because Cloud Composer with an Apache Airflow DAG requires provisioning and maintaining a Composer environment (Airflow running on GKE), which increases operational overhead rather than minimizing it. Option B is wrong because AI Platform Training with a script only handles the training step in isolation, not the end-to-end orchestration of preprocessing, training, and deployment. Option C is wrong because Cloud Scheduler to trigger Cloud Functions is a simple time-based trigger that lacks the workflow orchestration capabilities (e.g., conditional branching, parallel steps, dependency management) needed for a multi-step ML pipeline.

721

MCQeasy

A team just moved a model from prototype to production using Vertex AI. They notice prediction errors for certain inputs that were not present in training data. What should they do to detect such issues automatically?

A.Set up Vertex AI Experiments to compare predictions

B.Use BigQuery ML to analyze prediction requests

C.Enable Cloud Logging and set up alerts for error logs

D.Enable Vertex AI Model Monitoring to detect prediction anomalies

AnswerD

Model Monitoring automatically checks for drift and anomalies.

Why this answer

Option D is correct because Vertex AI Model Monitoring is specifically designed to detect prediction anomalies, such as data drift and feature skew, by comparing production prediction requests against the training data distribution. This allows the team to automatically identify inputs that deviate from the training data, even if those exact inputs were not present during training, without manual inspection.

Exam trap

Google Cloud often tests the distinction between monitoring for operational errors (e.g., HTTP errors) versus monitoring for model-specific issues (e.g., data drift), leading candidates to choose Cloud Logging (Option C) when the correct answer requires a dedicated ML monitoring service.

How to eliminate wrong answers

Option A is wrong because Vertex AI Experiments is used for tracking and comparing model training runs and hyperparameter tuning, not for monitoring production prediction requests or detecting anomalies in real-time. Option B is wrong because BigQuery ML is a tool for creating and executing machine learning models directly in BigQuery using SQL, not for analyzing prediction requests from a deployed Vertex AI model or detecting input anomalies. Option C is wrong because while Cloud Logging can capture error logs, it only reacts to explicit errors (e.g., 4xx/5xx HTTP responses) and cannot automatically detect prediction anomalies like data drift or feature skew that do not generate error logs.

722

MCQeasy

A retail company wants to build a recommendation system to show 'frequently bought together' items. Which Recommendations AI model type should they use?

A.recently-viewed

B.frequently-bought-together

C.recommended-for-you

D.others-you-may-like

AnswerB

Why this answer

Option B is correct because the 'frequently-bought-together' model type in Google Cloud Recommendations AI is specifically designed to identify items that are commonly purchased in the same transaction, based on co-occurrence patterns in historical purchase data. This directly matches the requirement to show items that are frequently bought together.

Exam trap

Google Cloud often tests the distinction between personalized recommendation models (like 'recommended-for-you') and transaction-based co-purchase models, leading candidates to confuse 'frequently-bought-together' with 'others-you-may-like' due to overlapping terminology around item similarity.

How to eliminate wrong answers

Option A is wrong because 'recently-viewed' surfaces items a user has recently browsed, not items that are frequently purchased together, and it relies on session-based user behavior rather than transaction co-occurrence. Option C is wrong because 'recommended-for-you' is a personalized model that uses user-item interaction history to suggest items tailored to an individual, not cross-item purchase patterns. Option D is wrong because 'others-you-may-like' predicts the next product a user is likely to engage with, based on their browsing and purchase history and the product being viewed, not on transactional co-purchase frequency.

723

MCQeasy

A pharmaceutical company uses Vertex AI Pipelines with custom training containers. Recently, the pipeline has been failing with 'Container failed with exit code 137' (out of memory). The container runs with default memory limit. The team needs to fix this without changing the code. The project quota for CPU and memory is sufficient. What should the team do?

A.Add a resource hint to the container spec for more memory.

B.Set the 'machineType' field for the training task to a higher memory machine.

C.Increase the model parallelism by using multi-worker training.

D.Use a smaller dataset for training.

AnswerB

This directly provides more memory to the container without code changes.

Why this answer

Option B is correct because the container is running out of memory (exit code 137) with the default memory limit. In Vertex AI Pipelines, when using custom training containers, the default memory allocation is typically 4 GiB. By setting the 'machineType' field to a higher memory machine (e.g., n1-highmem-8), the container automatically receives more memory without requiring code changes.

This directly resolves the OOM issue while respecting the constraint of not modifying the code.
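
Exit code 137 follows the common shell convention of 128 plus a signal number: 137 - 128 = 9, which is SIGKILL, the signal the kernel's out-of-memory killer sends. The helper below is a small sketch for decoding such codes on Linux; `describe_exit_code` is an illustrative name, not part of any Vertex AI tooling.

```python
import signal


def describe_exit_code(code):
    """Explain a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        # Codes above 128 conventionally mean "killed by signal (code - 128)".
        return f"terminated by {signal.Signals(code - 128).name}"
    return f"application error {code}"


print(describe_exit_code(137))
```

SIGKILL cannot be caught by the training code, which is why the fix has to come from the machine configuration rather than from error handling inside the container.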

Exam trap

Google Cloud often tests the misconception that resource hints or environment variables can override default memory limits in Vertex AI Pipelines, but the correct mechanism is the 'machineType' field in the task specification, not hints or code changes.

How to eliminate wrong answers

Option A is wrong because Vertex AI Pipelines does not support resource hints in the container spec for custom training containers; resource allocation is controlled via the 'machineType' field, not hints. Option C is wrong because multi-worker training (model parallelism) distributes computation across workers but does not increase the memory available to a single container; it would require code changes to implement distributed training, which violates the 'without changing the code' constraint. Option D is wrong because using a smaller dataset may reduce memory usage but changes the training data, which is not a valid fix for an OOM error in a production pipeline; the problem is memory allocation, not dataset size.

724

MCQeasy

A team of data scientists is collaborating on notebooks in Vertex AI Workbench. They need to use Git for version control and share notebooks with real-time editing. Which type of Workbench instance should they choose?

A.User-managed notebooks

B.Vertex AI Pipelines

C.Managed notebooks

D.Dataproc Jupyter notebooks

AnswerA

User-managed notebooks are JupyterLab instances with Git integration and real-time collaboration capabilities.

Why this answer

User-managed notebooks are full JupyterLab instances that support Git integration and real-time collaboration via the JupyterLab interface. Managed notebooks are serverless but lack full Git and collaboration features.

725

MCQeasy

An ML team wants to automatically retrain a model when data drift is detected. They have set up a Cloud Monitoring alert on drift. What service should they use to trigger a retraining pipeline in response to the alert?

A.Cloud Functions

B.Cloud Scheduler

C.Vertex AI Feature Store

D.Vertex AI Model Monitoring

AnswerA

Cloud Functions can subscribe to Pub/Sub and trigger Vertex AI Pipelines, enabling automated retraining.

Why this answer

Cloud Monitoring alerts can send notifications to Pub/Sub topics. A Pub/Sub message can then trigger a Cloud Function or Cloud Run service that starts a Vertex AI Pipeline for retraining.
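
The function side of this flow can be sketched as follows. Pub/Sub delivers the message body base64-encoded in the event's `data` field; the shape of the decoded payload (an `incident` object with a `state` field) is an assumption about the alert notification and should be checked against a real message. `handle_alert` and `submit_pipeline` are hypothetical names, with `submit_pipeline` standing in for whatever call starts the retraining pipeline.

```python
import base64
import json


def should_retrain(pubsub_event):
    """Return True when the Pub/Sub message describes a newly opened alert incident."""
    payload = json.loads(base64.b64decode(pubsub_event["data"]).decode("utf-8"))
    # Assumed payload shape: {"incident": {"state": "open" | "closed", ...}}
    incident = payload.get("incident", {})
    return incident.get("state") == "open"


def handle_alert(pubsub_event, submit_pipeline):
    """Entry-point sketch: start retraining only for open incidents."""
    if should_retrain(pubsub_event):
        return submit_pipeline()
    return None
```

Filtering on the incident state avoids launching a second retraining run when the same alert later closes.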

726

Matchingmedium

Match each regularization technique to its effect.

Concepts

Matches

Adds absolute value of weights to loss, induces sparsity

Adds squared magnitude of weights to loss, prevents overfitting

Randomly drops units during training to prevent co-adaptation

Stops training when validation performance stops improving

Increases training data diversity through transformations

Why these pairings

Regularization techniques help generalize models. L1 encourages sparsity; L2 shrinks weights; dropout randomly drops neurons; early stopping uses a validation set. Common confusions include mixing L1 and L2 effects, or confusing dropout with penalty-based methods.
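
The difference between the L1 and L2 effects is easier to remember with numbers. The sketch below computes both penalty terms and applies one shrinkage step of each kind to a single weight; the function names and the strength parameter `t` are illustrative choices, and the shrinkage steps are the standard proximal updates for a penalty of `t * |w|` and `t * w**2`.

```python
def l1_penalty(weights, lam):
    """L1 term added to the loss: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 term added to the loss: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def l1_shrink(w, t):
    """Soft-thresholding: weights smaller than t in magnitude become exactly zero."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def l2_shrink(w, t):
    """L2 shrinkage scales a weight toward zero but never reaches it."""
    return w / (1 + 2 * t)
```

A small weight such as 0.05 is set to exactly 0.0 by `l1_shrink(0.05, 0.1)`, which is the sparsity effect, while `l2_shrink(0.05, 0.1)` only makes it smaller.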

727

Multi-Selecthard

A team is building a batch prediction pipeline that processes raw data from Cloud Storage, performs complex preprocessing, and then runs predictions using a large model. The preprocessing step is compute-intensive and the prediction step is I/O-bound. Which TWO Google Cloud services should they combine to optimize cost and performance? (Choose 2)

Select 2 answers

A.Dataflow for preprocessing and writing results to Cloud Storage

B.Cloud Functions to preprocess data row by row

C.Cloud Run to serve the preprocessed data as an API

D.Vertex AI Batch Prediction with Cloud Storage source

E.Vertex AI Batch Prediction with BigQuery source

AnswersA, D

Dataflow can perform complex transforms at scale.

Why this answer

Dataflow is ideal for the compute-intensive preprocessing step because it can horizontally scale across many workers to handle complex transformations in parallel, and it writes results directly to Cloud Storage, which serves as the input source for Vertex AI Batch Prediction. Vertex AI Batch Prediction is optimized for I/O-bound inference workloads: it reads batches of data from Cloud Storage, runs predictions using the large model, and writes results back to Cloud Storage, all without requiring a persistent serving endpoint, which minimizes cost for offline predictions.

Exam trap

Google Cloud often tests the distinction between batch and online serving patterns, and the trap here is that candidates may choose Cloud Functions or Cloud Run for preprocessing because they are familiar serverless options, without realizing that Dataflow is purpose-built for large-scale, compute-intensive batch processing and that Vertex AI Batch Prediction is the correct service for offline inference at scale.

728

Drag & Dropmedium

Drag and drop the steps to set up a batch prediction job using Vertex AI in the correct order.

Steps

Order

1Step 1

2Step 2

3Step 3

4Step 4

Why this order

The correct sequence for setting up a batch prediction job using Vertex AI is to first prepare your input data, then register your model, create the batch prediction job, submit the job, and finally retrieve the results. This order ensures that all necessary resources are available before job creation and submission, and that results are only requested after job completion.

729

MCQmedium

A machine learning engineer needs to create a pipeline that runs a custom container component on Vertex AI. The container expects a Cloud Storage path as input and outputs a model artifact. Which component type should they define using the Kubeflow Pipelines SDK v2?

A.Google Cloud Pipeline Components (GCPC) for custom containers

B.Python function component using @dsl.component

C.Importer component to load the container as an artifact

D.Container component using @dsl.container_component

AnswerD

Container components allow you to specify a Docker image that runs as a component.

Why this answer

Option D is correct because the Kubeflow Pipelines SDK v2 provides the @dsl.container_component decorator specifically for defining components that wrap custom container images. This allows the engineer to specify the container image, input/output paths (like a Cloud Storage path), and artifact metadata, enabling Vertex AI to execute the container as a pipeline step and capture the model artifact.

Exam trap

The trap here is that candidates confuse the Importer component (which only imports existing artifacts) with a component that runs a container to produce an artifact, or they mistakenly think GCPC can wrap any custom container when it only provides pre-built Google service integrations.

How to eliminate wrong answers

Option A is wrong because Google Cloud Pipeline Components (GCPC) are pre-built components for Google Cloud services (e.g., Vertex AI, BigQuery), not for wrapping arbitrary custom containers. Option B is wrong because @dsl.component is used for Python function components that execute inline Python code, not for running a custom container image. Option C is wrong because the Importer component is used to import existing artifacts (like a pre-trained model) into the pipeline's metadata store, not to run a container that produces an artifact.

730

Multi-Selecthard

Which TWO services are commonly used together to implement an end-to-end ML pipeline that automatically retrains and deploys models on Vertex AI? (Choose two.)

Select 2 answers

A.Cloud Dataflow

B.Vertex AI Pipelines

C.Cloud Composer

D.Cloud Source Repositories

E.Cloud Scheduler

AnswersB, E

Pipelines orchestrate the training and deployment steps.

Why this answer

Vertex AI Pipelines (B) is the correct choice because it provides a serverless, scalable orchestration service specifically designed to build, run, and manage ML pipelines on Vertex AI. It enables you to define a directed acyclic graph (DAG) of steps—including data preprocessing, training, evaluation, and deployment—and can be triggered automatically to retrain and deploy models. Cloud Scheduler (E) is commonly used together with Vertex AI Pipelines to schedule pipeline runs at fixed intervals or in response to time-based triggers, forming a complete end-to-end automated retraining and deployment workflow.

Exam trap

Google Cloud often tests the distinction between general-purpose orchestration tools (Cloud Composer) and ML-native pipeline services (Vertex AI Pipelines), leading candidates to pick Cloud Composer because of its familiarity with Airflow, even though Vertex AI Pipelines is the correct, integrated choice for end-to-end ML workflows on Vertex AI.

731

MCQeasy

A retail company wants to forecast daily sales for inventory planning. They have 3 years of historical sales data with clear weekly and yearly seasonality. Which approach should they use?

A.Call a pre-built Google Cloud API for sales prediction

B.Use a linear regression model in Vertex AI

C.Use Vertex AI AutoML Tables with date as feature

D.Use BigQuery ML to train an ARIMA_PLUS model

AnswerD

ARIMA_PLUS handles seasonality and is optimized for time series.

Why this answer

Option D is correct because ARIMA_PLUS in BigQuery ML is specifically designed for time-series forecasting with multiple seasonalities (weekly and yearly). It automatically handles seasonality detection, trend decomposition, and holiday effects, making it ideal for retail sales data with clear periodic patterns.

Exam trap

The trap here is that candidates often choose AutoML Tables (Option C) thinking it can handle any structured data, but they miss that AutoML Tables is not a dedicated time-series model and requires manual feature engineering to capture seasonality, whereas ARIMA_PLUS is purpose-built for this scenario.

How to eliminate wrong answers

Option A is wrong because calling a pre-built Google Cloud API for sales prediction is vague and not a specific, integrated solution for time-series forecasting with seasonality; such APIs may not exist or may not handle custom seasonality patterns. Option B is wrong because linear regression in Vertex AI is a general-purpose model that does not inherently capture time-series dependencies like weekly and yearly seasonality without extensive feature engineering (e.g., lag features, Fourier terms). Option C is wrong because Vertex AI AutoML Tables with date as a feature treats the problem as a regression on tabular data, not as a dedicated time-series model, and may fail to properly model temporal autocorrelation and multiple seasonalities without manual time-series preprocessing.

732

MCQmedium

A healthcare organization is building a machine learning model to predict patient readmission risk. They have sensitive data stored in BigQuery that includes protected health information (PHI). The data science team uses Vertex AI Workbench notebooks to explore the data and develop models. The organization's security policy requires that all PHI data must be encrypted at rest and in transit, and that access to the data is logged and audited. They also need to ensure that the data used for model training is de-identified to remove direct identifiers such as patient names and SSNs. The team wants to automate the de-identification process as part of the data pipeline. Which approach meets these requirements?

A.Create a Dataflow pipeline that reads from the original BigQuery table, applies Cloud DLP de-identification transforms, and writes to a new BigQuery table. Grant the data science team access to the de-identified table.

B.Enable Shielded VM on Vertex AI Workbench notebooks and use VPC-SC to restrict data access.

C.Use Cloud Key Management Service to encrypt the PHI columns in BigQuery, and share the encryption key with the data science team.

D.Use BigQuery row-level security to mask PHI columns for the data science team, and train the model directly on the original table.

AnswerA

Dataflow with DLP automates de-identification and creates a safe dataset.

Why this answer

Option A is correct because it uses Cloud DLP within a Dataflow pipeline to automatically de-identify PHI data as it is read from the original BigQuery table and written to a new, de-identified table. This satisfies the requirement for automated de-identification, while the original table remains encrypted at rest (BigQuery default) and in transit (TLS), and access to the original data can be logged via Cloud Audit Logs. The data science team only gets access to the de-identified table, ensuring PHI is not exposed during model development.

Exam trap

Google Cloud often tests the distinction between data masking/encryption (which still exposes PHI to authorized users) and true de-identification (which removes or transforms PHI so it is no longer considered protected health information).

How to eliminate wrong answers

Option B is wrong because Shielded VM and VPC-SC provide infrastructure security (integrity, network perimeter) but do not de-identify PHI data; the data science team would still see raw PHI in the notebooks. Option C is wrong because Cloud KMS encryption protects data at rest but does not remove or mask PHI columns; sharing the encryption key with the data science team would give them access to the raw PHI, violating the de-identification requirement. Option D is wrong because BigQuery row-level security filters which rows a user can see; it does not mask or transform column values, so the underlying data is never de-identified. Training directly on the original table would still use PHI, and a query-time access restriction is not a permanent de-identification suitable for an automated pipeline.

733

Multi-Selectmedium

A data science team is configuring Vertex AI Model Monitoring for a deployed model. They want to detect both feature skew and feature drift. Which TWO configurations must they set?

Select 2 answers

A.Enable request/response logging.

B.Select the Jensen-Shannon divergence algorithm.

C.Configure the monitoring frequency (e.g., hourly).

D.Set the sampling rate to 1.0 (100%).

E.Specify a training dataset or statistics for baseline.

AnswersC, E

Required to define how often drift is computed.

Why this answer

To detect feature skew, they need to specify a training dataset or statistics as a reference. To detect drift over time, they need to set a monitoring frequency. Sampling rate is optional but recommended.

734

MCQeasy

Refer to the exhibit. The team notices that the pipeline fails to read data from the specified Cloud Storage path. What is the most likely issue?

A.The bucket does not exist

B.The pipeline runner is incorrect

C.The region is mismatched

D.The service account lacks `storage.objectViewer` permission

AnswerD

The Dataflow service account needs read access to Cloud Storage.

Why this answer

The pipeline fails to read data from Cloud Storage because the service account lacks the Storage Object Viewer role (`roles/storage.objectViewer`), which grants the `storage.objects.get` and `storage.objects.list` permissions required to read objects. Without this role, the read request is rejected at authorization, even if the bucket and path are correct.

Exam trap

Google Cloud often tests the distinction between bucket-level permissions (like `storage.objectViewer`) and project-level roles, leading candidates to overlook that the service account must have the specific IAM role on the bucket or project, not just any storage role.

How to eliminate wrong answers

Option A is wrong because if the bucket did not exist, the error would typically be a 404 'Bucket not found' or a similar explicit message, not a generic read failure. Option B is wrong because the pipeline runner (e.g., DataflowRunner or DirectRunner) determines where the pipeline executes, not whether the service account is authorized to read from Cloud Storage; an incorrect runner would cause execution errors, not permission-related read failures. Option C is wrong because Cloud Storage buckets can be read from any region by an authorized identity; a region mismatch can add latency or network cost, but it does not block read access.

735

MCQhard

Refer to the exhibit. An engineer notices no drift alerts but the model performance has degraded. What is the likely cause?

A.Feature attribution monitoring is causing too many false positives

B.Drift threshold for income is too high

C.Skew thresholds are not configured for categorical features

D.Concept drift is occurring, which is not captured by drift or skew detection

AnswerD

Concept drift affects the model's predictive relationship, not input distributions.

Why this answer

Concept drift occurs when the relationship between the input features and the target variable changes over time, causing model performance to degrade even when the input data distribution remains stable. Drift and skew detection monitor changes in feature distributions, not the relationship between features and the target. Since no drift alerts were triggered, the input data appears unchanged, but the model's predictive relationship has shifted. This is classic concept drift, which requires performance monitoring (e.g., accuracy, F1-score) against ground-truth labels rather than drift or skew detection.
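A toy example can make the distinction concrete. The sketch below is purely illustrative: the `income` values, the frozen decision rule, and the changed labeling rule are hypothetical and do not come from the exhibit. The inputs are identical in both periods, so a feature-distribution check sees nothing, yet accuracy falls because the true relationship between income and the label has moved.

```python
def model(income):
    """Frozen model learned at training time: predict 1 when income exceeds 50."""
    return 1 if income > 50 else 0

def accuracy(inputs, labels):
    """Fraction of inputs for which the frozen model matches the true label."""
    correct = sum(model(x) == y for x, y in zip(inputs, labels))
    return correct / len(inputs)

# The same feature values arrive in both periods, so the input
# distribution is unchanged and feature drift detection stays silent.
incomes_at_training = [20, 35, 45, 55, 60, 65, 80, 95]
incomes_later = [20, 35, 45, 55, 60, 65, 80, 95]

# Hypothetical ground truth: the real-world rule moves from >50 to >70.
labels_at_training = [1 if x > 50 else 0 for x in incomes_at_training]
labels_later = [1 if x > 70 else 0 for x in incomes_later]

inputs_unchanged = sorted(incomes_at_training) == sorted(incomes_later)
acc_before = accuracy(incomes_at_training, labels_at_training)
acc_after = accuracy(incomes_later, labels_later)

print(inputs_unchanged)  # True
print(acc_before)        # 1.0
print(acc_after)         # 0.625
```

Only a metric computed against fresh ground-truth labels exposes the drop, which is why performance monitoring is needed alongside drift and skew detection.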

Exam trap

Google Cloud often tests the distinction between data drift (input feature changes) and concept drift (target relationship changes), trapping candidates who assume that no drift alerts mean the model is healthy, when in fact performance degradation can occur without any feature distribution shift.

How to eliminate wrong answers

Option A is wrong because feature attribution monitoring (e.g., SHAP values) explains model predictions but does not generate false positives for drift; it is unrelated to the absence of drift alerts. Option B is wrong because a drift threshold for income being too high would suppress drift alerts for that feature, but the scenario states no drift alerts at all, and concept drift is not captured by feature-level drift thresholds. Option C is wrong because skew thresholds for categorical features detect distribution shifts in those features, but the problem is a change in the target relationship (concept drift), not a change in feature distributions.

736

Multi-Selectmedium

A team wants to serve a large PyTorch model (3 GB) for online predictions with low latency. Which THREE actions should they take?

Select 3 answers

A.Use a custom container that preloads the model into memory.

B.Use batch prediction instead of online prediction.

C.Use a machine type with a GPU accelerator.

D.Optimize the model using TorchScript or quantization.

E.Deploy in multiple regions with Cloud Load Balancing.

AnswersA, C, D

Preloading avoids loading model on each request, reducing latency.

Why this answer

Options A, C, and D are correct. Option A: a custom container that preloads the model avoids reloading the 3 GB artifact on each request and reduces cold-start latency. Option C: a GPU accelerator speeds up inference.

Option D: model optimization (TorchScript, quantization) reduces inference time. Option E (multi-region deployment) reduces network latency, not prediction latency. Option B (batch prediction) is not suited to online serving.
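The preloading idea can be sketched without any serving framework. In this minimal illustration every name (`FakeModelStore`, `PreloadedServer`, `PerRequestServer`) is hypothetical, and the "model" is a toy function standing in for a large PyTorch artifact. The point is only to contrast loading once at container startup with loading inside the request handler.

```python
class FakeModelStore:
    """Hypothetical stand-in for deserializing a large model artifact."""

    def __init__(self):
        self.load_count = 0

    def load(self):
        # In a real container this would be the slow step (reading ~3 GB).
        self.load_count += 1
        return lambda features: sum(features)  # toy "model"


class PreloadedServer:
    """Loads the model once, when the server process starts."""

    def __init__(self, store):
        self.model = store.load()

    def predict(self, features):
        return self.model(features)


class PerRequestServer:
    """Anti-pattern: reloads the model inside every request."""

    def __init__(self, store):
        self.store = store

    def predict(self, features):
        return self.store.load()(features)


requests = [[1, 2], [3, 4], [5, 6]]

preload_store = FakeModelStore()
preloaded = PreloadedServer(preload_store)
preloaded_outputs = [preloaded.predict(r) for r in requests]

lazy_store = FakeModelStore()
per_request = PerRequestServer(lazy_store)
per_request_outputs = [per_request.predict(r) for r in requests]

print(preload_store.load_count)  # 1
print(lazy_store.load_count)     # 3
```

Both servers return the same predictions; the preloaded one pays the loading cost once instead of on every request.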

737

MCQeasy

An ML engineer runs this command to upload a model. The model artifact in Cloud Storage is a directory containing model.pkl and a custom preprocessing script. What will happen when he later deploys this model to an endpoint and sends a prediction request?

A.The prediction will succeed because the pre-built container automatically detects and uses the custom preprocessing script.

B.The prediction will succeed only if he also specifies a custom prediction routine.

C.The prediction will fail because the custom preprocessing script is not a standard scikit-learn serialized object.

D.The prediction will fail because the artifact URI must point to a single file not a directory.

AnswerC

The pre-built container only loads the model; custom preprocessing is not executed.

Why this answer

Option C is correct because the pre-built scikit-learn container loads only the serialized model file (e.g., model.pkl) from the artifact directory. The custom preprocessing script is not a serialized scikit-learn object, so the container never loads or executes it; requests that depend on that preprocessing reach the model untransformed, causing the prediction to fail.

Exam trap

Google Cloud often tests the misconception that a pre-built container can pick up custom scripts placed next to the model, when in fact it loads only the serialized model file and ignores everything else in the directory.

How to eliminate wrong answers

Option A is wrong because the pre-built container does not automatically detect or use custom preprocessing scripts; it only loads the serialized model file. Option B is wrong because a custom prediction routine is not an add-on to a pre-built container upload; the preprocessing code would have to be packaged into its own serving container, which is a different deployment from the one described. Option D is wrong because the artifact URI is expected to point to a directory; the failure comes from the preprocessing script never being executed, not from the URI being a directory.

738

MCQmedium

A data science team wants to build a Vertex AI pipeline that trains a model, evaluates it, and conditionally deploys it if the accuracy exceeds 0.9. They want to use the Kubeflow Pipelines SDK v2. Which construct allows them to conditionally execute the deployment step based on the evaluation metric?

A.dsl.If

B.dsl.Conditional

C.dsl.ExitHandler

D.dsl.Collected

AnswerA

dsl.If is the correct construct for conditional execution in KFP v2.

Why this answer

In Kubeflow Pipelines SDK v2, `dsl.If` is the correct construct for conditionally executing pipeline steps based on runtime metrics or parameters. It allows you to define a condition that, when evaluated to true, triggers the deployment step only if the model accuracy exceeds 0.9. This is the standard way to implement branching logic in v2 pipelines.

Exam trap

Google Cloud often tests the distinction between `dsl.If` (the correct v2 construct) and `dsl.Conditional` (a plausible but nonexistent name that candidates might confuse with `dsl.Condition`, the older construct that `dsl.If` supersedes).

How to eliminate wrong answers

Option B is wrong because `dsl.Conditional` is not a valid construct in Kubeflow Pipelines SDK v2; the correct name is `dsl.If`. Option C is wrong because `dsl.ExitHandler` is used to execute cleanup or notification steps when a pipeline or component exits, not for conditional branching based on evaluation metrics. Option D is wrong because `dsl.Collected` is used to gather outputs from parallel tasks (e.g., from a loop or fan-out), not to conditionally execute a step.

739

Multi-Selecteasy

Which TWO of the following can be used as input sources for Vertex AI batch prediction jobs? (Choose 2)

Select 2 answers

A.Cloud Firestore

B.Cloud SQL

C.BigQuery

D.Cloud Spanner

E.Cloud Storage

AnswersC, E

BigQuery is a supported input source.

Why this answer

BigQuery is a supported input source for Vertex AI batch prediction jobs because Vertex AI can directly read data from BigQuery tables for batch predictions. This integration allows you to store your prediction requests in BigQuery and have Vertex AI write the predictions back to a BigQuery output table, streamlining the workflow for large-scale predictions without needing to export data to Cloud Storage first. Cloud Storage is the other supported source: input files such as JSON Lines or CSV are read directly from a bucket.

Exam trap

The trap here is that candidates often assume any Google Cloud database (like Firestore, Cloud SQL, or Spanner) can serve as a direct input source for batch predictions, but Vertex AI batch prediction only supports BigQuery and Cloud Storage as input sources, requiring data to be exported or staged in those services first.

740

MCQeasy

A company deploys a model on Vertex AI Prediction for real-time inference. Users report intermittent high latency during peak hours. The model is deployed on a single machine type with `min_replica_count=1` and `max_replica_count=5`. Autoscaling is enabled based on CPU utilization. What is the most likely cause of the latency spikes?

A.The model server is crashing under load due to memory issues.

B.Autoscaling based on CPU utilization does not react quickly to inference request spikes.

C.The load balancer is misconfigured and routes traffic unevenly.

D.The container image is not optimized for the model.

AnswerB

CPU utilization may lag behind request surges; Vertex AI recommends using target utilization or custom metrics for faster response.

Why this answer

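Option B is correct because CPU utilization is a lagging proxy for inference load: replicas are added only after utilization has already climbed, and new replicas take time to start, so sudden traffic bursts are served by too few replicas. Option A is wrong because Vertex AI restarts unhealthy containers, and crashes would surface as errors rather than intermittent slowness. Option C is wrong because Vertex AI endpoints automatically distribute traffic across replicas.

Option D is wrong because an unoptimized container image would slow every request, not only those during peak hours.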

741

Multi-Selectmedium

A company wants to use Vertex AI Vector Search for real-time product recommendations based on user embeddings. They need to update the index frequently with new product embeddings without significant downtime. Which TWO options should they consider? (Choose 2)

Select 2 answers

A.Maintain two separate indexes: one for building and one for serving, and swap them after building.

B.Use streaming updates to add new embeddings incrementally.

C.Use batch updates to rebuild the entire index daily.

D.Increase the number of replicas on the deployed index.

E.Use a GPU-powered endpoint for faster indexing.

AnswersA, B

This allows building a new index offline and swapping with minimal downtime.

Why this answer

Option A is correct because maintaining two separate indexes allows you to build a new index in the background while the existing index continues to serve queries. Once the new index is fully built and validated, you can swap it into production with minimal downtime. This is a common pattern for high-availability systems where index rebuilds are necessary. Option B is correct because streaming updates add new embeddings to a deployed index incrementally, so frequent additions do not require a full rebuild.

Exam trap

Google Cloud often tests the distinction between scaling serving capacity (replicas) and updating index content, leading candidates to mistakenly choose D or E when the real need is for a zero-downtime update strategy.

742

MCQeasy

What is the primary benefit of using pipeline caching in Vertex AI Pipelines?

A.It reduces execution time and cost by reusing unchanged component outputs.

B.It encrypts data at rest.

C.It automatically scales the pipeline resources.

D.It enables parallel execution of components.

AnswerA

This is the primary benefit of caching.

Why this answer

Pipeline caching in Vertex AI Pipelines automatically detects when a component's inputs and code have not changed from a previous execution and reuses the cached output artifacts. This avoids redundant computation, directly reducing both execution time and cost by skipping re-execution of unchanged steps.
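The reuse logic described above can be sketched as a lookup keyed on a step's identity and inputs. This is a simplified illustration under stated assumptions, not how Vertex AI Pipelines builds its cache key: the `StepCache` class, the explicit `version` string standing in for the component's code, and the `preprocess` step are all hypothetical.

```python
import hashlib
import json


class StepCache:
    """Toy cache: reuse a step's output when its name, version, and inputs match."""

    def __init__(self):
        self._store = {}
        self.executions = 0  # counts real (non-cached) runs

    def run(self, name, version, func, **inputs):
        # The key covers everything that could change the output.
        key_material = json.dumps(
            {"name": name, "version": version, "inputs": inputs},
            sort_keys=True,
        )
        key = hashlib.sha256(key_material.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.executions += 1
            self._store[key] = func(**inputs)
        return self._store[key]


def preprocess(rows):
    """Hypothetical pipeline step: doubles every value."""
    return [r * 2 for r in rows]


cache = StepCache()
first = cache.run("preprocess", "v1", preprocess, rows=[1, 2, 3])
second = cache.run("preprocess", "v1", preprocess, rows=[1, 2, 3])  # cache hit
runs_after_repeat = cache.executions

cache.run("preprocess", "v1", preprocess, rows=[1, 2, 4])  # inputs changed
cache.run("preprocess", "v2", preprocess, rows=[1, 2, 3])  # code changed
runs_after_changes = cache.executions

print(runs_after_repeat)   # 1
print(runs_after_changes)  # 3
```

The repeated call returns the stored result without executing the step, while a change to either the inputs or the version forces a fresh run.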

Exam trap

Google Cloud often tests the distinction between caching (reusing outputs) and parallelization (running components concurrently), so candidates may confuse the two and incorrectly select parallel execution as the primary benefit.

How to eliminate wrong answers

Option B is wrong because encryption at rest is a data security feature managed by Cloud KMS or default Google Cloud encryption, not a benefit of pipeline caching. Option C is wrong because automatic scaling of pipeline resources is handled by Vertex AI's underlying infrastructure (e.g., node auto-scaling) or custom configuration, not by caching. Option D is wrong because parallel execution of components is achieved through pipeline design (e.g., using `dsl.ParallelFor` or independent component dependencies), not through caching; caching can actually reduce the need for parallel execution by reusing results.

743

MCQmedium

An ML team is building a feature pipeline with Dataflow that reads from BigQuery, computes features, and writes to Vertex AI Feature Store. They need to ensure that features are available for both training and serving with low latency. Which Feature Store option should they use?

A.Create a featurestore with only offline serving

B.Store features directly in BigQuery

C.Use Cloud SQL as a feature store

D.Create a featurestore with online serving enabled

AnswerD

Online serving provides low-latency access via Bigtable.

Why this answer

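Vertex AI Feature Store supports online serving (low-latency lookups at prediction time) and offline serving (batch reads for training). Because the features must also be served with low latency, the team must enable online serving; offline serving remains available for building training datasets. The online store uses Bigtable for low-latency lookups.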

744

MCQeasy

A company has deployed a computer vision model on Vertex AI Prediction using a custom container. The model processes high-resolution images and serves predictions to a mobile application. Recently, users have reported that predictions sometimes take over 10 seconds, and the application times out. The ML engineer's monitoring shows that the endpoint's CPU utilization is consistently high (above 85%) and that the request latency spikes during peak hours. The model is deployed on n1-standard-4 machines with automatic scaling set to minReplicaCount=1 and maxReplicaCount=5. The engineer has observed that the endpoint rarely scales beyond 2 replicas even during peak hours. What should the engineer do to reduce prediction latency?

A.Increase the maxReplicaCount to 20 to allow more instances during spikes.

B.Review the custom container's startup time and consider pre-warming or reducing model loading time.

C.Change the machine type to a higher CPU machine like n1-standard-8.

D.Set the minReplicaCount to 5 to ensure enough capacity at all times.

AnswerB

The endpoint rarely scales beyond 2 replicas due to slow container startup, causing CPU overload on existing instances. Reducing startup time or pre-warming enables faster scaling and lower latency.

Why this answer

Option B is correct because the root cause is likely that the custom container takes a long time to start (model loading), preventing the endpoint from scaling quickly. Pre-warming or reducing model loading time addresses this directly. Option A (increasing max replicas) does not solve the scaling delay.

Option C (upgrading machine type) may help but does not address the scaling speed. Option D (increasing min replicas) would be costly and still not handle sudden spikes if new replicas start slowly.

745

MCQmedium

A manufacturing company wants to predict equipment failure using sensor data stored in BigQuery. They have limited ML expertise and want to use AutoML Tables. The data includes timestamps, numerical sensor readings, and a boolean 'failure' column. The dataset is highly imbalanced with only 1% failure cases. Which of the following is the most effective approach to handle the imbalance in AutoML Tables?

A.Let AutoML Tables handle the imbalance automatically; it has built-in techniques for class imbalance.

B.Downsample the majority class to balance the dataset.

C.Use a custom loss function in the training configuration.

D.Oversample the minority class using SQL before training.

AnswerA

AutoML Tables automatically adjusts for imbalance.

Why this answer

AutoML Tables has built-in techniques to handle class imbalance, such as automatically adjusting class weights and using stratified sampling during training. This allows the model to learn from the minority class without requiring manual data preprocessing, making it the most effective and simplest approach for users with limited ML expertise.

Exam trap

The trap here is that candidates may assume manual resampling (downsampling or oversampling) is always required for imbalanced datasets, but AutoML Tables abstracts this complexity, and the exam tests whether you trust its built-in capabilities for low-code solutions.

How to eliminate wrong answers

Option B is wrong because downsampling the majority class would discard valuable data, potentially reducing model performance and losing information about normal operating conditions. Option C is wrong because AutoML Tables does not expose a custom loss function configuration; it abstracts away such hyperparameters and uses its own optimized training pipeline. Option D is wrong because oversampling the minority class using SQL before training is unnecessary and could lead to overfitting or data leakage; AutoML Tables handles imbalance internally without manual intervention.

746

MCQhard

A company wants to use low-code ML for time series forecasting with 5 years of hourly data. They need to incorporate holiday effects. Which solution best meets these requirements?

A.Custom LSTM model

B.BigQuery ML ARIMA_PLUS with holiday regression

C.Vertex AI AutoML Tables with timestamp and holiday features

D.Vertex AI AutoML Forecasting with timestamp and holiday feature

AnswerB

ARIMA_PLUS directly supports holiday effects in its model.

Why this answer

BigQuery ML ARIMA_PLUS with holiday regression is the correct choice because it is a low-code solution that natively supports time series forecasting with built-in holiday effect modeling, enabled through the `HOLIDAY_REGION` training option. ARIMA_PLUS automatically handles seasonality, trend, and holiday effects without requiring custom code, making it a good fit for 5 years of hourly data.

Exam trap

Google Cloud often tests the distinction between AutoML Forecasting and BigQuery ML ARIMA_PLUS, where candidates mistakenly assume AutoML Forecasting natively handles holiday regression, but it requires explicit feature engineering, while ARIMA_PLUS provides built-in holiday support.

How to eliminate wrong answers

Option A is wrong because a custom LSTM model requires significant coding and ML expertise, violating the low-code requirement. Option C is wrong because Vertex AI AutoML Tables is designed for tabular data and does not natively support time series forecasting with holiday effects; it would require manual feature engineering. Option D is wrong because Vertex AI AutoML Forecasting does not natively incorporate holiday regression; it focuses on time series features but lacks built-in holiday effect handling, requiring additional preprocessing.

747

MCQhard

Your team deploys a multi-model endpoint on Vertex AI with two models: Model A (small, low latency) and Model B (large, high latency). You configure traffic splitting so that 90% goes to Model A and 10% to Model B. However, you notice that the latency for Model A increases when Model B receives traffic. What is the most likely cause?

A.Model A is being overloaded because autoscaling is based on aggregate traffic.

B.The traffic split is misconfigured, causing requests to be routed incorrectly.

C.The models are collocated on the same instances, leading to resource contention.

D.Model B's logging is generating too much output, slowing down the predictor.

AnswerC

Multi-model endpoints share replicas; Model B's work impacts Model A.

Why this answer

In this scenario the two models are co-hosted on the same replicas, so they share the underlying CPU and memory. When Model B handles requests, it consumes those resources, causing contention that degrades Model A's latency. Collocation of the models on the same instances is the issue.

748

MCQhard

An engineer wants to use BigQuery ML to explain predictions from a trained boosted tree classifier for a specific set of input rows. Which function should they use?

A.ML.EVALUATE

B.ML.FEATURE_IMPORTANCE

C.ML.PREDICT

D.ML.EXPLAIN_PREDICT

AnswerD

Why this answer

ML.EXPLAIN_PREDICT provides feature attributions for each prediction. ML.EVALUATE returns aggregate metrics, ML.FEATURE_IMPORTANCE gives global importance, and ML.PREDICT only returns predictions.

749

MCQeasy

A startup is deploying its first machine learning model using BigQuery ML. The model is a logistic regression for churn prediction, trained on a dataset of 5 million rows. The pipeline runs every week: it exports training data from BigQuery, trains a model using BigQuery ML, and then deploys the model as a remote model for predictions. The ML engineer wants to set up basic monitoring to ensure the pipeline runs successfully and the model quality does not degrade. Which monitoring approach should the engineer implement first?

A.Set up Cloud Monitoring alerts on the pipeline's execution status and duration, and create a simple dashboard showing these metrics.

B.Export BigQuery audit logs to Cloud Logging and analyze them for any errors.

C.Enable Vertex AI Model Monitoring to detect data drift between training and serving data.

D.Monitor the model's area under the ROC curve (AUC) over time and alert if it drops by more than 0.01.

AnswerA

Fundamental monitoring ensures pipeline runs successfully.

Why this answer

Option A is correct because the first priority in monitoring a new ML pipeline is ensuring it runs successfully and on time. Cloud Monitoring alerts on execution status and duration directly address pipeline reliability, which is the most basic operational concern before model quality metrics like AUC or drift can be meaningful. This approach aligns with the principle of starting with infrastructure health before advanced model monitoring.

Exam trap

Google Cloud often tests the principle of 'start with the basics' — candidates are tempted to jump to advanced monitoring like drift or AUC, but the correct first step is ensuring the pipeline runs reliably.

How to eliminate wrong answers

Option B is wrong because exporting BigQuery audit logs to Cloud Logging and analyzing them for errors is a reactive, post-hoc approach that does not provide real-time pipeline monitoring; it also adds complexity without addressing the immediate need for basic pipeline health checks. Option C is wrong because Vertex AI Model Monitoring for data drift is an advanced monitoring technique that requires a stable serving environment and baseline data, which is premature for a first deployment; it also incurs additional cost and setup time. Option D is wrong because monitoring AUC over time and alerting on a drop of 0.01 assumes the model is already in production with a baseline, but the question asks for the first monitoring step, which should be pipeline execution success, not model performance degradation.

750

Multi-Selectmedium

You are designing a distributed training job for a very large neural network that does not fit on a single machine. You need to split the model across multiple devices. Which TWO techniques can you use?

Select 2 answers

A.ParameterServerStrategy

B.Pipeline parallelism

C.Operator-level model parallelism

D.Data parallelism with MirroredStrategy

E.MultiWorkerMirroredStrategy

AnswersB, C

Splits model into stages across devices.

Why this answer

Pipeline parallelism splits layers across devices; model parallelism (operator-level) splits individual operations. Data parallelism replicates the model. ParameterServer is a form of distributed training, not model splitting.
