Google Professional Machine Learning Engineer (PMLE): Questions 751-825

1000 questions total · 14 pages · All types, answers revealed

Page 11 of 14

751
MCQmedium

A company wants to be alerted when the prediction error rate on their Vertex AI Endpoint exceeds 5% in any 5-minute window. What is the best way to set up this alert?

A.Create a Cloud Monitoring alert on the metric 'prediction/error_count' with a threshold of 5% of total predictions
B.Set up a scheduled Cloud Function to query logs and check error rate
C.Enable BigQuery log sink and create a scheduled query alert
D.Use Vertex AI Model Monitoring to detect prediction drift
AnswerA

Correct: Cloud Monitoring supports alerting on error rate metrics.

Why this answer

Cloud Monitoring can create an alerting policy on the endpoint's prediction error count metric, evaluated over a rolling 5-minute window. The threshold can be an absolute count or, to express 5%, the ratio of error count to total prediction count.
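
The 5% condition is a ratio of two counts taken over the same window. A minimal sketch of that check in plain Python, assuming hypothetical per-minute `(error_count, prediction_count)` samples rather than real Cloud Monitoring API calls:

```python
from collections import deque

def breaches_error_rate(samples, window=5, threshold=0.05):
    """Return True if any `window` consecutive samples exceed `threshold`.

    `samples` is an iterable of (error_count, prediction_count) pairs,
    one per minute. The names and sampling interval are illustrative.
    """
    recent = deque(maxlen=window)
    for errors, total in samples:
        recent.append((errors, total))
        if len(recent) < window:
            continue  # wait until a full window is available
        window_errors = sum(e for e, _ in recent)
        window_total = sum(t for _, t in recent)
        if window_total and window_errors / window_total > threshold:
            return True
    return False
```

Summing both counts over the window before dividing avoids over-weighting minutes with very little traffic.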

752
MCQmedium

A data science team wants to deploy an ML pipeline on Vertex AI Pipelines that includes a component to train a model using a custom container. The component should be reusable across different pipelines and accept hyperparameters as inputs. Which approach should they take?

A.Create a container component by specifying a container image and input/output artifacts using the Kubeflow Pipelines SDK.
B.Package the training code as a Vertex AI Training custom job and call it from a Python function component.
C.Define a Python function component using @dsl.component and pass hyperparameters as function arguments.
D.Use a pre-built Google Cloud Pipeline Component for custom training and override the image.
AnswerA

Container components allow custom containers and are reusable across pipelines.

Why this answer

Container components are designed for custom containers and can be defined once and reused across pipelines. Python function components are simpler but limited to Python code; they cannot easily encapsulate custom containers.

753
MCQhard

Your model serving endpoint on Vertex AI is experiencing increased memory usage after a recent update. The model was converted from TensorFlow to TF Lite for faster inference. You notice that the endpoint's instances occasionally get killed due to out-of-memory (OOM) errors. What is the most likely cause?

A.The TF Lite model is larger in size than the original model.
B.The Vertex AI endpoint is not configured with enough CPU.
C.The number of inference threads in the TF Lite runtime is set too high, causing memory consumption.
D.The traffic to the endpoint has increased significantly.
AnswerC

TF Lite can use multiple threads; excessive threads increase memory.

Why this answer

TF Lite models can have different memory footprints depending on the number of threads used for inference. If the custom container or the runtime allocates many threads, memory usage can spike. Converting the model does not by itself guarantee lower memory use; thread count is a key factor.

754
Multi-Selectmedium

You are optimizing a model for deployment on Vertex AI using NVIDIA Triton Inference Server. Which TWO actions can you take to improve inference performance?

Select 2 answers
A.Increase the number of model replicas to the maximum.
B.Use TensorRT to quantize the model to FP16 or INT8.
C.Disable model caching to reduce memory usage.
D.Enable dynamic batching in Triton to aggregate requests.
E.Use a larger machine type with more vCPUs.
AnswersB, D

Quantization reduces model size and speeds up inference with minimal accuracy loss.

Why this answer

Option B is correct because TensorRT optimizes model inference by quantizing weights and activations to lower precision formats like FP16 or INT8, reducing memory bandwidth and computation time without significant accuracy loss. This is a standard technique for improving throughput on NVIDIA GPUs, especially when deploying with Triton Inference Server, which natively supports TensorRT-optimized models. Option D is correct because dynamic batching lets Triton combine individual requests into larger batches on the server, which raises GPU utilization and throughput.
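
To make the batching idea concrete, here is a toy model of how requests could be grouped. It is not Triton's scheduler: the function and its parameters are hypothetical, and only the notion of a maximum queue delay echoes Triton's dynamic batching settings.

```python
def form_batches(arrivals_us, max_batch_size=4, max_queue_delay_us=100):
    """Group request arrival times (microseconds, ascending) into batches.

    A batch closes when it is full or when the next request arrives
    more than `max_queue_delay_us` after the batch's first request.
    """
    batches, current = [], []
    for t in arrivals_us:
        if current and (len(current) == max_batch_size
                        or t - current[0] > max_queue_delay_us):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches
```

Requests that arrive close together share one forward pass, while an isolated request is not held back longer than the delay budget.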

Exam trap

Google Cloud often tests the misconception that simply adding more replicas or CPU resources will linearly improve inference performance, ignoring the GPU-bound nature of model serving and the importance of batching and precision optimization.

755
MCQmedium

A data science team uses Vertex AI Experiments to track training runs. They want to automatically log parameters, metrics, and artifacts for all runs with minimal code changes. Which approach should they take?

A.Manually log each parameter and metric using `aiplatform.log_metrics()` after each training step.
B.Use MLflow autologging by calling `mlflow.autolog()` before the training code and wrap the training script with `mlflow.start_run()`.
C.Enable Vertex AI Experiments autologging by setting `autolog=True` in the experiment run context.
D.Use TensorBoard with tf.keras.callbacks.TensorBoard to log metrics.
AnswerB

MLflow autologging captures parameters, metrics, and artifacts automatically when used with Vertex AI Experiments.

Why this answer

Vertex AI Experiments supports autologging via the MLflow library. By wrapping the training code with mlflow.start_run() and enabling autolog, all parameters, metrics, and artifacts are captured automatically.

756
MCQeasy

You are defining a Python function component in KFP SDK v2. Which decorator should you use?

A.@dsl.task
B.@component
C.@dsl.pipeline
D.@dsl.component
AnswerD

Correct decorator for a component in KFP SDK v2.

Why this answer

In KFP SDK v2, the `@dsl.component` decorator is used to define a Python function as a lightweight, reusable pipeline component that can be executed independently. This decorator automatically generates a containerized component from the function's signature and type annotations, enabling type-safe inputs and outputs without requiring a separate component YAML specification.

Exam trap

Google Cloud often tests the distinction between v1 and v2 decorators, so the trap here is that candidates familiar with KFP SDK v1 may incorrectly choose `@component` (option B) instead of the v2-specific `@dsl.component`.

How to eliminate wrong answers

Option A is wrong because `@dsl.task` is not a valid decorator in KFP SDK v2; tasks are implicitly created when a component is called within a pipeline, not via a decorator. Option B is wrong because `@component` is the decorator from KFP SDK v1 (the older `kfp.components` module) and is not used in v2, which requires the `dsl` namespace. Option C is wrong because `@dsl.pipeline` is used to define a pipeline (a DAG of components), not a single component function.

757
MCQmedium

An engineer is training a model on Vertex AI using a custom container. The training job fails with an error indicating that the container exited with a non-zero status. The engineer wants to debug the issue. What is the best way to access the logs?

A.SSH into the training container using Vertex AI's SSH feature
B.View logs in Cloud Storage under the job's output directory
C.Use Cloud Debugger to inspect the container
D.Check the logs in Cloud Logging (Logs Explorer)
AnswerD

Logs are automatically streamed to Cloud Logging; this is the standard debugging approach.

Why this answer

Option D is correct because Vertex AI automatically streams all container stdout and stderr to Cloud Logging (Logs Explorer). When a custom container exits with a non-zero status, the detailed error messages, stack traces, and application logs are captured there, making it the primary and most comprehensive debugging tool. Cloud Logging provides structured, searchable logs without requiring direct access to the container.

Exam trap

Google Cloud often tests the misconception that you can SSH into a failed training container or that logs are stored in Cloud Storage, when in fact Cloud Logging is the centralized, default logging solution for all Vertex AI training jobs.

How to eliminate wrong answers

Option A is wrong because an interactive shell is available only if it was enabled when the job was created and only while the job is running, so it cannot be used to inspect a container that has already exited. Option B is wrong because Cloud Storage under the job's output directory stores artifacts like model checkpoints and metrics, not real-time container logs or error messages. Option C is wrong because Cloud Debugger is designed for debugging running applications in production by capturing snapshots and variable states, not for inspecting container exit errors or retrieving logs from a failed training job.

758
Drag & Dropmedium

Drag and drop the steps to set up model monitoring for drift detection on Vertex AI in the correct order.

Why this order

Deploy first, then enable monitoring, set thresholds, configure alerts, and review.

759
MCQmedium

A team develops a pipeline that trains a model and evaluates it. They want to pass the test accuracy (a float) from the evaluation component to a subsequent deployment component. Which KFP SDK type should the evaluation component output be annotated with?

A.Output[float]
B.Output[Metrics]
C.Output[Artifact]
D.Output[ClassificationMetrics]
AnswerA

Output[float] defines a pipeline parameter that can be used as input to downstream components.

Why this answer

For simple numeric values like floats, the appropriate output type is a pipeline parameter (e.g., float). KFP artifacts are used for larger data like models or datasets, not for scalar metrics.

760
Drag & Dropmedium

Drag and drop the steps to perform a hyperparameter tuning job on Vertex AI in the correct order.

Why this order

Define the search space, then create and run the tuning job, monitor, and select the best parameters.

761
MCQmedium

A team is monitoring a model on Vertex AI Endpoints and wants to track the p99 latency of online predictions. Which approach should they use to set up latency monitoring and alerting?

A.Enable Vertex AI Model Monitoring and select 'latency' as a metric
B.Enable Vertex AI Explainable AI to output latency statistics
C.Configure Cloud Monitoring to scrape Prometheus metrics from the endpoint
D.Use Cloud Logging to create log-based metrics from prediction logs and set up alerts in Cloud Monitoring
AnswerD

Prediction logs contain latency information; log-based metrics can capture p99 and other percentiles.

Why this answer

With request logging enabled, Vertex AI Endpoints write per-request logs to Cloud Logging, which can be used to create log-based metrics for latency percentiles. These metrics can then be visualized in Cloud Monitoring dashboards and used for alerting.
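
As a reminder of what a p99 value means, the sketch below computes a nearest-rank percentile over a list of latencies. It is a standalone illustration in plain Python; the input list stands in for latency values extracted from logs, and Cloud Monitoring's own percentile aggregation may interpolate differently.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least
    pct percent of the samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

An alert on p99 therefore fires when the slowest 1% of requests cross the threshold, even if the median is healthy.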

762
MCQmedium

A company is deploying a deep learning model on edge devices with limited storage and computational resources. They need to reduce the model size by 80% while maintaining acceptable accuracy. Which two techniques should they combine?

A.Pruning and distillation
B.Quantization and distillation
C.Quantization and pruning

Why this answer

Quantization (e.g., post-training quantization to INT8) reduces model size and speeds up inference. Pruning removes redundant weights. Knowledge distillation trains a smaller student model.

For 80% reduction, quantization alone might not be enough; combining quantization with pruning or distillation can achieve such compression.
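
The arithmetic behind that statement can be sketched as follows, under idealized assumptions: model size scales with bits per weight, and pruned weights are not stored at all (no sparse-index overhead). Both assumptions are simplifications, and the function name is illustrative.

```python
def remaining_fraction(bits_before=32, bits_after=8, sparsity=0.0):
    """Fraction of the original size left after quantization and pruning.

    Assumes size scales with bits per weight and that pruned weights
    are not stored at all (no sparse-index overhead).
    """
    return (bits_after / bits_before) * (1.0 - sparsity)
```

FP32 to INT8 alone leaves 25% of the original size, a 75% reduction. Pruning 20% of the remaining weights brings that to about 20%, which reaches the 80% target.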

763
Multi-Selecthard

Which TWO factors should you consider when choosing between BigQuery and Cloud Storage for storing training data? (Choose 2)

Select 2 answers
A.The format of the data: structured vs. unstructured.
B.The need for SQL-based transformations and analysis on the data.
C.The requirement for data encryption at rest.
D.The need for fine-grained access control at the row level.
E.The maximum size of the dataset (BigQuery limit 1 TB).
AnswersA, B

Correct: Cloud Storage is better for unstructured data.

Why this answer

Option A is correct because BigQuery is optimized for structured, tabular data (e.g., CSV, Avro, Parquet) and supports SQL queries, while Cloud Storage is a better fit for unstructured data (e.g., images, videos, raw text files) that does not require schema enforcement. Choosing the right storage depends on whether the training data has a fixed schema and requires relational querying or is blob-based and needs high-throughput access.

Exam trap

Google Cloud often tests the misconception that BigQuery has a hard 1 TB storage limit. BigQuery imposes no such cap on dataset size (the 1 TB figure resembles the free tier's monthly query allowance, not a storage limit), so candidates who accept the figure incorrectly choose option E.

764
MCQhard

An ML engineer trained a model and registered it in Vertex AI Model Registry. They want to assign the alias 'champion' to the best-performing version for production deployment. Which gcloud command should they use?

A.gcloud ai models versions describe --model=MODEL_ID --version=VERSION_ID
B.gcloud ai models upload --model-id=MODEL_ID --display-name=champion
C.gcloud ai endpoints deploy-model --model=MODEL_ID --alias=champion
D.gcloud ai models versions update --model=MODEL_ID --version=VERSION_ID --update-aliases=champion
AnswerD

This updates the version's aliases.

Why this answer

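Option D is the only command listed that changes a version's aliases, through the --update-aliases flag. Option A only reads version details, option B registers a new model, and option C attaches a model to an endpoint.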

765
Multi-Selecteasy

A company wants to use Vertex AI for hyperparameter tuning. Which three components are required to configure a hyperparameter tuning job? (Choose THREE.)

Select 3 answers
A.Algorithm (e.g., Bayesian, grid, random)
B.Machine type for each trial
C.List of hyperparameters with types and ranges
D.Objective metric name and goal (minimize or maximize)
E.Training container image
AnswersA, C, D

Required to specify how to search.

Why this answer

Option A is correct because Vertex AI hyperparameter tuning requires specifying the search algorithm (Bayesian, grid, or random) to determine how the hyperparameter space is explored. Bayesian optimization is the default and most efficient for continuous spaces, while grid search is exhaustive and random search is simple. Without an algorithm, Vertex AI cannot decide how to sample trials.

Exam trap

Google Cloud often tests the distinction between required hyperparameter tuning job components and optional training job settings, leading candidates to mistakenly include machine type or container image as mandatory tuning parameters.

766
MCQmedium

You have a TensorFlow model that you want to deploy on edge devices for real-time inference. The model was trained in Vertex AI. You need to convert it to a format suitable for on-device inference. Which approach should you use?

A.Export the model as a serialized TFX pipeline.
B.Export the model to a SavedModel and deploy it using Vertex AI Edge Manager.
C.Convert the model to TensorFlow Lite using the TensorFlow Lite converter.
D.Use Vertex AI Model Optimization to compile the model for edge devices.
AnswerC

TensorFlow Lite is optimized for on-device inference with low latency.

Why this answer

Option C is correct because TensorFlow Lite is specifically designed for on-device inference on edge devices, offering optimized performance and reduced model size. The TensorFlow Lite converter transforms a TensorFlow model (e.g., from a SavedModel) into the FlatBuffer format (.tflite), which is lightweight and compatible with mobile and embedded platforms. This directly addresses the requirement for real-time inference on edge devices.

Exam trap

Google Cloud often tests the misconception that Vertex AI services like Edge Manager or Model Optimization directly produce a deployable edge format, when in fact TensorFlow Lite conversion is the required final step for on-device inference.

How to eliminate wrong answers

Option A is wrong because a TFX pipeline is a production ML workflow framework for orchestrating training, validation, and deployment, not a model format for on-device inference; serializing it does not produce a deployable edge model. Option B is wrong because Vertex AI Edge Manager is a service for managing and deploying models to edge devices, but it expects models in a compatible format like TensorFlow Lite; exporting to SavedModel alone is insufficient without conversion, and Edge Manager itself does not perform the conversion. Option D is wrong because Vertex AI Model Optimization focuses on techniques like pruning and quantization to improve model efficiency, but it does not compile the model into a format suitable for on-device inference; the output still requires conversion to TensorFlow Lite for edge deployment.

767
Multi-Selecthard

A company is building a document processing pipeline for invoices. They need to extract key fields (invoice number, date, total amount) and allow human review for invoices over $10,000. Which TWO Google Cloud services/features should they combine?

Select 2 answers
A.Cloud Vision API for OCR
B.Human-in-the-Loop (HITL) on Document AI
C.AutoML Tables to predict missing fields
D.Document AI with invoice parser processor
E.Cloud Translation API to translate invoices
AnswersB, D

Why this answer

Document AI with a specialized invoice parser can extract fields. Human-in-the-Loop (HITL) is a feature of Document AI that enables human review. Cloud Vision API is not specialized for invoices.

AutoML Tables is for tabular models. Cloud Translation is not needed.

768
MCQeasy

You need to serve multiple models on a single Vertex AI endpoint to reduce costs. How can you achieve this?

A.Use Cloud Run to serve each model separately.
B.Use Vertex AI Prediction with multi-model serving by deploying multiple models to one endpoint with traffic splits.
C.Package all models into a single container and deploy that container.
D.Deploy each model to its own endpoint and use a load balancer.
AnswerB

Multiple models can be deployed to a single endpoint, each receiving a portion of the traffic.

Why this answer

Option B is correct because Vertex AI Prediction supports multi-model serving, allowing you to deploy multiple models to a single endpoint and use traffic splits to route a percentage of requests to each model. This reduces costs by sharing underlying infrastructure (e.g., compute resources) across models, rather than provisioning separate endpoints or containers for each model.
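
A traffic split is a mapping from deployed model to a percentage, with the percentages summing to 100. The sketch below simulates that routing decision in plain Python; the model IDs are hypothetical and the code does not call the Vertex AI API.

```python
import random

def pick_model(traffic_split, rng=random):
    """Choose a deployed model ID according to percentage weights.

    `traffic_split` maps hypothetical deployed-model IDs to integer
    percentages that must sum to 100.
    """
    if sum(traffic_split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    ids = list(traffic_split)
    weights = [traffic_split[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]
```

With `{"model-a": 80, "model-b": 20}`, roughly four out of five requests go to `model-a` over a large number of calls.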

Exam trap

The trap here is that candidates often confuse multi-model serving with containerization, assuming that bundling models into a single container (Option C) is equivalent to Vertex AI's native multi-model support, but this ignores the need for traffic splitting and independent model lifecycle management.

How to eliminate wrong answers

Option A is wrong because Cloud Run serves each model as a separate service, which does not consolidate models onto a single endpoint and incurs additional costs for individual scaling and networking. Option C is wrong because packaging all models into a single container violates the principle of model isolation, complicates updates, and does not leverage Vertex AI's native traffic-splitting mechanism for granular control. Option D is wrong because deploying each model to its own endpoint and using a load balancer increases operational overhead and cost, as each endpoint requires separate compute resources, defeating the purpose of cost reduction.

769
MCQmedium

A machine learning team has a prototype using a custom TensorFlow model trained on a small dataset stored in Cloud Storage. They want to scale the prototype to production with minimal code changes while ensuring the model can handle increased traffic and new data. The model currently loads data using tf.data.Dataset from CSV files. Which approach best meets these requirements?

A.Use Vertex AI Training with hyperparameter tuning and distributed training, then deploy the model to Vertex AI Prediction with autoscaling.
B.Deploy the model to AI Platform (Unified) Prediction with a custom container, and use AI Platform Training to retrain on larger datasets.
C.Migrate the model to BigQuery ML and use SQL for training and prediction to leverage BigQuery's scalability.
D.Package the model as a Cloud Run Function and use Cloud Scheduler to trigger retraining periodically.
AnswerA

Vertex AI provides seamless scaling with minimal code changes and supports tf.data.Dataset.

Why this answer

Vertex AI Prediction with autoscaling directly addresses the need to handle increased traffic without code changes, while Vertex AI Training with hyperparameter tuning and distributed training enables scaling to larger datasets with minimal modifications to the existing tf.data pipeline. This approach keeps the custom TensorFlow model intact and leverages managed infrastructure for both training and serving.

Exam trap

The trap here is that candidates may overcomplicate by choosing containerization (B) or a completely different platform (C), missing that Vertex AI Prediction natively supports TensorFlow models with autoscaling and minimal code changes.

How to eliminate wrong answers

Option B is wrong because it suggests using AI Platform (Unified) Prediction with a custom container, which is unnecessary and adds complexity; the existing model can be deployed directly without containerization, and the requirement is minimal code changes. Option C is wrong because migrating to BigQuery ML would require rewriting the model logic from TensorFlow to SQL, which is a significant code change and not suitable for a custom TensorFlow model. Option D is wrong because Cloud Run Functions are stateless and not designed for serving ML models with autoscaling for prediction traffic; Cloud Scheduler for retraining does not address the need for handling increased traffic or new data in a production serving path.

770
MCQeasy

A data scientist wants to deploy a trained TensorFlow model to Vertex AI for online predictions. They need to serve predictions with low latency and want to leverage GPU acceleration. Which machine type should they select when creating the Vertex AI endpoint?

A.n1-standard-4 with 1 NVIDIA Tesla T4
B.n1-standard-4
C.e2-standard-4
D.n1-highmem-8
AnswerA

Attaching a GPU to an n1-standard machine enables GPU acceleration.

Why this answer

Option A is correct because the n1-standard-4 machine type supports attaching GPUs such as the NVIDIA Tesla T4, which provides GPU acceleration for low-latency online predictions. Vertex AI endpoints require a machine type that allows GPU attachment, and the n1-series is one of the few families that supports GPUs, while the T4 offers a good balance of cost and performance for inference workloads.

Exam trap

The trap here is that candidates may assume any machine type can be paired with a GPU, but only specific series (like n1, a2, g2) support GPUs, and the e2 series explicitly does not, leading to a wrong selection if the GPU requirement is overlooked.

How to eliminate wrong answers

Option B is wrong because n1-standard-4 without a GPU does not provide GPU acceleration, so it cannot meet the requirement for low-latency predictions with GPU. Option C is wrong because e2-standard-4 does not support attaching GPUs at all; the e2 series is designed for cost-optimized CPU-only workloads. Option D is wrong because n1-highmem-8, while it can support GPUs, is over-provisioned in memory for typical inference tasks and does not include a GPU by default, so it would not satisfy the explicit need for GPU acceleration unless a GPU is attached, but the option as stated lacks the GPU specification.

771
MCQeasy

A company is serving a model for their e-commerce website. They expect traffic to be low at night and very high during flash sales. They want to minimize costs while ensuring availability during spikes. Which autoscaling configuration should they use?

A.min_replica_count=5, max_replica_count=5, target_cpu=60
B.min_replica_count=1, max_replica_count=20, target_cpu=60
C.min_replica_count=10, max_replica_count=10, target_cpu=60
D.min_replica_count=0, max_replica_count=100, target_cpu=80
AnswerB

Scales from 1 to 20 based on load, cost-efficient.

Why this answer

Setting a high max_replica_count allows scaling to handle spikes, while a low min_replica_count saves cost during low traffic. CPU utilization target of 60% is reasonable.
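
How the three settings interact can be sketched with a proportional scaling rule clamped to the replica bounds. This mirrors the common target-tracking formula and is an assumption for illustration, not Vertex AI's documented algorithm.

```python
import math

def desired_replicas(current, cpu_utilization, target_cpu=60,
                     min_replicas=1, max_replicas=20):
    """Proportional scaling rule, clamped to the configured bounds.

    This mirrors the common target-tracking formula; it is an
    assumption, not Vertex AI's documented algorithm.
    """
    wanted = math.ceil(current * cpu_utilization / target_cpu)
    return max(min_replicas, min(max_replicas, wanted))
```

At night the result settles at the minimum of 1 replica, while a flash sale pushes it toward the maximum of 20, which is the cost and availability trade-off option B captures.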

772
MCQeasy

A company has a prototype ML model that works well on historical data, but when deployed to production, the model performance degrades over time. The data distribution shifts gradually. Which strategy should they implement to maintain model accuracy?

A.Increase the regularization strength to prevent overfitting.
B.Increase the amount of training data by using more historical records.
C.Implement a retraining pipeline that periodically retrains the model on recent data.
D.Switch to a more complex model architecture to better capture patterns.
AnswerC

Periodic retraining with fresh data helps the model adapt to gradual distribution shifts.

Why this answer

Option C is correct because gradual data distribution shifts (concept drift) require the model to adapt to new patterns over time. A retraining pipeline that periodically retrains on recent data ensures the model remains aligned with the current production distribution, directly addressing the degradation caused by drift without relying on static historical data.

Exam trap

Google Cloud often tests the misconception that overfitting or model complexity is the primary cause of production degradation, leading candidates to choose regularization or more complex architectures instead of recognizing that distribution shift requires data freshness.

How to eliminate wrong answers

Option A is wrong because increasing regularization strength reduces overfitting to historical noise but does not address the root cause—distribution shift—and may actually harm performance on new data by forcing the model to ignore legitimate new patterns. Option B is wrong because adding more historical records only reinforces the old distribution, making the model less responsive to recent shifts and potentially worsening drift. Option D is wrong because switching to a more complex model architecture increases capacity to fit data but does not solve the problem of stale training distribution; it may even overfit to outdated patterns and degrade faster under drift.

773
MCQeasy

A data analyst wants to train a linear regression model to predict house prices using only SQL queries on BigQuery. Which BigQuery ML model type should they use?

A.BOOSTED_TREE_REGRESSOR
B.LOGISTIC_REG
C.LINEAR_REG
D.DNN_REGRESSOR
AnswerC

Why this answer

The question specifies a linear regression model for predicting house prices, which is a regression task with a continuous target variable. BigQuery ML's LINEAR_REG model type is explicitly designed for linear regression, making it the correct choice for this use case.

Exam trap

Google Cloud often tests the distinction between regression and classification model types, and the trap here is that candidates might confuse LOGISTIC_REG (classification) with linear regression due to the word 'logistic' sounding similar to 'linear', or they might overcomplicate the solution by choosing a tree or neural network model when a simple linear model suffices.

How to eliminate wrong answers

Option A is wrong because BOOSTED_TREE_REGRESSOR is a tree-based ensemble method, not a linear model, and is overkill for a simple linear regression task. Option B is wrong because LOGISTIC_REG is used for binary classification, not regression (predicting continuous values like house prices). Option D is wrong because DNN_REGRESSOR is a deep neural network regressor, which is unnecessarily complex and not a linear model.

774
MCQmedium

A developer needs to transcribe phone calls with high accuracy for a call center analytics application. The audio is in English and has background noise. Which Speech-to-Text model should they choose?

A.telephony
B.latest_short
C.latest_long
D.Any model; they are all equivalent
AnswerA

Why this answer

The telephony model is optimized for phone call audio with background noise. latest_long is for longer audio (without phone-specific optimization), and latest_short is for short utterances.

775
MCQeasy

A company has deployed a fraud detection model on Vertex AI Prediction. After three months, the model's accuracy has degraded, and the business is losing money due to undetected fraud. What should the team implement to proactively detect such issues?

A.Enable Vertex AI Model Monitoring to track prediction drift and alert when metrics exceed thresholds.
B.Set up Cloud Logging to capture all prediction requests and responses for manual review.
C.Randomly shuffle the training data before retraining to improve robustness.
D.Schedule a monthly job to retrain the model with the latest data without monitoring.
AnswerA

Model Monitoring automatically analyzes input distributions and prediction quality over time.

Why this answer

Option A is correct because monitoring prediction drift is a key practice for maintaining model quality, and alerts surface degradation before it becomes costly. Option B is wrong because logs alone don't automatically detect drift. Option C is wrong because shuffling the training data doesn't address drift. Option D is wrong because scheduled retraining may help, but without monitoring it doesn't detect when or whether the model has degraded.
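
Drift detection boils down to comparing a recent distribution against a baseline and alerting when the distance exceeds a threshold. The sketch below uses the L-infinity distance between category frequencies, one common choice for categorical features; it is an illustration in plain Python, not the service's implementation, and any threshold applied to its output would be a value the team chooses.

```python
from collections import Counter

def l_infinity_drift(baseline, recent):
    """Largest absolute difference in category frequency between two samples."""
    if not baseline or not recent:
        raise ValueError("both samples must be non-empty")
    base, cur = Counter(baseline), Counter(recent)
    categories = set(base) | set(cur)
    return max(abs(base[c] / len(baseline) - cur[c] / len(recent))
               for c in categories)
```

For example, if 80% of baseline transactions were card payments but only 50% of recent ones are, the distance is about 0.3, well above a typical alerting threshold.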

776
Multi-Selecthard

A data science team wants to monitor model quality by comparing predictions against ground truth labels. They have deployed a model on Vertex AI Endpoints and enabled request/response logging to BigQuery. Which THREE actions should they take to set up model quality monitoring? (Choose 3)

Select 3 answers
A.Create a scheduled query to compute metrics like accuracy and confusion matrix over time
B.Join prediction logs with ground truth labels on a common key (e.g., request ID)
C.Configure Vertex AI Model Monitoring to detect prediction drift
D.Use Vertex AI Explainability to compute feature attributions
E.Upload ground truth labels to a BigQuery table
AnswersA, B, E

Correct: Scheduled queries automate metric computation.

Why this answer

To monitor model quality, the team needs to upload ground truth labels to BigQuery, join with prediction logs, compute metrics (e.g., accuracy, confusion matrix) over time, and optionally create dashboards.
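
The join-then-score logic can be sketched in plain Python. In practice this would run as a scheduled BigQuery query; the dictionaries and request IDs below are hypothetical stand-ins for the prediction log table and the ground truth table.

```python
from collections import Counter

def quality_metrics(predictions, ground_truth):
    """Join predictions to labels by request ID and score them.

    Both arguments map a hypothetical request ID to a class label.
    Requests without a label yet are skipped.
    """
    joined = [(ground_truth[rid], pred)
              for rid, pred in predictions.items() if rid in ground_truth]
    if not joined:
        raise ValueError("no overlapping request IDs")
    confusion = Counter(joined)  # keys are (actual, predicted)
    accuracy = sum(n for (a, p), n in confusion.items() if a == p) / len(joined)
    return accuracy, confusion
```

Skipping unlabeled requests matters because ground truth usually arrives later than the prediction it belongs to.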

777
MCQhard

A company uses Vertex AI Pipelines to orchestrate ML workflows. After a pipeline run, they want to query the lineage of a particular model artifact to find out which dataset and hyperparameters were used to produce it. Which API method should they use?

A.projects.locations.metadataStores.artifacts.queryArtifactLineageSubgraph
B.projects.locations.metadataStores.artifacts.get
C.projects.locations.metadataStores.contexts.addContextArtifactsAndExecutions
D.projects.locations.metadataStores.executions.queryExecutionInputsAndOutputs
AnswerA

Correct method to get lineage subgraph.

Why this answer

Vertex AI Metadata allows querying lineage via the queryArtifactLineageSubgraph method, which retrieves the upstream and downstream artifacts and executions connected to a given artifact.

778
MCQmedium

You have a Vertex AI endpoint with autoscaling enabled. You notice that during traffic spikes, the endpoint takes a long time to scale up, causing prediction errors. What is the most effective solution?

A.Use a larger machine type to handle more requests per replica.
B.Reduce the target CPU utilization to trigger scaling earlier.
C.Increase the maximum number of replicas.
D.Increase the minimum number of replicas to maintain a larger buffer.
AnswerD

Higher min replicas reduce the time to scale up during spikes.

Why this answer

Setting higher min replicas ensures a baseline of compute always available to absorb traffic spikes, reducing cold start latency. Increasing max replicas allows scaling higher but does not address the initial delay.

779
MCQeasy

You need to run a custom training job on Vertex AI using a pre-built container for scikit-learn. Which container image should you specify?

A.us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-9
B.us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0.23-0
C.us-docker.pkg.dev/vertex-ai/training/xgboost-cpu.1-3
D.us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-6
AnswerB

Correct pre-built container for scikit-learn.

Why this answer

Vertex AI provides pre-built training containers for scikit-learn under the 'us-docker.pkg.dev/vertex-ai/training/' path, and option B is the only scikit-learn image listed. The other images are for PyTorch, XGBoost, and TensorFlow.

780
MCQhard

A team notices that a Vertex AI Pipeline step re-executes every time the pipeline runs, even though its inputs and code have not changed. They want to enable caching for this component to avoid redundant computation. However, caching is currently disabled globally. Which configuration change will enable caching for that specific component?

A.Use the 'enable_caching' parameter when creating the pipeline job via the SDK.
B.Add 'caching=True' to the @dsl.component decorator for that component.
C.Add 'caching=True' to the @dsl.pipeline decorator.
D.Set the environment variable 'CACHE_ENABLED=True' on the Vertex AI Pipeline job.
AnswerB

Setting caching=True in the component decorator enables caching for that component.

Why this answer

Caching can be enabled per component by setting the 'caching' parameter to True in the component decorator or by setting the environment variable. Setting it in the pipeline decorator affects all components. In KFP SDK v2, you can use @dsl.component(caching=True) or set the task-level caching.

The correct answer is to set caching=True in the component definition.

781
MCQeasy

An organization wants to deploy a pre-trained BERT model for sentiment analysis on Vertex AI. They want to fine-tune it on their domain-specific data. Which feature in Vertex AI allows them to find and fine-tune a suitable foundation model with minimal effort?

A.Vertex AI Model Garden
B.Vertex AI AutoML
C.Vertex AI Custom Training
D.Vertex AI JumpStart
AnswerA

Model Garden lets teams discover foundation models and fine-tune or deploy them with minimal code.

Why this answer

Vertex AI Model Garden is the catalog for discovering, fine-tuning, and deploying foundation models, including BERT and other NLP models. JumpStart is an Amazon SageMaker feature, not a Vertex AI service. AutoML trains custom models rather than fine-tuning existing ones.

Custom training requires more manual effort.

782
Matchingmedium

Match each ML acronym to its definition.

Concepts
Matches

Area Under the ROC Curve

Mean Squared Error

Tensor Processing Unit

Support Vector Machine

Principal Component Analysis

Why these pairings

The correct matches are: AUC - 'Area Under the ROC Curve'; MSE - 'Mean Squared Error'; TPU - 'Tensor Processing Unit'; SVM - 'Support Vector Machine'; PCA - 'Principal Component Analysis'.

783
Multi-Selectmedium

A machine learning team uses Vertex AI Pipelines for model training. They want to implement a conditional step that runs additional evaluation if the model accuracy exceeds 0.9, otherwise it runs a data augmentation component. Which two Kubeflow Pipelines SDK v2 constructs can they use to achieve this? (Choose two.)

Select 2 answers
A.dsl.ParallelFor
B.dsl.Else
C.dsl.ExitHandler
D.dsl.Collected
E.dsl.If
AnswersB, E

dsl.Else defines the branch executed when the condition is false.

Why this answer

In Kubeflow Pipelines SDK v2, `dsl.If` and `dsl.Else` are the constructs used to create conditional execution branches within a pipeline. `dsl.If` evaluates a condition (e.g., model accuracy > 0.9) and runs the enclosed steps only if true; `dsl.Else` defines the alternative branch that runs when the condition is false. This directly implements the required logic for running additional evaluation on high accuracy or data augmentation otherwise.

Exam trap

The trap here is that candidates may confuse `dsl.ParallelFor` or `dsl.ExitHandler` with conditional constructs, but `dsl.If` and `dsl.Else` are the only SDK v2 constructs specifically designed for branching based on runtime conditions.

784
MCQmedium

You are running a Vertex AI custom training job with pre-built TensorFlow container. You want to use TPU v3 pods for faster training. Which configuration is required?

A.Specify machine type as tpu-v3-8 in the worker pool spec and use tf.distribute.TPUStrategy
B.Set the worker pool machine type to n1-standard-8 and add --accelerator type=tpu
C.Use a custom container with TPU libraries and set --tpu-topology=v3-8
D.Enable TPU by setting the environment variable TPU_NAME
AnswerA

TPU machine type must be specified, and code must use TPUStrategy.

Why this answer

TPU training requires specifying the TPU type in the worker pool spec and using tf.distribute.TPUStrategy in code. The pre-built container supports TPUs. Runtime version must include TPU support.

785
MCQeasy

A global retail company uses Vertex AI Recommendations to provide product recommendations on their website. They have a large catalog and millions of users. The initial deployment works well for active users, but they notice that new users (with no purchase history) receive generic recommendations that are not personalized. The company wants to improve the cold-start experience. They have user demographic data (age, location) available at sign-up. Current recommendation model is a collaborative filtering model using the built-in Vertex AI Recommendations. What should the company do to improve personalization for new users?

A.Collect more historical interaction data before showing recommendations
B.Disable recommendations for new users until they have at least 10 interactions
C.Increase the user exploration parameter in the Vertex AI Recommendations configuration
D.Build a custom two-tower recommendation model using Vertex AI Training
AnswerC

Exploration helps serve diverse items to new users to learn preferences.

Why this answer

Option C is correct because increasing the user exploration parameter in Vertex AI Recommendations instructs the model to allocate a higher percentage of recommendations to items with less historical data, effectively enabling personalized suggestions for cold-start users based on available demographic signals. This parameter directly controls the balance between exploiting known user-item interactions and exploring new or less-seen items, which is the standard mechanism within Vertex AI's built-in collaborative filtering to address the cold-start problem without requiring a custom model.

Exam trap

Google Cloud often tests the misconception that cold-start problems always require custom models or additional data collection, when in fact built-in platform features like exploration parameters are designed specifically to handle this scenario without custom development.

How to eliminate wrong answers

Option A is wrong because collecting more historical interaction data before showing recommendations does not solve the immediate cold-start problem for new users; it merely delays personalization and contradicts the goal of improving the experience from sign-up. Option B is wrong because disabling recommendations for new users until they have at least 10 interactions is a poor user experience and ignores the fact that Vertex AI Recommendations can leverage user demographic data (age, location) to provide personalized suggestions even without purchase history. Option D is wrong because building a custom two-tower recommendation model using Vertex AI Training is unnecessary and over-engineered; Vertex AI's built-in service already supports exploration parameters and can utilize demographic features for cold-start personalization without requiring custom model development.

786
MCQhard

You are deploying a PyTorch model on Vertex AI and want to use NVIDIA Triton Inference Server for optimal performance. You have built a custom container with Triton. Which serving configuration should you use?

A.Deploy the model on GKE with Triton and expose via Istio.
B.Use the prebuilt Vertex AI PyTorch prediction container and set environment variables to enable Triton.
C.Use Vertex AI Model Optimization to automatically convert the model to TensorRT and deploy with built-in server.
D.Upload your Triton container to Container Registry and specify it as the prediction container in Vertex AI Model.
AnswerD

Vertex AI allows custom containers for prediction; Triton can be included in the container.

Why this answer

Vertex AI supports custom containers for prediction. You can bring your own container with Triton installed, and Vertex AI will deploy it to the endpoint.

787
Multi-Selecthard

A team is operationalizing a machine learning pipeline using Vertex AI. They want to automatically track experiment runs, log model parameters and metrics, and store model artifacts for reproducibility. They also need to capture lineage between pipeline components (e.g., which dataset and hyperparameter tuning job produced a model). Which TWO services should they use together to achieve this? (Choose two.)

Select 2 answers
A.Vertex AI Model Registry
B.Vertex AI Feature Store
C.Vertex AI Metadata
D.Vertex AI Experiments
E.Vertex AI Workbench
AnswersC, D

Captures lineage between pipeline components, artifacts, and executions.

Why this answer

Vertex AI Experiments provides experiment tracking (parameters, metrics, artifacts). Vertex AI Metadata captures lineage between pipeline components, artifacts, and executions. Together, they enable full reproducibility and traceability.

788
Matchingmedium

Match each ML model interpretability method to its description.

Concepts
Matches

Game-theoretic approach to explain feature contributions

Local surrogate model to explain individual predictions

Ranking features by their impact on model output

Shows marginal effect of a feature on predictions

Measures decrease in performance when feature is shuffled

Why these pairings

Interpretability methods serve different purposes: LIME and SHAP are local, while PDP and permutation importance are global. Common confusions include swapping local vs global or misattributing LIME/SHAP to specific model types.
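
To make the last description concrete, here is a minimal sketch of permutation importance in plain Python. The toy dataset and `toy_model` are hypothetical stand-ins for a real dataset and trained model; libraries such as scikit-learn ship their own implementations.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=5, seed=0):
    """Mean accuracy drop per feature after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled_col = [row[col] for row in rows]
            rng.shuffle(shuffled_col)
            # Rebuild each row with only this one column replaced.
            permuted = [
                row[:col] + (value,) + row[col + 1:]
                for row, value in zip(rows, shuffled_col)
            ]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: the label depends only on feature 0; feature 1 is unrelated noise.
rows = [(i / 20, (i * 7) % 20 / 20) for i in range(20)]
labels = [row[0] >= 0.5 for row in rows]


def toy_model(row):
    """Hypothetical stand-in for a trained model that uses only feature 0."""
    return row[0] >= 0.5


importances = permutation_importance(toy_model, rows, labels)
```

Because the toy model ignores feature 1, shuffling it changes nothing and its importance is zero, while shuffling feature 0 breaks the link to the label and lowers accuracy.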

789
MCQmedium

Refer to the exhibit. A user receives the error shown when trying to upload a model to Vertex AI. What is the most likely cause?

A.The container image 'gcr.io/cloud-aiplatform/prediction/tf2-cpu.2-12:latest' is not accessible.
B.The user does not have the 'roles/aiplatform.admin' or the 'aiplatform.models.upload' permission on the project.
C.The user specified an incorrect region (us-central1) that does not support Vertex AI.
D.The Cloud Storage bucket 'gs://my-model-artifacts/fraud-detection/v2/' does not exist.
AnswerB

Permission denied errors typically indicate missing IAM roles.

Why this answer

The error message indicates a permission issue during model upload. The user lacks the 'aiplatform.models.upload' permission or the broader 'roles/aiplatform.admin' role on the project. Vertex AI requires these IAM permissions to authorize the upload action, regardless of other resource accessibility.

Exam trap

Google Cloud often tests the distinction between permission errors and resource availability errors, trapping candidates who assume the error is due to a missing bucket or container image rather than IAM misconfiguration.

How to eliminate wrong answers

Option A is wrong because if the container image were inaccessible, the error would typically occur during deployment or prediction, not during the upload step, and the error message would reference image pull failures (e.g., 'ImagePullBackOff'). Option C is wrong because us-central1 is a fully supported region for Vertex AI; the error does not mention region unavailability. Option D is wrong because if the Cloud Storage bucket did not exist, the error would be a 404 or 'bucket not found' message, not a permission-denied error.

790
MCQmedium

An ML engineer has a model trained in Vertex AI and wants to deploy it to an endpoint with autoscaling and traffic splitting for canary testing. They have the model artifact stored in Vertex AI Model Registry with alias 'champion'. What is the correct sequence of steps?

A.Upload model to registry, create endpoint, then deploy model to endpoint with traffic split.
B.Create endpoint, upload model to registry, then deploy model to endpoint with traffic split.
C.Create endpoint, deploy model directly from Cloud Storage, then add traffic split.
D.Upload model to registry, then create endpoint and deploy in one command using gcloud ai endpoints deploy-model.
AnswerA

Correct order: model first, then endpoint, then deploy.

Why this answer

The standard sequence: upload model to registry, create endpoint, deploy model to endpoint with traffic split.

791
MCQhard

Refer to the exhibit. A user attempts to upload a model to Vertex AI Model Registry using the gcloud CLI. The command fails with the error shown. What is the most likely cause?

A.The region us-central1 does not support Vertex AI
B.The --container-command flag is misspelled
C.The --artifact-uri points to a directory instead of a model file
D.The --container-ports flag expects a comma-separated list
AnswerC

Error indicates URI must point to a single file.

Why this answer

The error indicates that the `--artifact-uri` flag points to a directory (e.g., `gs://bucket/model/`) rather than a specific model file (e.g., `gs://bucket/model/saved_model.pb`). Vertex AI Model Registry requires a direct path to the model artifact file, not a container directory, because the service needs to locate and register the exact model binary for deployment.

Exam trap

Google Cloud often tests the distinction between a directory path and a file path in cloud CLI commands, exploiting the common mistake of assuming a folder URI is acceptable when the service expects a specific artifact file.

How to eliminate wrong answers

Option A is wrong because us-central1 is a fully supported region for Vertex AI, including Model Registry, and the error message does not indicate a regional restriction. Option B is wrong because the `--container-command` flag is correctly spelled in the command; the error is unrelated to flag spelling. Option D is wrong because the `--container-ports` flag does accept a comma-separated list, but the error message points to the `--artifact-uri` value, not to the ports flag.

792
MCQmedium

A team wants to perform hyperparameter tuning on a Vertex AI custom training job with 100 trials. They require an algorithm that efficiently explores the search space by learning from previous trials. Which algorithm should they select in the study configuration?

A.RANDOM_SEARCH
B.HYPERBAND
C.ALGORITHM_UNSPECIFIED (defaults to Bayesian optimization)
D.GRID_SEARCH
AnswerC

Bayesian optimization is the default and most efficient algorithm for hyperparameter tuning.

Why this answer

Vertex AI Vizier provides Bayesian optimization as a default algorithm that builds a probabilistic model of the objective function and selects hyperparameters based on expected improvement. This is more efficient than grid or random search for most scenarios. The ALGORITHM_UNSPECIFIED defaults to Bayesian optimization.
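
For orientation, a tuning request that relies on the default algorithm might look roughly like the sketch below. The field names are intended to follow the HyperparameterTuningJob and StudySpec REST resources; the display name, metric ID, parameter IDs, and ranges are hypothetical, and the required trial job spec is omitted for brevity.

```python
import json

# Hypothetical metric and parameter names; trialJobSpec is intentionally left out.
hyperparameter_tuning_job = {
    "displayName": "example-tuning-job",
    "maxTrialCount": 100,
    "parallelTrialCount": 5,
    "studySpec": {
        # Leaving the algorithm unspecified selects the service default.
        "algorithm": "ALGORITHM_UNSPECIFIED",
        "metrics": [{"metricId": "val_accuracy", "goal": "MAXIMIZE"}],
        "parameters": [
            {
                "parameterId": "learning_rate",
                "doubleValueSpec": {"minValue": 1e-4, "maxValue": 1e-1},
                "scaleType": "UNIT_LOG_SCALE",
            },
            {
                "parameterId": "num_layers",
                "integerValueSpec": {"minValue": 1, "maxValue": 8},
                "scaleType": "UNIT_LINEAR_SCALE",
            },
        ],
    },
}

request_body = json.dumps(hyperparameter_tuning_job, indent=2)
```

Keeping the parallel trial count well below the total trial count gives the optimizer more completed trials to learn from before it proposes new ones.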

793
MCQmedium

A data scientist wants to use AutoML to classify images of retail products into categories. There are 50 categories and the dataset has 100,000 labelled images. Which Vertex AI AutoML service is most appropriate?

A.AutoML Tables
B.AutoML Video
C.AutoML Vision
D.AutoML NLP
AnswerC

Why this answer

AutoML Vision is designed for image classification, object detection, and segmentation. AutoML Tables is for tabular data, AutoML NLP for text, and AutoML Video for video.

794
MCQmedium

A team is building ML pipelines with Vertex AI. They want to reuse standard pipeline components across teams and enforce governance. What approach should they take?

A.Use Vertex AI Pipelines with pre-built and custom components organized in a component registry.
B.Store pipeline definitions in a shared Cloud Storage bucket and copy them manually.
C.Use Cloud Composer to orchestrate ad-hoc scripts.
D.Have each team build their own pipelines independently.
AnswerA

Vertex AI component registry enables sharing and governance of reusable components.

Why this answer

Standardized pipeline templates (components) in Vertex AI Pipelines allow reuse and governance across teams.

795
MCQhard

A team deploys a real-time model using a custom container on Vertex AI Prediction. The container is large (5 GB) and cold starts are causing latency spikes. The endpoint is configured with `min_replica_count=0` to reduce cost. The team wants to keep the cost low while reducing cold starts. What is the best approach?

A.Set `min_replica_count=1` to keep at least one replica always warm.
B.Use a prebuilt container for the model framework to reduce image size.
C.Enable container memory optimization to reduce startup time.
D.Provision a Persistent Disk (SSD) for the container image to speed up download.
AnswerA

A single warm replica handles traffic immediately while autoscaling adds more.

Why this answer

Option A is correct because keeping at least one always-on replica (`min_replica_count=1`) eliminates cold starts for most traffic at the cost of a single warm instance. Option B is wrong because a prebuilt container may shrink the image but does not remove the startup overhead of scaling up from zero. Option C is wrong because container memory optimization does not address cold-start startup time.

Option D is wrong because a faster disk may speed up the image download but cannot eliminate cold-start latency.

796
MCQhard

A company uses Cloud Composer to orchestrate their ML workflows. They have an Airflow DAG that runs a Vertex AI pipeline, then a BigQuery query, then a Dataflow job. The DAG is failing because the Vertex AI pipeline takes longer than the Airflow task timeout. What is the best way to handle this?

A.Increase the Airflow task timeout to account for the maximum expected pipeline duration.
B.Run the pipeline synchronously by using a Kubernetes Pod operator.
C.Split the DAG into two DAGs: one for the pipeline, one for the rest.
D.Use the Airflow Vertex AI pipeline operator with wait_for_completion=True.
AnswerD

The operator can poll the pipeline status, freeing the worker while waiting.

Why this answer

Vertex AI pipelines are asynchronous; the Airflow operator should wait for completion using a sensor or by setting the wait_for_completion parameter. Increasing the task timeout is a workaround but not best practice. Using a separate DAG for the pipeline defeats the purpose of orchestration.

797
MCQeasy

An ML engineer needs to trigger a Vertex AI Pipeline on a recurring schedule, every 24 hours, to retrain a model with the latest data. Which approach should they use to set up this schedule?

A.Use Cloud Tasks to queue pipeline runs daily.
B.Create a Cloud Scheduler job that calls the Vertex AI API to create a pipeline job.
C.Set a cron expression in the pipeline definition file using the 'schedule' parameter.
D.Use the Vertex AI Pipelines UI to set a schedule directly on the pipeline.
AnswerB

Cloud Scheduler can invoke Vertex AI API via HTTP or Pub/Sub to trigger a pipeline on a schedule.

Why this answer

Cloud Scheduler is the native Google Cloud service for cron-based job scheduling. By configuring a Cloud Scheduler job to call the Vertex AI API (e.g., via a HTTP POST to the projects.locations.pipelineJobs.create endpoint), the engineer can trigger a pipeline run every 24 hours. This approach is reliable, supports authentication via OAuth, and integrates directly with Vertex AI Pipelines without requiring additional orchestration code.

Exam trap

Google Cloud often tests the misconception that Vertex AI Pipelines has a built-in scheduling feature (like a cron parameter in the pipeline definition or a UI schedule button), when in fact scheduling must be implemented using Cloud Scheduler or similar external services.

How to eliminate wrong answers

Option A is wrong because Cloud Tasks is a distributed task queue designed for asynchronous message delivery and retries, not for recurring cron-based scheduling; it would require an additional scheduler to enqueue tasks daily. Option C is wrong because Vertex AI Pipeline definitions do not support a 'schedule' parameter; scheduling is handled externally, not within the pipeline YAML or JSON definition. Option D is wrong because the Vertex AI Pipelines UI does not provide a built-in recurring schedule feature; schedules must be created using Cloud Scheduler or other external tools.
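
The Cloud Scheduler approach can be sketched as follows. The dictionary is only an illustration of the pieces involved (cron expression, HTTP method, target URL, request body), not Cloud Scheduler's own job schema; the project, region, display name, and template path are hypothetical.

```python
def pipeline_jobs_create_url(project, region):
    """REST URL for the projects.locations.pipelineJobs.create method."""
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/pipelineJobs"
    )


# Hypothetical project, region, and compiled pipeline template location.
scheduler_job = {
    "schedule": "0 0 * * *",  # cron: once a day at 00:00
    "http_method": "POST",
    "uri": pipeline_jobs_create_url("my-project", "us-central1"),
    "body": {
        "displayName": "daily-retrain",
        "templateUri": "gs://my-bucket/pipeline.json",
    },
}
```

A real request would also need an OAuth token for a service account allowed to create pipeline jobs, plus any runtime configuration the pipeline expects.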

798
MCQhard

A company trains a model using features from Vertex AI Feature Store. They notice training-serving skew because the feature values used at training time differ from those served online. How should they address this?

A.Use the same online store for both training and serving
B.Disable caching in the online store
C.Enable feature monitoring to detect drift
D.Use point-in-time correct retrieval from the offline store for training data
AnswerD

This ensures training uses the exact feature values as of the label time, eliminating skew.

Why this answer

Point-in-time correct retrieval ensures that the feature values used in training correspond to the exact same point in time as the label, preventing data leakage and skew.
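
The idea can be sketched in a few lines of plain Python: for each label, take the most recent feature value recorded at or before the label's timestamp, and never a later one. The integer timestamps and values are made up for illustration; this is not the Feature Store API.

```python
from bisect import bisect_right

# Hypothetical feature history for one entity: (timestamp, value), sorted by timestamp.
feature_history = [(1, 10.0), (5, 12.5), (9, 20.0)]


def value_as_of(history, as_of):
    """Return the latest feature value with timestamp <= as_of, or None if none exists."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx else None


# Label events as (timestamp, label); each training row gets the value known at that time.
label_events = [(0, 0), (4, 1), (5, 0), (12, 1)]
training_rows = [(value_as_of(feature_history, ts), label) for ts, label in label_events]
```

The label at timestamp 4 is paired with the value written at timestamp 1, not the later value from timestamp 5, which is what prevents leakage of future information into training.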

799
Multi-Selecteasy

A data analyst wants to use low-code ML to analyze text data. Which TWO Google Cloud services are appropriate?

Select 2 answers
A.Vertex AI Workbench
B.Document AI
C.Cloud Natural Language API
D.AutoML Natural Language
E.BigQuery ML for sentiment
AnswersC, D

Correct: Pre-trained sentiment and entity analysis via API.

Why this answer

Cloud Natural Language API is a low-code ML service that provides pre-trained models for analyzing text, including sentiment analysis, entity recognition, and syntax analysis, without requiring custom model training. It is appropriate for a data analyst who wants to quickly extract insights from text data using simple API calls.

Exam trap

The trap here is that candidates may confuse BigQuery ML's sentiment analysis feature (which is SQL-based and not a dedicated low-code service) with a standalone low-code ML service, or mistakenly think Vertex AI Workbench is low-code when it actually requires coding in Python or other languages.

800
Multi-Selectmedium

A company is implementing MLOps with Vertex AI. They need to ensure that only approved models can be deployed to production. Which TWO practices should they adopt?

Select 2 answers
A.Use a single endpoint for all models without versioning
B.Allow any user with Vertex AI User role to deploy models
C.Use Vertex AI Model Registry aliases (e.g., 'champion') to mark production-ready models
D.Store all models in Cloud Storage without registry
E.Enable Vertex AI Continuous Evaluation with manual approval gate
AnswersC, E

Aliases provide a clear designation of production models.

Why this answer

Using aliases to designate champion and enabling manual approval in Continuous Evaluation ensures governance.

801
MCQeasy

A machine learning engineer wants to manage multiple model versions and facilitate collaboration across teams. The goal is to track model lineage, versioning, and approvals. Which Vertex AI service should they use?

A.Vertex AI Model Registry
B.Vertex AI ML Metadata
C.Vertex AI Feature Store
D.Vertex AI Vizier
AnswerA

Model Registry is designed for model versioning, lifecycle management, and collaboration.

Why this answer

Option A is correct because Model Registry provides versioning, approval tracking, and integration with Vertex AI Pipelines. Option C is wrong because Feature Store stores features, not models. Option B is wrong because ML Metadata is lower-level and less user-friendly.

Option D is wrong because Vizier is for hyperparameter tuning.

802
MCQmedium

A team is implementing CI/CD for ML using Cloud Build. They want to trigger a training pipeline in Vertex AI whenever a new model code is pushed to the main branch of the repository. Which Cloud Build configuration should they use to achieve this?

A.Set up a Cloud Build trigger that runs on push to any branch, and in the build step, use gcloud to submit a Vertex AI Pipeline job.
B.Use a Cloud Scheduler job to periodically check for new commits on main and trigger Cloud Build.
C.Use Cloud Functions to watch the repository and call Cloud Build on push to main.
D.Set up a Cloud Build trigger that runs on push to main branch, and in the build step, use gcloud to submit a Vertex AI Pipeline job.
AnswerD

This correctly limits the trigger to the main branch and uses gcloud to launch the pipeline.

Why this answer

Option D is correct because Cloud Build triggers can be configured to fire specifically on pushes to the main branch. The build step then uses the gcloud command to submit a Vertex AI Pipeline job, which directly integrates the CI/CD pipeline with Vertex AI's orchestration. This approach is event-driven, immediate, and requires no additional services or polling.

Exam trap

Google Cloud often tests the candidate's understanding that Cloud Build triggers can be scoped to specific branches and that using gcloud directly in a build step is the simplest and most efficient way to invoke Vertex AI Pipelines, rather than introducing unnecessary intermediate services like Cloud Functions or Scheduler.

How to eliminate wrong answers

Option A is wrong because triggering on push to any branch would cause the pipeline to run on feature branches, pull requests, and other non-main branches, leading to unnecessary executions and potential conflicts. Option B is wrong because Cloud Scheduler polling is inefficient, introduces latency, and is not the intended event-driven mechanism; Cloud Build triggers are designed to react to repository events directly. Option C is wrong because using Cloud Functions as an intermediary adds unnecessary complexity and cost; Cloud Build natively supports repository event triggers without requiring a separate compute service.

803
MCQmedium

You run the above command to deploy a new model version to an existing endpoint. After deployment, you observe that the endpoint's previous model version is still receiving 100% of traffic. What is the most likely reason for this?

A.The new model is still in the 'creating' state and hasn't been activated.
B.The model ID provided does not exist in the endpoint.
C.The --traffic-split flag is specified incorrectly; it should use model IDs, not '0-100'.
D.The min-replica-count is too high, preventing traffic splitting.
AnswerC

Correct syntax requires model IDs with percentages.

Why this answer

The traffic-split flag syntax is incorrect. The correct syntax for Vertex AI is --traffic-split=<model-id>=<percentage> for each model. Without correct model IDs, the flag is ignored, and no traffic split is applied, so the existing version continues to receive all traffic.
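
A small helper can make the expected shape of the flag value explicit. This is only a sketch: the deployed model IDs are hypothetical, and the helper simply formats ID=percentage pairs and checks that they sum to 100.

```python
def format_traffic_split(split):
    """Build a --traffic-split value from a {deployed_model_id: percentage} mapping."""
    if sum(split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    return ",".join(f"{model_id}={percent}" for model_id, percent in split.items())


# Hypothetical deployed model IDs for a 90/10 canary rollout.
flag = "--traffic-split=" + format_traffic_split({"1234567890": 90, "9876543210": 10})
```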

804
MCQmedium

A developer sees this error when calling the endpoint. What is the most likely cause?

A.The model is still in training
B.The model is deployed but not yet serving
C.The endpoint has no deployed model
D.The request payload size exceeds limit
AnswerB

Correct: Model deployment is still initializing.

Why this answer

The error 'model is not serving' occurs when the endpoint exists and a model is deployed, but the deployment is not yet in the 'serving' state (e.g., still loading, scaling, or warming up). On Vertex AI, the deploy operation must finish provisioning replicas and loading the model before the endpoint can serve inference requests. Option B correctly identifies that the model is deployed but not yet ready to handle traffic.

Exam trap

Google Cloud often tests the distinction between 'no model deployed' and 'model deployed but not serving', where candidates confuse a deployment that exists but is not yet ready with a missing deployment.

How to eliminate wrong answers

Option A is wrong because if the model were still in training, there would be no deployed model to call, and the error would indicate a missing model rather than a 'not serving' state. Option C is wrong because if the endpoint had no deployed model, the error would report that nothing is deployed, not a serving-state error. Option D is wrong because exceeding the request payload size limit produces a request-size error, not a 'not serving' error.

805
Multi-Selectmedium

You are building a CI/CD pipeline for an ML model using Cloud Build. When code is pushed to the main branch, you want to automatically build a training image, run a Vertex AI pipeline, and if the model evaluation passes, deploy it to a staging endpoint. Which two components are essential for this CI/CD pipeline?

Select 2 answers
A.Cloud Scheduler to trigger the pipeline on a schedule.
B.Cloud Functions to deploy the model.
C.Vertex AI Pipelines to orchestrate training and evaluation.
D.Cloud Build trigger configured to respond to push events to the main branch.
E.Vertex AI Continuous Training service.
AnswersC, D

Pipelines are used to run the ML workflow.

Why this answer

Option C is correct because Vertex AI Pipelines is the service that orchestrates the entire ML workflow, including training, evaluation, and conditional deployment logic. In a CI/CD pipeline, you need a managed orchestrator to chain steps like model training, evaluation, and deployment, and Vertex AI Pipelines provides that with its Kubeflow Pipelines-based DAG execution.

Exam trap

Google Cloud often tests the distinction between event-driven triggers (Cloud Build trigger on push) and schedule-based triggers (Cloud Scheduler), so candidates mistakenly pick Cloud Scheduler when the requirement is for a code-push event.

806
MCQeasy

You deployed a model to a Vertex AI endpoint with minReplicas=0 and maxReplicas=5. After sending prediction requests, you notice the endpoint takes about 30 seconds to respond initially, but subsequent requests are fast. What is the most likely cause?

A.The model is too large for the machine type.
B.Cold start occurs because the endpoint scaled down to zero.
C.The VPC Service Controls are blocking the initial request.
D.The endpoint's autoscaling is misconfigured.
AnswerB

Correct. With minReplicas=0, the endpoint scales down to zero, leading to cold start latency.

Why this answer

Option B is correct because Vertex AI endpoints with minReplicas=0 scale down to zero when idle. The first request after a period of inactivity triggers a cold start, where the endpoint must provision a new VM instance and load the model, causing a ~30-second delay. Subsequent requests are fast because the instance remains warm and handles them without provisioning overhead.

Exam trap

Google Cloud often tests the distinction between cold start latency and persistent performance issues, so candidates may mistakenly attribute the initial delay to model size or network misconfiguration instead of recognizing the intentional scaling-to-zero behavior.

How to eliminate wrong answers

Option A is wrong because a model too large for the machine type would cause persistent latency or errors on every request, not just the first one after idle time. Option C is wrong because VPC Service Controls enforce network boundaries and would block all requests consistently, not just the initial one with a 30-second delay. Option D is wrong because the autoscaling configuration (minReplicas=0, maxReplicas=5) is correct for scaling to zero; the observed behavior is the expected cold start, not a misconfiguration.

807
Multi-Selecteasy

An ML engineer is monitoring a Vertex AI Endpoint and notices a spike in 5xx error rates. Which TWO metrics should they examine to diagnose the issue? (Choose 2)

Select 2 answers
A.Feature drift alert count
B.GPU utilization on the endpoint
C.CPU utilization on the endpoint
D.Vertex AI Model Monitoring skew score
E.Number of predictions per minute
AnswersB, C

GPU exhaustion can lead to prediction failures.

Why this answer

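CPU and GPU utilization on the endpoint reveal resource exhaustion, a common cause of 5xx errors. Drift and skew metrics (options A and D) describe data quality rather than serving health, and prediction volume alone (option E) does not show whether the replicas are saturated.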

808
MCQhard

A company deploys a training pipeline on Vertex AI using custom containers. The pipeline includes a hyperparameter tuning job that uses Bayesian optimization. After several runs, they observe that the tuning job is not converging and the search space is large. They want to reduce the number of trials while still finding good hyperparameters. Which strategy should they use?

A.Increase the number of parallel trials to explore more points simultaneously.
B.Use Grid search instead of Bayesian optimization to systematically cover the search space.
C.Implement early stopping by using the 'early_stopping' flag in the hyperparameter tuning job.
D.Reduce the search space by applying feature selection and using prior knowledge.
AnswerD

A smaller search space requires fewer trials to find good hyperparameters.

Why this answer

Option D is correct because reducing the search space using prior knowledge directly decreases the number of trials needed. Option A is wrong because increasing parallel trials does not reduce the total number of trials. Option B is wrong because grid search generally requires more trials than Bayesian optimization.

Option C is wrong because early stopping reduces time per trial but does not reduce the number of trials.
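
A rough way to see why a smaller search space needs fewer trials is to count candidate configurations. The sketch below is only an illustration: the hyperparameter names and value lists are hypothetical, and counting grid points is a proxy for search-space volume, not a description of how Bayesian optimization chooses trials.

```python
from math import prod


def grid_size(search_space: dict) -> int:
    """Number of distinct configurations if every listed value were combined."""
    return prod(len(values) for values in search_space.values())


# Hypothetical wide search space before applying prior knowledge.
wide = {
    "learning_rate": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
    "batch_size": [16, 32, 64, 128, 256],
    "num_layers": [1, 2, 3, 4, 5, 6],
    "dropout": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
}

# Hypothetical narrowed space after ruling out ranges known to perform poorly.
narrow = {
    "learning_rate": [1e-4, 1e-3],
    "batch_size": [64, 128],
    "num_layers": [2, 3],
    "dropout": [0.1, 0.2],
}

wide_size = grid_size(wide)      # 5 * 5 * 6 * 6 = 900 candidates
narrow_size = grid_size(narrow)  # 2 * 2 * 2 * 2 = 16 candidates
```

With far fewer candidates to distinguish between, any search strategy has less ground to cover before it finds a good region.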

809
Multi-Selecthard

Which THREE of the following are recommended practices for model governance and lineage in Vertex AI?

Select 3 answers
A.Enable Vertex AI ML Metadata to track artifacts, executions, and contexts.
B.Use Vertex AI Experiments to log parameters and metrics.
C.Store model artifacts in Cloud Storage with metadata in a database.
D.Manually record model lineage in a spreadsheet.
E.Use Vertex AI Model Registry to manage model versions and stages.
AnswersA, B, E

ML Metadata provides automated lineage tracking.

Why this answer

Vertex AI ML Metadata is a fully managed service that automatically tracks artifacts, executions, and contexts across the ML workflow. By enabling it, you create a lineage graph that records every step from data preparation to model deployment, which is essential for auditability and reproducibility. This is a core recommended practice for model governance because it provides an immutable, queryable history of all model-related activities.

Exam trap

Google Cloud often tests the distinction between using native Vertex AI services (like ML Metadata, Experiments, and Model Registry) versus ad-hoc or manual methods (like spreadsheets or custom databases) that lack automated governance and audit trails.

810
MCQmedium

An ML engineer is building a pipeline on Vertex AI Pipelines and wants to pass a dataset artifact from one component to another without incurring additional cost for intermediate storage. How should they define the input and output types?

A.Store the data in BigQuery and pass the table reference.
B.Define output as a string and pass the GCS path manually.
C.Use the Dataset artifact type available in the KFP SDK for inputs and outputs.
D.Use in-memory Python objects as function return values.
AnswerC

Artifact types like Dataset are designed for passing data via GCS URIs.

Why this answer

Option C is correct because using the KFP SDK's Dataset artifact type allows Vertex AI Pipelines to manage the data as a lineage-tracked artifact, automatically handling the underlying GCS storage reference without incurring additional intermediate storage costs. The artifact type enables the pipeline to pass the metadata (URI, type, etc.) between components efficiently, leveraging the native artifact management of the KFP SDK.

Exam trap

The trap here is that candidates often confuse 'avoiding additional cost' with 'avoiding any storage at all,' leading them to choose in-memory passing (Option D) or manual string paths (Option B), not realizing that the KFP artifact system uses the same underlying GCS storage that is already part of the pipeline's infrastructure, thus incurring no extra cost.

How to eliminate wrong answers

Option A is wrong because storing data in BigQuery and passing a table reference incurs BigQuery storage and query costs, and it introduces an external dependency that is not necessary for intermediate data passing within a pipeline. Option B is wrong because passing a GCS path as a string manually bypasses the artifact tracking and lineage capabilities of KFP, leading to potential issues with reproducibility and cost management, though it does not directly incur additional storage cost. Option D is wrong because in-memory Python objects cannot be passed between pipeline components that run in separate containers or environments; they would require serialization and storage, defeating the purpose of avoiding intermediate storage costs.

811
Multi-Selecteasy

Which TWO options are best practices for building ML pipelines on Vertex AI?

Select 2 answers
A.Use Cloud Functions to execute individual pipeline steps
B.Hardcode pipeline parameters in the component definitions
C.Use custom container components to encapsulate reusable logic
D.Always use the same compute environment for training and serving to ensure consistency
E.Leverage Vertex ML Metadata to track artifact lineage
AnswersC, E

Reusable components allow sharing across pipelines and reduce duplication.

Why this answer

Option C is correct because custom container components allow you to encapsulate reusable logic with specific dependencies, libraries, and environments, enabling consistent execution across pipeline steps. This is a best practice for building modular, maintainable ML pipelines on Vertex AI, as it decouples step logic from the pipeline orchestration and supports versioning and testing.

Exam trap

Google Cloud often tests the misconception that serverless functions like Cloud Functions are suitable for ML pipeline steps, but the trap is that ML steps require persistent state, longer timeouts, and specialized hardware, which Cloud Functions cannot provide.

812
MCQeasy

A startup wants to deploy a small machine learning model for real-time predictions but has a very limited budget. Traffic is minimal and predictable. They want to avoid paying for idle resources. Which serving option is most cost-effective?

A.Deploy the model on a single Compute Engine VM with a GPU.
B.Use Vertex AI Batch Prediction for each prediction request.
C.Deploy the model as a Cloud Run service using a custom container.
D.Deploy the model to Vertex AI Endpoint with min_replica_count=0.
AnswerC

Cloud Run scales to zero and charges only when serving requests.

Why this answer

Option C is correct because Cloud Run with a custom container can scale to zero when idle, incurring no cost when not in use. Option D is wrong because Vertex AI Endpoint requires at least one replica (min_replica_count >= 1). Option B is wrong because batch prediction is not real-time.

Option A is wrong because a Compute Engine VM incurs cost 24/7, even when idle.

813
MCQhard

A model serving team notices that during a flash sale, a real-time recommendation model experiences sudden spikes in traffic, causing some requests to time out. The endpoint is configured with `min_replica_count=3`, `max_replica_count=10`, and autoscaling metric set to `target_utilization=0.6` on CPU. Despite this, autoscaling is too slow. What change will most improve the autoscaling responsiveness?

A.Add a custom metric based on GPU utilization, assuming the model uses GPU.
B.Increase the target CPU utilization to 0.8 to reduce the number of replicas and save cost.
C.Reduce `min_replica_count` to 1 to allow more aggressive scaling.
D.Change the autoscaling metric to 'average request count per replica' with an appropriate target.
AnswerD

Request count directly reflects load and scales more quickly than CPU.

Why this answer

Option D is correct because request count per replica is a direct measure of load, so it triggers autoscaling faster than CPU utilization does. Option B is wrong because increasing the target utilization delays scale-out even further. Option A is wrong because GPU metrics are only relevant for GPU models.

Option C is wrong because reducing min replicas may cause underprovisioning.
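
To make the request-count idea concrete, here is a minimal sketch of a proportional scaling rule: divide the total request rate by a per-replica target and clamp the result to the configured bounds. This is a common autoscaling pattern and an assumption for illustration, not a description of the algorithm Vertex AI actually runs; the function name and the target of 60 requests per replica are hypothetical.

```python
import math


def desired_replicas(
    total_requests_per_s: float,
    target_per_replica: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Replicas needed so each one handles at most `target_per_replica` requests/s."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(total_requests_per_s / target_per_replica)
    # Clamp to the bounds configured on the deployment.
    return max(min_replicas, min(max_replicas, wanted))


# Bounds taken from the question: min_replica_count=3, max_replica_count=10.
quiet = desired_replicas(100, 60, 3, 10)    # 2 wanted, clamped up to 3
sale = desired_replicas(450, 60, 3, 10)     # ceil(7.5) = 8
extreme = desired_replicas(2000, 60, 3, 10) # 34 wanted, clamped down to 10
```

Because the request rate rises the moment traffic arrives, a rule like this can react before CPU utilization has had time to climb.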

814
MCQeasy

A data scientist needs to train a time-series forecasting model on historical sales data stored in BigQuery to predict future demand. The data has strong seasonal patterns. Which BigQuery ML model type should they use?

A.MATRIX_FACTORIZATION
B.BOOSTED_TREE_REGRESSOR
C.ARIMA_PLUS
D.K_MEANS
AnswerC

ARIMA_PLUS is the correct model for time-series forecasting in BigQuery ML.

Why this answer

ARIMA_PLUS is the correct choice because it is specifically designed for time-series forecasting in BigQuery ML, handling seasonal patterns, trend decomposition, and automatic hyperparameter tuning. It models autoregressive (AR) and moving average (MA) components with seasonal differencing, making it ideal for historical sales data with strong seasonal cycles.

Exam trap

Google Cloud often tests the misconception that any regression model (like BOOSTED_TREE_REGRESSOR) can be naively applied to time-series data, ignoring the need for specialized models that handle temporal dependencies and seasonality natively.

How to eliminate wrong answers

Option A is wrong because MATRIX_FACTORIZATION is used for recommendation systems (e.g., collaborative filtering) and cannot model temporal dependencies or seasonality in time-series data. Option B is wrong because BOOSTED_TREE_REGRESSOR is a tree-based ensemble method for regression tasks but does not inherently capture time-series structures like seasonality, trend, or autocorrelation without extensive feature engineering. Option D is wrong because K_MEANS is an unsupervised clustering algorithm that groups data points by similarity and has no mechanism for forecasting future values or modeling sequential patterns.

815
MCQmedium

You are using Vertex AI Vector Search to find nearest neighbors for a recommendation system. Your index is built on 10M embeddings and you need low-latency queries. You want to ensure that adding new embeddings does not require a full index rebuild. Which index type should you use?

A.Brute-force index (exact neighbor search)
B.ANN index with the 'streaming' update mode
C.ANN index with the 'batch' update mode
D.Tree-AH index
AnswerB

Streaming mode allows incremental updates without full index rebuild.

Why this answer

The ANN index with 'streaming' update mode is correct because it supports real-time insertion of new embeddings without requiring a full index rebuild, which is essential for low-latency recommendation systems. This mode uses a separate unsearched buffer for new vectors and periodically merges them into the main index, enabling continuous updates while maintaining query performance.

Exam trap

Google Cloud often tests the misconception that all ANN indexes support incremental updates, but only the 'streaming' mode in Vertex AI Vector Search avoids full rebuilds, while 'batch' mode and Tree-AH require periodic full reindexing.

How to eliminate wrong answers

Option A is wrong because a brute-force index computes exact distances against all 10M embeddings, which is computationally prohibitive for low-latency queries and does not support efficient incremental updates. Option C is wrong because the 'batch' update mode requires rebuilding the entire index from scratch when new embeddings are added, violating the requirement to avoid full rebuilds. Option D is wrong because Tree-AH (Asymmetric Hashing) is a specific ANN algorithm that typically requires batch rebuilding and does not natively support streaming updates like the 'streaming' mode in Vertex AI Vector Search.

816
Multi-Selecthard

You are fine-tuning a large language model using Vertex AI Training with spot VMs to reduce cost. Your training job keeps getting preempted, causing delays. Which THREE strategies can help mitigate the impact of preemption?

Select 3 answers
A.Enable checkpointing and restore from the latest checkpoint after preemption
B.Increase the number of worker replicas to train faster
C.Reduce batch size to lower memory usage
D.Switch to dedicated (non-preemptible) VMs
E.Use a smaller, more efficient model architecture
AnswersA, B, E

Allows resuming from last saved state.

Why this answer

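Checkpointing (A) saves progress so that a preempted job resumes from the latest checkpoint instead of restarting. Adding worker replicas (B) and using a smaller model (E) both shorten total training time, which reduces the job's exposure to preemption. Switching to dedicated VMs (D) avoids preemption but gives up the cost savings that motivated Spot VMs. Reducing batch size (C) lowers memory usage but does not affect preemption frequency.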
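
The checkpoint-and-resume pattern can be sketched with a toy training loop. Everything here is hypothetical and for illustration only: the `train` function, the JSON checkpoint format, the `preempt_at` argument that simulates a Spot VM being reclaimed, and the multiplicative "loss" update that stands in for a real optimizer step.

```python
import json
import os
import tempfile


def train(total_steps, checkpoint_path, preempt_at=None):
    """Run a toy training loop, resuming from `checkpoint_path` if it exists.

    Returns the number of steps executed during this invocation.
    """
    state = {"step": 0, "loss": 1.0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # restore the latest saved state

    executed = 0
    while state["step"] < total_steps:
        if preempt_at is not None and state["step"] == preempt_at:
            return executed  # simulated preemption: stop before finishing
        state["step"] += 1
        state["loss"] *= 0.9  # stand-in for a real optimizer step
        executed += 1
        # Write to a temp file and rename so a crash never leaves a partial checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return executed


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint.json")
    first_run = train(10, path, preempt_at=6)  # preempted after 6 steps
    second_run = train(10, path)               # resumes and runs the remaining 4
```

In a real job the checkpoint would normally live in durable storage such as a Cloud Storage bucket, since local disk is lost when the VM is preempted.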

817
MCQmedium

An ML engineer has set up Vertex AI Model Monitoring on an endpoint with a sampling rate of 0.1 (10%). They notice that the monitoring job runs hourly but the reported drift metrics seem inconsistent. What is the most likely cause?

A.The sampling rate is too low, leading to insufficient data for reliable drift statistics.
B.Prediction drift monitoring is not enabled; only feature drift is configured.
C.The drift detection algorithm is not suited for this model; try changing from JS divergence to L-infinity distance.
D.The monitoring frequency is too low; it should be set to every 5 minutes.
AnswerA

10% sampling may result in small sample sizes, causing high variance in drift estimates.

Why this answer

A low sampling rate means only a small fraction of requests are logged for monitoring, which can lead to statistical noise and inconsistent drift metrics. Increasing the sampling rate would improve accuracy.
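
The effect of sample size on noise can be shown with the standard error of a proportion, sqrt(p(1-p)/n). The figures below are assumptions for illustration, since the question does not state traffic volume: 1,000 requests per hour and a feature bucket that holds 20% of the values.

```python
import math


def proportion_std_error(p: float, hourly_requests: int, sampling_rate: float) -> float:
    """Standard error of an estimated proportion `p` from the sampled requests."""
    n = hourly_requests * sampling_rate  # number of requests actually logged
    if n <= 0:
        raise ValueError("no requests sampled")
    return math.sqrt(p * (1 - p) / n)


se_10_percent = proportion_std_error(0.2, 1000, 0.1)   # n = 100
se_100_percent = proportion_std_error(0.2, 1000, 1.0)  # n = 1000
noise_ratio = se_10_percent / se_100_percent           # sqrt(10), about 3.16
```

Under these assumptions the 10% sample estimates the bucket share with a standard error of about 4 percentage points, roughly three times noisier than logging every request, which is enough to make hourly drift scores jump around.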

818
Multi-Selecthard

Which TWO are best practices for implementing a low-code ML solution using Vertex AI AutoML? (Choose 2)

Select 2 answers
A.Use the AutoML recommended data split (train/validation/test) to avoid overfitting.
B.Impute missing values manually before uploading the dataset.
C.Normalize numerical features to zero mean and unit variance.
D.Enable automatic feature engineering by leaving feature columns as raw data.
E.Export the data and train a custom model with a different architecture.
AnswersA, D

Why A is correct: AutoML optimizes split for best performance.

Why this answer

Option A is correct because AutoML's recommended data split (train/validation/test) is designed to prevent overfitting by ensuring the model is evaluated on unseen data. AutoML automatically handles the split ratio (e.g., 80/10/10) and stratification, which is a best practice for low-code ML solutions where manual split logic is error-prone.

Exam trap

Google Cloud often tests the misconception that manual preprocessing (like imputation or normalization) is required for AutoML, when in fact AutoML is designed to handle these steps automatically, and manual intervention can degrade performance or cause errors.

819
Multi-Selecthard

An e-commerce company uses a recommendation model that suggests products based on user browsing history. The model was trained on data from the past year and has high accuracy on the test set. However, after deployment, the click-through rate (CTR) on recommendations is much lower than expected. Which three steps should the data scientist take to diagnose and improve the model? (Choose THREE)

Select 3 answers
A.Run offline evaluation on a holdout dataset to confirm accuracy
B.Set up an A/B experiment comparing the model's recommendations against a baseline
C.Retrain the model on the most recent three months of data to capture recent trends
D.Check the distribution of predictions versus the training set to detect drift
E.Increase the training dataset size by including data from two years ago
AnswersB, C, D

A/B testing validates the model's real-world performance and identifies issues.

Why this answer

Option B is correct because an A/B experiment directly measures the model's real-world impact by comparing its CTR against a baseline (e.g., random or popularity-based recommendations). This isolates the model's performance from confounding factors like seasonality or user behavior changes, providing a causal estimate of its effectiveness.

Exam trap

Google Cloud often tests the misconception that high offline accuracy guarantees online success, ignoring that offline metrics can be misleading due to distribution shift, feedback loops, or mismatched optimization objectives (e.g., accuracy vs. CTR).
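
The drift check in option D can be sketched as a distance between two histograms. The example uses Jensen-Shannon divergence with base-2 logarithms, which ranges from 0 (identical) to 1 (no overlap). The bucket counts and the 0.1 alert threshold are hypothetical values for illustration, not recommended settings.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two histograms of equal length."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of buckets")

    def normalize(counts):
        total = sum(counts)
        return [c / total for c in counts]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b is the mixture, so b > 0 when a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    p, q = normalize(p), normalize(q)
    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


# Hypothetical counts of predicted product categories.
training_counts = [400, 300, 200, 100]
serving_counts = [100, 200, 300, 400]

drift_score = js_divergence(training_counts, serving_counts)
drift_detected = drift_score > 0.1  # hypothetical alert threshold

no_drift = js_divergence(training_counts, training_counts)  # identical histograms
max_drift = js_divergence([1, 0], [0, 1])                   # disjoint histograms
```

A score well above the noise level suggests that what the model sees or predicts in production no longer resembles the training data, which would support retraining on recent data.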

820
MCQmedium

Refer to the exhibit. A data scientist runs the above BigQuery ML query to create a logistic regression model. After training, the model is evaluated using ML.EVALUATE. The evaluation shows poor performance with high bias. Which action would most likely improve the model's performance?

A.Remove the TRANSFORM clause and use raw features.
B.Change the model_type to 'linear_reg'.
C.Add more complex features by including polynomial expansions.
D.Increase the number of training iterations by setting MAX_ITERATIONS.
AnswerC

Polynomial expansions increase model complexity, allowing it to learn non-linear patterns from the data, which addresses high bias.

Why this answer

High bias indicates the model is underfitting the data, meaning it is too simple to capture underlying patterns. Adding polynomial expansions (or feature crosses) in the TRANSFORM clause increases model complexity, allowing the logistic regression to learn non-linear decision boundaries, which directly addresses underfitting.

Exam trap

Google Cloud often tests the distinction between bias and variance; the trap here is that candidates might confuse high bias (underfitting) with high variance (overfitting) and incorrectly choose to simplify the model or increase iterations, rather than adding complexity.

How to eliminate wrong answers

Option A is wrong because removing the TRANSFORM clause would discard any feature preprocessing, likely making the model even simpler and worsening high bias. Option B is wrong because changing model_type to 'linear_reg' would switch to a regression task, which is inappropriate for classification and does not address bias in a logistic regression model. Option D is wrong because increasing MAX_ITERATIONS only affects convergence of the optimization algorithm; if the model is too simple (high bias), more iterations will not help it learn more complex patterns.
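
What a degree-2 polynomial expansion produces can be sketched in plain Python. BigQuery ML offers `ML.POLYNOMIAL_EXPAND` for use inside a TRANSFORM clause; the function below is an independent illustration of the idea, and its name, output naming scheme, and the feature names `x1` and `x2` are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_expand(features: dict, degree: int = 2) -> dict:
    """Return all products of the input features up to the given degree."""
    names = sorted(features)
    expanded = {}
    for d in range(1, degree + 1):
        # Each combination (with repetition) of d feature names is one new term.
        for combo in combinations_with_replacement(names, d):
            expanded["_".join(combo)] = prod(features[name] for name in combo)
    return expanded


expanded = polynomial_expand({"x1": 2.0, "x2": 3.0})
# Two raw features become five: x1, x2, x1*x1, x1*x2, x2*x2.
```

A logistic regression fitted on the expanded features is still linear in its inputs, but those inputs now include squares and pairwise products, so the decision boundary in the original feature space can curve.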

821
MCQmedium

A data scientist is using Vertex AI Experiments to track training runs. They want to automatically log all hyperparameters, metrics, and model artifacts without modifying their training code. Which approach should they use?

A.Use Cloud Logging to capture training logs and parse them into experiments
B.Manually log parameters and metrics using the Vertex AI SDK in the training script
C.Run training in Vertex AI Workbench and manually save metrics to a BigQuery table
D.Enable autologging with MLflow and use Vertex AI Experiments as the tracking server
AnswerD

Autologging automatically captures parameters, metrics, and artifacts without modifying training code.

Why this answer

Vertex AI Experiments supports autologging with MLflow integration. By using the MLflow tracking API or Vertex AI's autologging, parameters and metrics are captured without code changes.

822
MCQhard

A company uses Vertex AI Matching Engine for a product recommendation system. They need to update the index with new product embeddings every hour, but the index is used for online queries with low latency. Which index update strategy should they use?

A.Use streaming updates to insert new embeddings incrementally
B.Use a hybrid approach with batch for daily full rebuild and streaming for hourly
C.Use batch updates to replace the index every hour
D.Recreate the index from scratch each hour
AnswerA

Streaming updates allow incremental, real-time changes while the index serves queries.

Why this answer

Streaming updates in Vertex AI Matching Engine allow incremental insertion of new embeddings into an existing index without rebuilding it. This satisfies the requirement for hourly updates while maintaining low-latency online queries, as the index remains available and consistent during the update process.

Exam trap

Google Cloud often tests the misconception that batch updates are required for consistency or that streaming updates cannot handle frequent changes, leading candidates to choose hybrid or batch approaches when incremental streaming is both sufficient and optimal for low-latency online serving.

How to eliminate wrong answers

Option B is wrong because a hybrid approach with batch for daily full rebuild and streaming for hourly adds unnecessary complexity and cost; streaming updates alone suffice for hourly increments without needing a daily rebuild. Option C is wrong because batch updates replace the entire index, causing downtime or increased latency during the rebuild, which violates the low-latency online query requirement. Option D is wrong because recreating the index from scratch each hour is inefficient, time-consuming, and disrupts query availability, making it unsuitable for real-time serving.

823
MCQhard

You are using Vertex AI Matching Engine (Vector Search) to serve similarity search for an e-commerce product recommendation system. The index is updated daily with new product embeddings via a batch job. However, you notice that some new products are not appearing in the search results for up to 24 hours. You need to ensure that new products are discoverable within 1 hour of ingestion. What should you do?

A.Switch to a streaming update configuration for the index.
B.Deploy multiple index endpoints and use traffic splitting.
C.Increase the frequency of the batch update job to run every 30 minutes.
D.Use a brute-force index instead of an ANN index.
AnswerA

Streaming updates allow near real-time index updates, making new products searchable within minutes.

Why this answer

Option A is correct because Vertex AI Matching Engine supports streaming updates to the index, which allows new embeddings to be added in near real-time (typically within minutes) without requiring a full batch rebuild. By switching to a streaming update configuration, new product embeddings become searchable within the 1-hour requirement, as the index is updated incrementally as data arrives.

Exam trap

The trap here is that candidates may assume increasing batch frequency (Option C) is sufficient, but they overlook that batch updates in Vertex AI Matching Engine are designed for daily or less frequent rebuilds and cannot guarantee sub-hour freshness due to build and deployment overhead, whereas streaming updates are the intended solution for low-latency index updates.

How to eliminate wrong answers

Option B is wrong because deploying multiple index endpoints with traffic splitting does not address the latency of index updates; it only distributes query load across existing index versions, which still rely on the same batch update schedule. Option C is wrong because increasing the batch job frequency to every 30 minutes still introduces a delay of up to 30 minutes plus processing time, and batch updates are not designed for sub-hour freshness; they also require rebuilding the entire index, which is resource-intensive and may not meet the 1-hour SLA consistently. Option D is wrong because using a brute-force index (exact nearest neighbor) instead of an ANN index does not change the update mechanism; it only affects search accuracy and performance, not the freshness of data ingestion.

824
MCQmedium

An organization wants to trigger a Vertex AI pipeline whenever a new commit is pushed to the main branch of their Cloud Source Repository. The pipeline should retrain and evaluate the model. Which service should they use to detect the push event and start the pipeline?

A.Vertex AI Pipeline schedule
B.Cloud Build trigger
C.Cloud Scheduler on a short interval
D.Pub/Sub with push subscription to a Cloud Function
AnswerB

Cloud Build triggers directly respond to push events and can invoke pipelines.

Why this answer

Cloud Build triggers are designed to automatically invoke a build pipeline in response to events from Cloud Source Repository, such as a push to a specific branch. This allows the organization to directly start a Vertex AI pipeline for retraining and evaluation without additional infrastructure. Cloud Build triggers natively integrate with Cloud Source Repository, making them the simplest and most reliable choice for this event-driven workflow.

Exam trap

Google Cloud often tests the distinction between event-driven triggers (Cloud Build) and time-based schedulers (Cloud Scheduler, Vertex AI Pipeline schedule), leading candidates to mistakenly choose a polling or cron-based option for an event-driven requirement.

How to eliminate wrong answers

Option A is wrong because Vertex AI Pipeline schedules are time-based (cron) triggers, not event-driven; they cannot detect a Git push event. Option C is wrong because Cloud Scheduler on a short interval would poll for changes, introducing latency and inefficiency, and it does not natively detect push events from Cloud Source Repository. Option D is wrong because while Pub/Sub with a push subscription to a Cloud Function could technically work, it adds unnecessary complexity and an extra compute layer; Cloud Build triggers provide a direct, managed integration without the need for custom code or additional services.

825
Multi-Selecthard

A fintech company needs to deploy a TensorFlow model for real-time fraud detection with strict latency SLO (p99 < 100ms). They expect variable traffic with spikes. They also want to minimize cold-start latency. Which two configurations should they use? (Choose 2)

Select 2 answers
A.Set min_replicas = 0 to allow scale-to-zero and save costs.
B.Use a GPU-enabled machine type (e.g., N1 with T4) to accelerate inference.
C.Set min_replicas = 3 to keep a baseline of warm instances.
D.Enable Vertex AI Model Optimization for automatic quantization.
E.Use batch prediction instead of online prediction.
AnswersB, C

GPUs can reduce inference latency for deep learning models.

Why this answer

Option B is correct because GPU-enabled machine types (e.g., N1 with T4) significantly accelerate TensorFlow model inference, which is critical for meeting the p99 < 100ms latency SLO. GPUs parallelize matrix operations common in deep learning models, reducing per-request latency even under variable traffic spikes.

Exam trap

Google Cloud often tests the misconception that scale-to-zero (min_replicas = 0) is always cost-effective, but in latency-sensitive real-time inference, it introduces unacceptable cold-start delays, making baseline warm instances (min_replicas > 0) essential.
