CCNA Operationalizing machine learning models Questions


1
MCQmedium

Your team uses a CI/CD pipeline with Cloud Build to train and deploy ML models on Vertex AI. You want to ensure that only models that pass validation checks (e.g., accuracy threshold, fairness metrics) are promoted to production. What is the best way to implement this?

A.Use Cloud Scheduler to trigger retraining and only deploy if the new model outperforms the previous one on a holdout set.
B.Use Vertex AI Model Registry's automatic promotion feature that moves models to production based on evaluation results.
C.Configure Cloud Functions to re-evaluate the model daily and promote if it passes.
D.In the Cloud Build pipeline, after training, run validation scripts. If validation passes, deploy to a staging endpoint for manual approval, then promote to production.
AnswerD

This ensures automated validation before any deployment, with an optional manual gate for production.

Why this answer

Option D is correct because it integrates validation directly into the CI/CD pipeline using Cloud Build, ensuring that only models passing specific checks (e.g., accuracy threshold, fairness metrics) are promoted. By running validation scripts after training and requiring manual approval before production promotion, this approach provides both automated gatekeeping and human oversight, aligning with MLOps best practices for safe model deployment.

Exam trap

Google Cloud often tests the misconception that Vertex AI Model Registry has built-in automatic promotion based on evaluation metrics, but in reality, it requires external orchestration (like Cloud Build) to implement such logic.

How to eliminate wrong answers

Option A is wrong because Cloud Scheduler triggers retraining on a schedule, not based on validation results, and it does not integrate with the CI/CD pipeline to enforce promotion gates. Option B is wrong because Vertex AI Model Registry does not have an automatic promotion feature based on evaluation results; it stores and manages models but requires external logic to decide promotion. Option C is wrong because Cloud Functions re-evaluating the model daily is reactive and does not tie into the build pipeline's validation step, potentially promoting a model that was not validated at the time of training.
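
The gate described in option D can be sketched as a small validation script that a Cloud Build step runs after training. The metric names, threshold values, and the `check_model` and `main` helpers below are assumptions for illustration only. The idea is that the step returns a non-zero exit code when a check fails, which stops the build before any deploy step runs.

```python
# Hypothetical thresholds; real values depend on the model and team policy.
THRESHOLDS = {
    "accuracy": ("min", 0.90),                # must be at least 0.90
    "demographic_parity_gap": ("max", 0.05),  # must be at most 0.05
}


def check_model(metrics, thresholds=THRESHOLDS):
    """Return a list of human-readable failures; an empty list means the model passes."""
    failures = []
    for name, (kind, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif kind == "min" and value < limit:
            failures.append(f"{name}: {value} is below minimum {limit}")
        elif kind == "max" and value > limit:
            failures.append(f"{name}: {value} is above maximum {limit}")
    return failures


def main(metrics):
    """Print any failures and return an exit code (0 = pass, 1 = fail)."""
    failures = check_model(metrics)
    for failure in failures:
        print("VALIDATION FAILED:", failure)
    return 1 if failures else 0
```

In a build step, the script would pass the result of `main(...)` to `sys.exit`, with the metrics read from wherever the training step wrote them.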

2
MCQeasy

Refer to the exhibit. An auditor sees the following output from `gcloud ai models list`. What can they conclude about versioning?

A.The model is deployed on a single endpoint
B.The model has two versions with v2 being the latest
C.Only the latest version is available
D.The model is automatically scaled
AnswerB

Two distinct versions are shown; v2 has a later timestamp.

Why this answer

The `gcloud ai models list` output shows two model versions (v1 and v2) under the same model resource, and v2 has the later timestamp, so it is the latest version. This makes option B correct.

Exam trap

Google Cloud often tests that candidates confuse model versioning with endpoint deployment details, leading them to assume a single endpoint or automatic scaling from a model list output that contains no such information.

How to eliminate wrong answers

Option A is wrong because the output does not show any endpoint information; model versions can be deployed to multiple endpoints or not deployed at all. Option C is wrong because the output explicitly lists two versions (v1 and v2), so both are available, not just the latest. Option D is wrong because the output provides no scaling configuration or metrics; autoscaling is a deployment setting, not a model version property.

3
MCQmedium

A company has a production model deployed on Vertex AI that shows declining accuracy over time. The model uses features from a BigQuery feature store. The data science team suspects data drift. What is the most efficient way to monitor and detect drift for this model?

A.Enable Vertex AI Model Monitoring on the endpoint to automatically detect skew and drift
B.Periodically export training data and production data to CSV and compare distributions manually
C.Create a scheduled retraining pipeline that runs weekly
D.Set up Cloud Monitoring dashboards to track prediction request volumes and error rates
AnswerA

Vertex AI Model Monitoring provides built-in drift detection for deployed models.

Why this answer

Option A is correct because Vertex AI Model Monitoring automatically monitors prediction input data for skew and drift and can send alerts. Option B (manual CSV comparison) is not efficient. Option D (Cloud Monitoring dashboards) can show request volumes and error rates but does not automatically detect drift.

Option C (a scheduled retraining pipeline) is reactive, not proactive monitoring: it retrains on a timer without detecting whether drift has occurred.
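
To make "drift" concrete: a monitoring service compares the distribution of a feature in recent serving traffic against a baseline and alerts when a distance score crosses a threshold. The sketch below computes the Jensen-Shannon divergence between two histograms in plain Python. It is an illustration of the idea and not Vertex AI's implementation; the function names and any threshold you pair with it are assumptions.

```python
import math


def normalize(counts):
    """Turn raw histogram counts into probabilities that sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(baseline_counts, serving_counts):
    """Jensen-Shannon divergence between two histograms over the same bins.

    With log base 2 the result lies in [0, 1]: 0 means identical
    distributions, 1 means the distributions do not overlap at all.
    """
    p = normalize(baseline_counts)
    q = normalize(serving_counts)
    # m is the midpoint distribution; it is non-zero wherever p or q is.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

A caller would bin the same feature from the training data and from a recent window of prediction requests, then compare `js_divergence(...)` against a threshold chosen for that feature.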

4
Multi-Selecthard

Which TWO metrics are most important to monitor for a real-time online prediction system to ensure service reliability and model performance?

Select 2 answers
A.Feature distribution skew between training and serving
B.Prediction latency (p50, p99)
C.Number of training examples used for the latest model version
D.Batch prediction job throughput
E.Prediction error rate (e.g., 4xx/5xx responses)
AnswersB, E

Latency is critical for real-time applications; p99 shows tail performance.

Why this answer

Prediction latency (p50, p99) is critical because it directly impacts user experience and system reliability; high tail latency (p99) can indicate resource contention or model complexity issues. Prediction error rate (4xx/5xx) is essential for detecting serving infrastructure failures, such as model server crashes or misconfigured endpoints, which degrade service reliability. Both metrics provide real-time visibility into the serving layer's health and performance, distinct from offline training metrics.

Exam trap

Google Cloud often tests the distinction between offline training metrics (like feature skew or training example count) and real-time serving metrics (like latency and error rate), trapping candidates who confuse model performance monitoring with service reliability monitoring.
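
The two serving metrics can be computed from a sample of request logs. The sketch below assumes each request is a `(latency_ms, http_status)` pair, which is a made-up record shape for illustration, and uses the nearest-rank definition of a percentile.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def serving_health(requests):
    """Summarize (latency_ms, http_status) pairs into p50, p99, and error rate."""
    latencies = [latency for latency, _ in requests]
    # 4xx and 5xx responses both count as errors here.
    errors = sum(1 for _, status in requests if status >= 400)
    return {
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
        "error_rate": errors / len(requests),
    }
```

With 98 requests at 10 ms, one at 500 ms, and one failed request at 900 ms, the p50 stays at 10 ms while the p99 is 500 ms, which is why the median alone can hide tail latency.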

5
MCQmedium

A team trained a model on a Vertex AI custom training job and wants to deploy it to an endpoint for online predictions. They have the model artifacts stored in Cloud Storage. What steps are required?

A.Upload model to Model Registry, create endpoint, deploy model
B.Directly deploy from Cloud Storage without Model Registry
C.Create endpoint, then upload model
D.Use Vertex AI Batch Prediction only
AnswerA

This is the standard workflow: register model, create endpoint, then deploy.

Why this answer

To deploy a model for online predictions on Vertex AI, you must first upload the model artifacts from Cloud Storage to the Model Registry, which creates a versioned model resource. Then you create an endpoint (or use an existing one) and deploy the model to that endpoint, specifying machine type, traffic split, and other settings. This three-step process (upload → create endpoint → deploy) is the required workflow for online serving.

Exam trap

Google Cloud often tests the misconception that you can deploy directly from Cloud Storage without the Model Registry, or that the endpoint must be created before the model is uploaded, when in fact the model must be registered first.

How to eliminate wrong answers

Option B is wrong because Vertex AI does not allow direct deployment from Cloud Storage without first registering the model in the Model Registry; the registry is required to manage model versions and associate deployment configurations. Option C is wrong because it stops after the upload and never deploys the model: an empty endpoint can be created at any time, but it serves nothing until a registered model resource is deployed to it. Option D is wrong because the question explicitly asks for online predictions, and batch prediction is a separate, asynchronous process that does not involve endpoints or real-time serving.
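
The upload, create endpoint, deploy sequence could look roughly like this with the Vertex AI SDK for Python (`google-cloud-aiplatform`). This is a sketch under assumptions: the wrapper function, its parameter names, the display-name suffix, and the default machine type are illustrative, and the call needs the SDK installed plus valid credentials to actually run.

```python
def deploy_for_online_prediction(project, location, display_name,
                                 artifact_uri, serving_image_uri,
                                 machine_type="n1-standard-2"):
    """Upload -> create endpoint -> deploy, using the Vertex AI SDK for Python."""
    # Imported lazily so this module can be loaded without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)

    # Step 1: register the Cloud Storage artifacts in the Model Registry.
    model = aiplatform.Model.upload(
        display_name=display_name,
        artifact_uri=artifact_uri,
        serving_container_image_uri=serving_image_uri,
    )

    # Step 2: create the endpoint that will receive prediction requests.
    endpoint = aiplatform.Endpoint.create(display_name=f"{display_name}-endpoint")

    # Step 3: deploy the registered model to the endpoint.
    model.deploy(
        endpoint=endpoint,
        machine_type=machine_type,
        min_replica_count=1,
        max_replica_count=1,
        traffic_percentage=100,
    )
    return endpoint
```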

6
Multi-Selecthard

A data science team uses Cloud Build and Vertex AI to implement CI/CD for their machine learning models. Which THREE steps are essential for a production-ready operationalization pipeline? (Choose 3.)

Select 3 answers
A.Store all training artifacts in Cloud Storage without versioning.
B.Deploy the model to a staging endpoint for manual approval before promoting to production.
C.Automatically deploy every new model version directly to the production endpoint.
D.Use Vertex AI Model Evaluation to validate the new model against the current production model metrics.
E.Include unit and integration tests for the training code in the Cloud Build pipeline.
AnswersB, D, E

Staging allows human review and canary testing before full production rollout.

Why this answer

Option B is correct because deploying to a staging endpoint for manual approval before promoting to production is a critical step in a production-ready CI/CD pipeline. This allows data scientists to validate model behavior, performance, and fairness in a near-production environment, preventing regressions and ensuring governance compliance before the model serves live traffic.

Exam trap

Google Cloud often tests the misconception that full automation (Option C) is always better, but the trap here is that production-ready pipelines require human-in-the-loop approval for critical model changes to ensure accountability and safety.

7
MCQmedium

A production model deployed on Vertex AI Endpoint is experiencing high latency during traffic spikes. The current configuration uses a single replica. What is the most efficient solution?

A.Set a higher min replica count (e.g., 3)
B.Enable autoscaling with minReplicaCount=1 and maxReplicaCount=10
C.Use a larger machine type (e.g., n1-highmem-8)
D.Switch to batch prediction to handle spikes
AnswerB

Autoscaling adjusts replicas based on load, balancing latency and cost.

Why this answer

Enabling autoscaling with minReplicaCount=1 and maxReplicaCount=10 (option B) keeps one replica always on and adds replicas during spikes. A larger machine type (option C) might help but does not adapt to changing load. Batch prediction (option D) does not serve real-time requests, so it does not solve online latency.

Raising the minimum replica count to a fixed value (option A) leaves resources idle at quiet times.
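
The trade-off between a fixed replica count and autoscaling can be illustrated with a simple proportional scaling rule. This is a generic sketch of the "scale until utilization returns to target" idea, not Vertex AI's actual autoscaling algorithm; the function name and the utilization figures in the note below are assumptions.

```python
import math


def desired_replicas(current_replicas, observed_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Proportional scaling rule, clamped to the configured replica bounds."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    # Scale the replica count by how far utilization is from the target.
    raw = math.ceil(current_replicas * observed_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))
```

With bounds of 1 and 10 and an assumed target of 60% CPU, a single replica at 95% utilization scales to 2, and 3 replicas at 10% utilization scale back to 1. With both bounds fixed at 3, the result is always 3, so capacity sits idle when traffic is low.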

8
Multi-Selectmedium

Which TWO configurations are required to enable online prediction for a model deployed on Vertex AI Endpoints?

Select 2 answers
A.A feature store must be attached to the endpoint.
B.The endpoint must be configured with a machine type (e.g., n1-standard-2).
C.The model must be trained on Vertex AI.
D.A model must be deployed to an endpoint.
E.Autoscaling must be enabled.
AnswersB, D

A machine type must be specified to allocate resources for serving.

Why this answer

Option B is correct because Vertex AI Endpoints require a machine type to be specified when deploying a model. The machine type determines the compute resources (CPU/memory) allocated to the serving container, which is essential for handling prediction requests. Without a machine type, the endpoint cannot provision the underlying infrastructure to serve online predictions.

Exam trap

The trap here is that candidates often confuse optional features (like Feature Store or autoscaling) with mandatory configurations, or assume the model must be trained on Vertex AI, when in fact only the machine type and model deployment are strictly required for online prediction.

9
MCQhard

A data science team is operationalizing a batch prediction job using Vertex AI Batch Prediction. The model uses a custom container that requires a specific GPU for inference. The job processes a large dataset stored in Cloud Storage. The team wants to minimize cost while ensuring the job completes within a 2-hour window. Which configuration should they choose?

A.Use a custom training job with a GPU worker pool and run the inference as a custom job.
B.Use a custom machine type with a GPU accelerator in the batch prediction request.
C.Use a high-memory machine type (e.g., n1-highmem-32) without GPU to reduce cost.
D.Configure a Vertex AI endpoint with GPU and submit batch requests to the endpoint.
AnswerA

This approach allows GPU usage and is cost-effective for batch processing within a time window.

Why this answer

Option A is correct because Vertex AI Batch Prediction does not support custom containers with GPU accelerators; it only supports CPUs for batch prediction jobs. To run GPU-accelerated inference on a large dataset, the team must use a custom training job (which supports GPU worker pools) and run inference as a custom job. This approach allows them to leverage GPU hardware for the 2-hour window while minimizing cost by using preemptible VMs or choosing the smallest GPU instance that meets throughput requirements.

Exam trap

Google Cloud often tests the misconception that Vertex AI Batch Prediction supports GPU accelerators because it is a managed service, but in reality, GPU support is only available for online prediction endpoints and custom training jobs, not for batch prediction.

How to eliminate wrong answers

Option B is wrong because Vertex AI Batch Prediction does not allow attaching GPU accelerators to custom machine types; the batch prediction service only supports CPU-based machine types. Option C is wrong because a high-memory CPU-only machine type would likely be too slow for GPU-required inference, causing the job to exceed the 2-hour window or require many more instances, increasing cost. Option D is wrong because configuring an endpoint with GPU and submitting batch requests would incur ongoing endpoint deployment costs (even when idle) and is designed for online prediction, not cost-efficient batch processing; it also introduces unnecessary latency and scaling complexity.

10
MCQhard

A team is implementing CI/CD for their ML models using Google Cloud. They want to automatically retrain and deploy a new model version when new training data arrives in Cloud Storage. Which combination of services should they use?

A.Cloud Storage triggers, Cloud Functions, and Vertex AI Pipelines
B.Cloud Scheduler and Vertex AI Training
C.Cloud Pub/Sub and Cloud Composer
D.Cloud Storage notifications and Cloud Build
AnswerA

Event-driven pipeline with managed ML services.

Why this answer

Option A is correct because Cloud Storage triggers fire an event when new data arrives, which invokes a Cloud Function that can start a Vertex AI Pipeline for retraining and deploying the model. This combination provides a fully managed, event-driven CI/CD pipeline for ML models without manual intervention.

Exam trap

Google Cloud often tests the distinction between event-driven triggers (Cloud Storage triggers) and time-based scheduling (Cloud Scheduler), leading candidates to choose B or D when they overlook the need for automatic retraining upon data arrival.

How to eliminate wrong answers

Option B is wrong because Cloud Scheduler is for time-based scheduling, not event-driven triggers from Cloud Storage, so it cannot automatically retrain when new data arrives. Option C is wrong because Cloud Pub/Sub and Cloud Composer (Apache Airflow) are more suited for complex workflow orchestration with multiple dependencies, not a simple event-driven retraining trigger from Cloud Storage. Option D is wrong because Cloud Build is designed for building and testing application code, not for orchestrating ML training pipelines with Vertex AI, and it lacks native integration for model deployment.
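
The event-driven flow in option A might be sketched as a function that receives the Cloud Storage object metadata and submits a pipeline run. The object prefix, file suffix, pipeline template path, display name, and the `data_uri` parameter are all hypothetical. The `PipelineJob` call comes from the Vertex AI SDK for Python and needs that package and credentials at runtime.

```python
def should_retrain(event, prefix="training-data/", suffix=".csv"):
    """Decide whether a Cloud Storage object event should start retraining.

    `event` is the object metadata delivered with the trigger; only the
    `name` field (the object path within the bucket) is inspected here.
    """
    name = event.get("name", "")
    return name.startswith(prefix) and name.endswith(suffix)


def on_new_data(event, context=None):
    """Entry point for a function triggered when an object is finalized."""
    if not should_retrain(event):
        return None  # Ignore unrelated uploads.

    # Imported lazily; requires google-cloud-aiplatform at runtime.
    from google.cloud import aiplatform

    job = aiplatform.PipelineJob(
        display_name="retrain-on-new-data",
        # Hypothetical location of a compiled pipeline definition.
        template_path="gs://example-bucket/pipelines/retrain.json",
        parameter_values={"data_uri": f"gs://{event['bucket']}/{event['name']}"},
    )
    job.submit()
    return job
```

Filtering on the object name keeps unrelated uploads in the same bucket from starting a training run.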

11
MCQmedium

Refer to the exhibit. What is the most likely cause of the error?

A.The model artifact was not uploaded to Cloud Storage
B.The endpoint does not exist
C.The service account lacks permissions
D.The model ID is invalid
AnswerA

The error explicitly states the artifact URI is missing.

Why this answer

The error occurs because the model artifact must be uploaded to Cloud Storage before it can be deployed to an endpoint. Vertex AI requires the model to be stored in a Cloud Storage bucket, and the deployment process references that artifact. Without the artifact in Cloud Storage, the endpoint creation or model deployment fails with an error indicating the resource is missing.

Exam trap

Google Cloud often tests the distinction between resource existence errors (like missing artifact) and permission or configuration errors, leading candidates to incorrectly choose permission issues when the actual problem is a missing prerequisite resource.

How to eliminate wrong answers

Option B is wrong because if the endpoint did not exist, the error would typically be a 404 Not Found or a message stating the endpoint resource is not found, not a generic error about missing artifact. Option C is wrong because a lack of permissions would result in a 403 Forbidden error or an IAM-related message, not an error about a missing model artifact. Option D is wrong because an invalid model ID would produce an error like 'Model not found' or 'Invalid model ID', not an error indicating the artifact is missing from Cloud Storage.

12
MCQeasy

A data science team has trained a TensorFlow model for image classification and wants to deploy it to production with minimal latency. They have already exported the model as a SavedModel directory. Which service should they use to create an online prediction endpoint?

A.Cloud Functions
B.Vertex AI Endpoints
C.AI Platform Prediction (legacy)
D.Cloud Dataflow
AnswerB

Vertex AI Endpoints provide scalable, low-latency online prediction serving.

Why this answer

Vertex AI Endpoints is the correct service for deploying a TensorFlow SavedModel to an online prediction endpoint with minimal latency. It provides managed, autoscaling infrastructure optimized for real-time inference, including GPU/TPU support, request batching, and automatic health checking, which are essential for production deployment.

Exam trap

The trap here is that candidates may confuse Vertex AI Endpoints with AI Platform Prediction (legacy) or think Cloud Functions can serve models, but Google Cloud tests that Vertex AI is the modern, fully managed service for online prediction with minimal latency, while the others are either deprecated or designed for different workloads.

How to eliminate wrong answers

Option A is wrong because Cloud Functions is a serverless compute service for event-driven, short-lived functions, not designed for hosting persistent ML models with low-latency prediction endpoints; it lacks built-in model serving, batching, and autoscaling for inference workloads. Option C is wrong because AI Platform Prediction (legacy) is the older, deprecated service that has been replaced by Vertex AI; while it could serve models, it is no longer the recommended or supported path for new deployments, and Vertex AI offers superior latency optimization and integration. Option D is wrong because Cloud Dataflow is a batch and stream data processing service based on Apache Beam, intended for ETL and data pipelines, not for hosting online prediction endpoints; it cannot serve real-time inference requests with sub-second latency.

13
Multi-Selecteasy

A team is deploying a TensorFlow model for online predictions on AI Platform Prediction. They want to monitor for data drift and model performance degradation. Which TWO Google Cloud services should they use?

Select 2 answers
A.Cloud Composer
B.AI Platform Continuous Evaluation
C.Cloud Monitoring
D.AI Platform Pipelines
E.Cloud Logging
AnswersB, C

Provides automated drift detection and model evaluation.

Why this answer

AI Platform Continuous Evaluation (option B) is correct because it is a managed service specifically designed to detect data drift and model performance degradation in deployed models. It automatically compares incoming prediction data against the training data distribution and monitors metrics like accuracy over time, triggering alerts when significant drift is detected. Cloud Monitoring (option C) is correct because it provides the underlying metrics and alerting infrastructure that can track model performance indicators (e.g., prediction latency, error rates) and integrate with Continuous Evaluation for comprehensive observability.

Exam trap

Google Cloud often tests the distinction between services that orchestrate pipelines (Composer, Pipelines) versus services that monitor and evaluate deployed models (Continuous Evaluation, Monitoring), leading candidates to mistakenly choose orchestration tools for monitoring tasks.

14
Multi-Selectmedium

A team is debugging a sudden increase in prediction latency for a model deployed on Vertex AI Endpoints. Which TWO metrics in Cloud Monitoring should they examine first? (Choose two.)

Select 2 answers
A.CPU utilization
B.Memory utilization
C.gRPC port errors
D.Number of predictions
E.Prediction request latency
AnswersA, B

High CPU utilization can cause processing delays.

Why this answer

CPU utilization (A) is correct because a sudden increase in prediction latency often stems from the model consuming excessive CPU cycles during inference, especially for compute-intensive models like deep neural networks. Monitoring CPU utilization helps identify whether the endpoint's compute resources are saturated, causing requests to queue and latency to spike. Memory utilization (B) is correct because insufficient memory can lead to swapping or garbage collection pauses, directly increasing latency.

Vertex AI Endpoints autoscales based on these metrics, so examining them first pinpoints resource bottlenecks.

Exam trap

Google Cloud often tests the distinction between symptom metrics (like prediction request latency) and root-cause metrics (like CPU/memory utilization), trapping candidates who select the symptom as a diagnostic metric instead of the underlying resource indicators.

15
MCQeasy

A team wants to retrain a model weekly using new data stored in BigQuery. They want to minimize manual effort. Which approach should they use?

A.Use Cloud Scheduler to trigger a Cloud Function that retrains
B.Retrain manually in a notebook each week
C.Use Cloud Composer to orchestrate retraining
D.Create a Vertex AI Pipeline scheduled via Cloud Scheduler
AnswerD

Pipelines automate retraining end-to-end.

Why this answer

Vertex AI Pipelines allow you to define a repeatable, automated ML workflow that can be triggered on a schedule via Cloud Scheduler. This minimizes manual effort by handling data extraction from BigQuery, model retraining, and deployment without human intervention, while also providing versioning and monitoring capabilities.

Exam trap

Google Cloud often tests the distinction between simple scheduling (Cloud Scheduler + Cloud Function) and full ML orchestration (Vertex AI Pipelines), where candidates mistakenly choose the simpler option without considering the need for a managed, scalable ML workflow.

How to eliminate wrong answers

Option A is wrong because Cloud Scheduler triggering a Cloud Function is suitable for lightweight tasks, but retraining a model typically requires more complex orchestration, dependency management, and resource handling that a Cloud Function alone cannot efficiently provide. Option B is wrong because manual retraining in a notebook each week introduces significant manual effort and is error-prone, directly contradicting the goal of minimizing manual effort. Option C is wrong because Cloud Composer (based on Apache Airflow) is a powerful orchestration tool but is overkill for a simple weekly retraining schedule; it adds unnecessary complexity and cost compared to a Vertex AI Pipeline scheduled via Cloud Scheduler.

16
MCQhard

Refer to the exhibit. A data engineer sees these metrics from Cloud Monitoring for a deployed Vertex AI Endpoint. What is the most effective action to reduce latency?

A.Switch to batch prediction
B.Increase the number of replicas
C.Reduce the machine type
D.Enable model quantization
AnswerB

Adding replicas scales horizontally, reducing load per replica and improving latency.

Why this answer

The metrics show high CPU utilization and increasing latency, indicating the current instance is overloaded. Increasing the number of replicas distributes the inference requests across multiple instances, reducing per-replica load and lowering response times. This is the most direct way to scale horizontally and address latency caused by resource saturation.

Exam trap

Google Cloud often tests the misconception that model optimization (quantization) or switching to batch mode is the primary fix for latency, when the metrics clearly point to a scaling bottleneck.

How to eliminate wrong answers

Option A is wrong because batch prediction is designed for asynchronous, large-scale processing and does not reduce real-time endpoint latency; it actually increases latency for individual requests. Option C is wrong because reducing the machine type would decrease compute capacity, worsening CPU saturation and increasing latency further. Option D is wrong because model quantization reduces model size and inference time per request but does not address the root cause of high concurrent load; it may help marginally but is less effective than scaling out replicas.

17
Multi-Selectmedium

Which THREE steps are required to set up a continuous training pipeline on Google Cloud using Vertex AI?

Select 3 answers
A.Run training on a single Compute Engine VM with a cron job.
B.Create a Vertex AI Pipeline to orchestrate data preprocessing, training, and model evaluation.
C.Set up a trigger (e.g., Cloud Scheduler or Cloud Build) to start training on a schedule or new data.
D.Manually upload the model to Vertex AI Model Registry after each training run.
E.Configure model evaluation and promotion rules (e.g., if accuracy > threshold, deploy to endpoint).
AnswersB, C, E

Pipeline orchestrates the steps.

Why this answer

Option B is correct because Vertex AI Pipelines provide a managed, repeatable, and scalable way to orchestrate the entire ML workflow, including data preprocessing, training, and model evaluation. This is essential for a continuous training pipeline, as it automates the sequence of steps and ensures consistency across runs.

Exam trap

Google Cloud often tests the distinction between manual, ad-hoc automation (like cron jobs) and fully managed, integrated orchestration services (like Vertex AI Pipelines), leading candidates to incorrectly select simpler but non-scalable options.

18
Multi-Selecthard

Which TWO are common causes of prediction bias in a deployed machine learning model in production?

Select 2 answers
A.Model accuracy is too high.
B.Data drift between training and serving data distributions.
C.Model is overfitted to training data.
D.Low latency predictions.
E.Training-serving skew due to differences in feature engineering.
AnswersB, E

Changes in the real-world data distribution can cause the model to produce biased results.

Why this answer

Option B is correct because data drift refers to changes in the statistical properties of the input features between the training and serving environments. When the distribution of real-world data shifts (e.g., seasonal trends, user behavior changes), the model's predictions become biased even if the model itself hasn't changed. This is a primary cause of prediction bias in production ML systems.

Exam trap

Google Cloud often tests the distinction between training-time issues (like overfitting) and production-time causes (like data drift and training-serving skew), so candidates mistakenly select overfitting as a production bias cause.

19
MCQhard

A real-time recommendation system uses a custom container deployed on AI Platform Prediction. The model requires a large in-memory embedding lookup table that is loaded from Cloud Storage at startup. The current startup time is over 5 minutes, causing prediction requests to timeout. Which strategy would most effectively reduce startup time?

A.Increase the machine type to one with more memory and CPU.
B.Preload the embedding table into a persistent disk and attach it to the container.
C.Reduce the size of the embedding table by using a smaller embedding dimension or fewer categories.
D.Use a faster storage class for the Cloud Storage bucket, such as Standard instead of Nearline.
AnswerC

Smaller table loads faster, directly addressing startup time.

Why this answer

Option C is correct because shrinking the embedding table reduces the amount of data loaded at startup, which is the core issue. Option A might not reduce load time significantly, option B adds complexity, and option D may not be possible or effective.
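
A quick back-of-the-envelope calculation shows why table size dominates startup time. The category count, embedding dimension, and throughput below are made-up numbers, and the helper names are hypothetical; the estimate also ignores everything except raw transfer time.

```python
def embedding_table_bytes(num_categories, embedding_dim, bytes_per_value=4):
    """Size of a dense embedding table; 4 bytes per value assumes float32."""
    return num_categories * embedding_dim * bytes_per_value


def load_seconds(table_bytes, throughput_bytes_per_sec):
    """Lower bound on load time if the transfer is throughput-limited."""
    return table_bytes / throughput_bytes_per_sec
```

For example, 50 million categories at dimension 256 come to 51.2 GB, which takes 256 seconds at an assumed 200 MB/s. Halving the dimension to 128 halves both figures.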

20
MCQhard

Refer to the exhibit. The feature store 'my_fs' responds to offline queries but online serving requests fail. What is the most appropriate fix?

A.Create a new feature store with online serving enabled
B.Use Cloud Bigtable directly
C.Update the existing feature store to enable online serving
D.Re-import features into a new store
AnswerC

Online serving can be enabled by setting appropriate scaling configuration.

Why this answer

The feature store 'my_fs' responds to offline queries but not online serving requests, which indicates that online serving is not enabled for the feature store. In Vertex AI Feature Store, online serving requires a dedicated endpoint and underlying infrastructure (e.g., Bigtable) to serve low-latency requests. Updating the existing feature store to enable online serving (option C) is the correct fix, as it activates the necessary serving resources without recreating the store.

Exam trap

Google Cloud often tests the misconception that a feature store's offline and online serving are automatically coupled, leading candidates to think a new store or data re-import is required when online serving fails, rather than recognizing that online serving is an optional configuration that must be explicitly enabled on the existing store.

How to eliminate wrong answers

Option A is wrong because creating a new feature store with online serving enabled is unnecessary and wasteful; the existing store can be updated to enable online serving without data re-import. Option B is wrong because using Cloud Bigtable directly bypasses the feature store's managed serving layer, losing integration with Vertex AI's serving APIs, monitoring, and consistency guarantees. Option D is wrong because re-importing features into a new store does not address the root cause—the existing store simply needs its online serving configuration enabled, not a full data migration.

21
MCQeasy

A company wants to version its ML models and track lineage from training data to deployed model. Which Google Cloud service should they use?

A.Cloud Storage with object versioning
B.Data Catalog
C.Artifact Registry
D.Vertex AI ML Metadata
AnswerD

ML Metadata tracks artifacts, lineage, and metadata for ML models.

Why this answer

Option D is correct because Vertex AI ML Metadata manages lineage and artifacts. Option A (Cloud Storage) is for storage only. Option C (Artifact Registry) is for container images, not ML models.

Option B (Data Catalog) is for data discovery.

22
MCQeasy

Refer to the exhibit. What is the most likely cause?

A.The model container does not support this prediction route
B.The request format is incorrect
C.The model was built for batch prediction only
D.The endpoint ID is wrong
AnswerA

The error indicates the prediction method is not supported by the model, likely due to container configuration.

Why this answer

Option A is correct: the error 'model type is not supported for this prediction method' suggests the model container does not support the online prediction route (e.g., it expects a different protocol or is designed for batch only). Option D is wrong because the endpoint ID is likely correct; the error is model-specific. Option B is wrong because the request format might be correct but the model rejects it.

Option C is wrong because batch-only models would show a different error; this error indicates the model's container doesn't handle the request.

23
MCQmedium

A retail company is using a machine learning model for inventory forecasting. They observe that the model's predictions become less accurate over time, especially during holiday seasons. Which monitoring metric should they prioritize?

A.Model latency
B.Prediction counts
C.Resource utilization
D.Prediction drift (feature drift)
AnswerD

Monitoring feature drift helps detect when the production input distribution shifts away from the training data, leading to accuracy loss.

Why this answer

Prediction drift (feature drift) is the correct metric because it directly measures changes in the input data distribution over time, which is the root cause of degrading model accuracy during holiday seasons. When customer behavior shifts (e.g., buying patterns during holidays), the features the model relies on drift, causing predictions to become less accurate. Monitoring prediction drift allows the team to detect when retraining or updating the model is necessary.

Exam trap

Google Cloud often tests the misconception that model latency or resource utilization are the primary concerns for accuracy degradation, when in fact drift monitoring is the key metric for detecting data shifts that cause performance decay.

How to eliminate wrong answers

Option A is wrong because model latency measures the time taken for a single prediction, which is unrelated to accuracy degradation over time. Option B is wrong because prediction counts track the volume of predictions made, not the quality or drift of those predictions. Option C is wrong because resource utilization (CPU, memory, etc.) monitors infrastructure health, not model performance or data distribution shifts.

24
Multi-Selecthard

Which THREE considerations are important when designing a batch prediction pipeline for a large dataset on Vertex AI?

Select 3 answers
A.Batch prediction automatically uses GPUs if the model framework requires them
B.Batch prediction requires a dedicated real-time endpoint
C.Choosing the appropriate machine type (e.g., n1-standard-16) balances cost and throughput
D.Large input files can be split into multiple smaller files to improve parallelism
E.Input data should be in Cloud Storage in a format supported by Vertex AI (e.g., JSONL, TFRecord)
AnswersC, D, E

Machine type impacts performance and cost.

Why this answer

Option C is correct because selecting the appropriate machine type, such as n1-standard-16, directly impacts the cost-performance trade-off in batch prediction. Vertex AI batch prediction jobs run on Compute Engine instances, and choosing a machine type with more vCPUs and memory can increase throughput for large datasets, but also raises cost. The key is to match the machine type to the model's computational needs and the data volume, avoiding over-provisioning while ensuring the job completes within acceptable time.

Exam trap

Google Cloud often tests the misconception that batch prediction requires a real-time endpoint or automatically uses GPUs, when in fact batch prediction is a serverless, endpoint-free process that requires explicit machine type and GPU configuration.
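Option D (splitting large inputs) can be illustrated with a minimal sketch that shards one JSONL file into smaller files that a batch job could read in parallel. The helper name `shard_jsonl`, the `part-00000.jsonl` naming scheme, and the shard size are assumptions made for illustration; they are not part of any Vertex AI API, and in practice the shards would be uploaded to Cloud Storage.

```python
import json
import os
import tempfile


def shard_jsonl(src_path, out_dir, lines_per_shard):
    """Split a JSONL file into numbered shards and return the shard paths."""
    if lines_per_shard <= 0:
        raise ValueError("lines_per_shard must be positive")
    os.makedirs(out_dir, exist_ok=True)
    shard_paths = []
    out = None
    with open(src_path, "r", encoding="utf-8") as src:
        for i, line in enumerate(src):
            # Start a new shard every `lines_per_shard` lines.
            if i % lines_per_shard == 0:
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"part-{len(shard_paths):05d}.jsonl")
                shard_paths.append(path)
                out = open(path, "w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return shard_paths


# Small demonstration on a temporary file with 10 hypothetical instances.
tmp_dir = tempfile.mkdtemp()
src_file = os.path.join(tmp_dir, "instances.jsonl")
with open(src_file, "w", encoding="utf-8") as f:
    for n in range(10):
        f.write(json.dumps({"id": n, "features": [n, n * 2]}) + "\n")

shards = shard_jsonl(src_file, os.path.join(tmp_dir, "shards"), lines_per_shard=4)
```

Each shard remains valid JSONL because lines are never split, only grouped.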

25
MCQmedium

Your organization deploys multiple versions of the same model to Vertex AI Endpoint for A/B testing. You have a production model (v1) serving 90% of traffic and a candidate model (v2) serving 10%. After one week, you observe that v2 has a slightly lower AUC but significantly higher business metrics like click-through rate. The product team wants to gradually increase v2's traffic. However, you need to ensure that the overall prediction latency remains under 200 ms. Currently, the endpoint has 10 replicas for v1 and 2 replicas for v2. What is the best approach to roll out v2 while maintaining latency SLO?

A.Merge v2's model into v1 by retraining v1 with v2's architecture and deploy as a single model.
B.Immediately set v2 to serve 100% traffic and monitor latency; if it exceeds 200 ms, roll back.
C.Increase v2's traffic split by 10% each day while also adding replicas for v2 based on CPU utilization.
D.Use a separate endpoint for v2 and route traffic at the load balancer level.
AnswerC

Gradual increase with autoscaling ensures latency remains within bounds.
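A gradual rollout like the one in option C can be sketched as a simple schedule. This is only a planning aid under stated assumptions: the function name `rollout_schedule` is hypothetical, and the replica estimate assumes v2 needs capacity proportional to its traffic share out of the 12 replicas currently deployed, whereas the answer itself recommends scaling on observed CPU utilization.

```python
import math


def rollout_schedule(start_pct, step_pct, total_replicas):
    """Return a list of (day, v1_pct, v2_pct, v2_replicas) until v2 serves 100%."""
    if step_pct <= 0:
        raise ValueError("step_pct must be positive")
    plan = []
    v2 = start_pct
    day = 0
    while True:
        v2 = min(v2, 100)
        # Assumption: v2 needs capacity proportional to its traffic share.
        v2_replicas = max(1, math.ceil(total_replicas * v2 / 100))
        plan.append((day, 100 - v2, v2, v2_replicas))
        if v2 == 100:
            break
        v2 += step_pct
        day += 1
    return plan


plan = rollout_schedule(start_pct=10, step_pct=10, total_replicas=12)
for day, v1_pct, v2_pct, v2_replicas in plan:
    print(f"day {day}: v1={v1_pct}% v2={v2_pct}% v2_replicas>={v2_replicas}")
```

At each step, latency should be checked against the 200 ms SLO before moving to the next increment.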

26
MCQmedium

Your team has implemented a CI/CD pipeline using Cloud Composer (Apache Airflow) to retrain a model every day. The pipeline reads new data from BigQuery, trains a model using Vertex AI Training, evaluates it, and if the accuracy improves, deploys it to a Vertex AI Endpoint. For the past week, the pipeline has been running successfully but no new model has been deployed because the evaluation accuracy never exceeds the previous model's accuracy. The training data volume has been consistent. You suspect that the model is not learning from the new data. What should you do?

A.Deploy the new model anyway and run an A/B test in production to see if it performs better online.
B.Examine the training data for any data quality issues such as missing values or label leakage.
C.Increase the training budget or number of training steps to allow the model to converge better.
D.Change the evaluation metric to a different one that may show improvement, such as F1 score instead of accuracy.
AnswerB

Data quality issues can prevent the model from learning meaningful patterns despite sufficient data volume.
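The kind of inspection option B describes might start with a check like the sketch below, which reports missing-value rates and flags features that exactly mirror the label (a common sign of label leakage). The function name, column names, and sample rows are hypothetical; a real pipeline would run comparable checks against the BigQuery training table.

```python
def data_quality_report(rows, label_key):
    """Return missing-value rates and features that exactly mirror the label."""
    if not rows:
        raise ValueError("rows must not be empty")
    columns = [c for c in rows[0] if c != label_key]
    missing_rate = {}
    leakage_suspects = []
    for col in columns:
        missing = sum(1 for r in rows if r.get(col) is None)
        missing_rate[col] = missing / len(rows)
        # A feature identical to the label on every row is a leakage red flag.
        if all(r.get(col) == r[label_key] for r in rows):
            leakage_suspects.append(col)
    return missing_rate, leakage_suspects


# Hypothetical training rows.
rows = [
    {"units": 5, "promo": None, "sold_out_flag": 1, "label": 1},
    {"units": 3, "promo": 1, "sold_out_flag": 0, "label": 0},
    {"units": None, "promo": None, "sold_out_flag": 1, "label": 1},
    {"units": 8, "promo": 0, "sold_out_flag": 0, "label": 0},
]
missing_rate, leakage_suspects = data_quality_report(rows, "label")
```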

27
MCQeasy

A data engineer needs to monitor model performance over time for drift detection. What tool is specifically designed for this?

A.Vertex AI Model Monitoring
B.Cloud Monitoring
C.Cloud Logging
D.BigQuery ML
AnswerA

Vertex AI Model Monitoring provides drift detection, skew detection, and alerts for deployed models.

Why this answer

Vertex AI Model Monitoring is specifically designed to detect prediction drift and feature skew in deployed machine learning models. It continuously analyzes serving data against training data distributions and alerts when statistical metrics (e.g., Jensen-Shannon divergence, L-infinity distance) exceed configured thresholds, making it the correct tool for drift detection in the context of operationalizing ML models.

Exam trap

Google Cloud often tests the distinction between general-purpose monitoring tools (Cloud Monitoring, Cloud Logging) and ML-specific monitoring services (Vertex AI Model Monitoring), trapping candidates who assume any monitoring tool can handle drift detection.

How to eliminate wrong answers

Option B (Cloud Monitoring) is wrong because it is a general-purpose infrastructure and application monitoring service for metrics, uptime, and alerting, not specialized for ML model drift detection. Option C (Cloud Logging) is wrong because it is a centralized log management and analysis service for storing and querying log data, not designed to compute statistical drift between training and serving distributions. Option D (BigQuery ML) is wrong because it is a service for creating and executing machine learning models using SQL queries in BigQuery, not a monitoring tool for detecting drift in already-deployed models.
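To make the two statistics named above concrete, here is a minimal sketch of L-infinity distance and Jensen-Shannon divergence between a training and a serving distribution over categories. It illustrates the formulas only and is not how Vertex AI Model Monitoring is implemented; the example distributions and the 0.3 threshold are assumptions.

```python
import math


def l_infinity_distance(p, q):
    """Largest absolute difference between category probabilities."""
    keys = set(p) | set(q)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the result is in [0, 1]."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a(k) == 0 contribute nothing.
        return sum(a.get(k, 0.0) * math.log2(a.get(k, 0.0) / b[k])
                   for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical category shares for one feature.
training = {"low": 0.5, "high": 0.5}
serving = {"low": 0.1, "high": 0.9}

THRESHOLD = 0.3  # assumed alerting threshold, chosen for illustration
linf = l_infinity_distance(training, serving)
jsd = js_divergence(training, serving)
drift_detected = linf > THRESHOLD
```

Both scores are 0 for identical distributions and grow as the serving data moves away from the training data.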

28
MCQhard

A healthcare organization is deploying a model that processes protected health information (PHI). They need to ensure that the inference data is encrypted in transit and at rest, and access is audited. Which combination of services meets these requirements?

A.Cloud Run with VPC connector and Cloud KMS
B.Vertex AI Endpoints with IAM and Cloud Monitoring
C.Vertex AI Endpoints with VPC-SC, CMEK, and Cloud Audit Logs
D.AI Platform Prediction with Cloud Armor
AnswerC

VPC Service Controls protect against unauthorized data movement, CMEK for customer-managed encryption keys, and Cloud Audit Logs for compliance.

Why this answer

Option C is correct because VPC Service Controls (VPC-SC) provides data exfiltration protection and keeps inference data within a defined security perimeter, Customer-Managed Encryption Keys (CMEK) encrypt data at rest with keys controlled by the organization, and Cloud Audit Logs capture access events for auditing. Encryption in transit is not itself a VPC-SC feature: Google Cloud encrypts traffic to Vertex AI endpoints with TLS by default, and VPC-SC adds perimeter enforcement on top of that. Together these cover encryption in transit and at rest, plus access auditing.

Exam trap

Google Cloud often tests the misconception that IAM and Cloud Monitoring alone satisfy encryption and auditing requirements, but IAM controls access without encrypting data at rest, and Cloud Monitoring tracks performance metrics, not access logs; candidates must recognize that VPC-SC, CMEK, and Cloud Audit Logs are the specific services needed for encryption and auditing in ML inference.

How to eliminate wrong answers

Option A is wrong because Cloud Run with VPC connector and Cloud KMS does not provide a managed inference endpoint optimized for ML models; Cloud Run is a general-purpose compute service, and while VPC connector enables private networking and Cloud KMS manages encryption keys, it lacks the specific model hosting, scaling, and monitoring capabilities of Vertex AI Endpoints. Option B is wrong because Vertex AI Endpoints with IAM and Cloud Monitoring provides access control and performance monitoring but does not encrypt data at rest with customer-managed keys (CMEK) or enforce a data perimeter via VPC-SC; Cloud Monitoring logs metrics, not access events for auditing. Option D is wrong because AI Platform Prediction (now legacy) with Cloud Armor provides DDoS protection but does not encrypt data at rest with CMEK or provide VPC-SC perimeter controls; Cloud Armor operates at the network edge and does not address encryption or audit logging requirements.

29
MCQhard

Your MLOps pipeline uses Vertex AI Pipelines. You want to ensure that model training uses a consistent environment with specific Python package versions. Which approach best achieves this?

A.Include a requirements.txt file in the pipeline step and let Vertex AI install them
B.Use a pre-built deep learning container from Deep Learning Containers and install packages at runtime
C.Specify the Python version and package versions in the training job configuration
D.Build a custom container image with all dependencies and use it in the training step
AnswerD

Custom containers ensure exact same environment.

Why this answer

Option D is correct because building a custom container image with all dependencies ensures a fully deterministic and reproducible environment for model training. Vertex AI Pipelines executes each step as a container, so by pre-installing specific Python package versions into a custom image, you eliminate any risk of version drift or network issues during package installation at runtime. This approach aligns with MLOps best practices for environment consistency and is the most reliable method when exact package versions are critical.

Exam trap

Google Cloud often tests the distinction between runtime configuration (options A, B, C) and pre-built containerization (option D), trapping candidates who think specifying versions in a config file or installing at runtime is sufficient for full environment consistency in a pipeline context.

How to eliminate wrong answers

Option A is wrong because including a requirements.txt file and letting Vertex AI install them at runtime introduces variability; the installation may fail due to network issues, dependency conflicts, or changes in package repositories, and it does not guarantee the same environment across pipeline retries. Option B is wrong because using a pre-built deep learning container and installing packages at runtime still relies on runtime installation, which can lead to inconsistent environments if the installation process fails or if package versions are not pinned correctly. Option C is wrong because specifying Python version and package versions in the training job configuration only applies to AI Platform Training jobs, not to Vertex AI Pipelines; Vertex AI Pipelines runs steps as containers and does not natively support specifying package versions in the pipeline step configuration—the environment must be defined within the container image.
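Even with a custom image, the dependencies baked into it should be pinned exactly. The sketch below shows one way a CI step might flag loose requirements before the image is built. The helper name and the package versions are illustrative assumptions, and the check is deliberately simplistic (it treats any line containing `==` as pinned and ignores wildcards and hashes).

```python
def unpinned_requirements(requirements_text):
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Hypothetical requirements file contents.
requirements = """
numpy==1.26.4
pandas>=2.0      # floating lower bound, not reproducible
scikit-learn==1.4.2
xgboost
"""
loose = unpinned_requirements(requirements)
```

A build could fail fast when `loose` is non-empty, so the image never contains a floating dependency.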

30
MCQmedium

Refer to the exhibit. An ML engineer sees this error when invoking a Vertex AI endpoint. What is the most likely cause?

A.The input format should be JSON
B.The model has a bug in the ResNet50 architecture
C.The model expects 128x128 images but raw input is 256x256
D.The endpoint is overloaded
AnswerC

The error shows expected shape [1,128,128,3] but got [1,256,256,3], indicating image size mismatch.

Why this answer

The error indicates a mismatch between the input dimensions expected by the model and the dimensions of the data being sent to the Vertex AI endpoint. According to the error message, this model's serving signature expects input of shape [1,128,128,3] (128x128 RGB images), but the request contains 256x256 images, so the request is rejected when the input tensor shape is checked against the model's signature. The fix is to resize images to 128x128 in preprocessing before calling the endpoint.

Exam trap

Google Cloud often tests the distinction between input validation errors (e.g., shape mismatch) and model logic errors (e.g., architecture bugs), so candidates mistakenly attribute the error to a model bug or endpoint overload rather than a simple data preprocessing mismatch.

How to eliminate wrong answers

Option A is wrong because the error is about image dimensions, not the serialization format; a malformed request body would typically produce a request-format error rather than a shape mismatch. Option B is wrong because a bug in the ResNet50 architecture would cause inference errors or incorrect predictions, not a dimension mismatch error at the endpoint level. Option D is wrong because an overloaded endpoint would return a 429 HTTP status code or a 'resource exhausted' error, not a dimension mismatch error.
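A shape mismatch like this one can be caught on the client before the request is sent. The following sketch validates a nested-list batch against an expected shape. The helper names are hypothetical, and the check assumes rectangular input (it inspects only the first element at each nesting level).

```python
def nested_shape(value):
    """Return the shape of a rectangular nested list, e.g. [1, 128, 128, 3]."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return shape


def check_instance_shape(batch, expected_shape):
    """Raise ValueError if the batch shape differs from the expected shape."""
    actual = nested_shape(batch)
    if actual != list(expected_shape):
        raise ValueError(f"expected shape {list(expected_shape)} but got {actual}")
    return True


def blank_image(height, width, channels=3):
    """Build a height x width x channels image filled with zeros."""
    return [[[0.0] * channels for _ in range(width)] for _ in range(height)]


ok = check_instance_shape([blank_image(128, 128)], [1, 128, 128, 3])
try:
    check_instance_shape([blank_image(256, 256)], [1, 128, 128, 3])
    mismatch_message = None
except ValueError as err:
    mismatch_message = str(err)
```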

31
MCQeasy

Your company runs batch predictions using Vertex AI Batch Prediction on a monthly basis. The predictions are used to generate customer segments for marketing campaigns. This month, the batch prediction job failed with an error: 'The number of rows in the input table does not match the number of rows in the output table.' The input table in BigQuery has 5 million rows, but the output table has only 4.5 million rows. You need to identify and handle the missing predictions. What is the most efficient course of action?

A.Manually inspect the input table to find which rows are missing and rerun the batch prediction for those rows.
B.Run the batch prediction job with the 'generate_explanation' parameter enabled to get additional output for debugging.
C.Enable the 'write_prediction_errors' flag in the batch prediction configuration to capture failed predictions in a separate table.
D.Use a Cloud Dataflow pipeline to process the input data and call the model for each row, handling errors programmatically.
AnswerC

This flag causes failed predictions to be written to an error table, allowing you to identify and correct the problematic rows.

32
MCQeasy

You are using AI Platform Prediction (now Vertex AI) for online predictions. You notice that some requests are failing with a 503 status code. Which is the most likely cause?

A.The model is experiencing high traffic and the underlying nodes are still scaling up
B.The input data format does not match the model's expected schema
C.The project has exceeded its prediction requests quota
D.The service account used for prediction does not have the required permissions
AnswerA

503 errors often occur during scaling.

Why this answer

A 503 status code in Vertex AI (formerly AI Platform Prediction) indicates that the prediction service is temporarily unavailable, most commonly due to autoscaling latency. When a model receives a sudden spike in traffic, the underlying nodes (compute instances) may still be provisioning and initializing, causing requests to be rejected until the new nodes are ready to serve. This is a transient condition that resolves once scaling completes.

Exam trap

Google Cloud often tests the distinction between HTTP 503 (service unavailable, transient) and HTTP 429 (quota exceeded) or HTTP 400 (bad request), so candidates mistakenly attribute scaling issues to quota exhaustion or permission errors.

How to eliminate wrong answers

Option B is wrong because a mismatch in input data format (e.g., wrong tensor shape or feature names) would result in a 400 Bad Request error, not a 503. Option C is wrong because exceeding prediction request quota would return a 429 Too Many Requests error, not a 503. Option D is wrong because insufficient permissions (e.g., missing the `aiplatform.endpoints.predict` permission) would cause a 403 Forbidden error, not a 503.
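Because a 503 is transient, a client can reasonably retry it with exponential backoff while treating 400 and 403 as errors to fix rather than retry. The sketch below assumes a hypothetical `send` callable that returns a `(status, body)` tuple; it is not tied to any particular client library, and the delay values are illustrative.

```python
import time


def call_with_backoff(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() and retry only on 503, doubling the delay each time."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            # 200 succeeds; 400/403/429 are returned to the caller unchanged.
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status, body


# Demonstration with a fake endpoint: two 503s, then success. Delays are
# recorded instead of slept so the example runs instantly.
responses = iter([(503, "unavailable"), (503, "unavailable"), (200, "ok")])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
```

Injecting `sleep` keeps the retry policy testable without real waiting.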

33
MCQeasy

A data scientist wants to test a new model version on a small percentage of traffic before full rollout. Which Vertex AI feature allows this?

A.A/B testing
B.Endpoint traffic splitting
C.Model monitoring
D.Model versioning with canary deployments
AnswerB

Traffic splitting allows routing a subset of requests to a different model version.

Why this answer

Vertex AI Endpoint traffic splitting allows you to route a specified percentage of inference requests to different model versions deployed on the same endpoint. This enables gradual rollout by directing a small fraction of traffic (e.g., 5%) to the new model while the rest goes to the current version, without needing separate endpoints or manual routing logic.

Exam trap

The trap here is that candidates confuse the conceptual practice of 'canary deployments' (Option D) with the specific Vertex AI feature 'endpoint traffic splitting' (Option B), but the exam expects the exact feature name as defined in the Google Cloud documentation.

How to eliminate wrong answers

Option A is wrong because A/B testing in Vertex AI is a feature for comparing model performance metrics (like accuracy or latency) by splitting traffic, but it is not the feature that directly enables traffic splitting itself—traffic splitting is the underlying mechanism, and A/B testing is a higher-level evaluation tool built on top of it. Option C is wrong because Model monitoring is used to detect data drift, feature skew, and prediction anomalies on deployed models, not to control traffic distribution between versions. Option D is wrong because model versioning with canary deployments is a conceptual practice, not a specific Vertex AI feature; the actual feature that implements canary-style traffic routing is endpoint traffic splitting, which is the correct answer.
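Conceptually, a traffic split is a mapping from deployed model to a percentage, with the percentages summing to 100. The simulation below sketches that idea with weighted random routing. The model IDs, the 95/5 split, and both function names are assumptions for illustration; this is not the Vertex AI routing implementation.

```python
import random


def validate_traffic_split(split):
    """Ensure percentages are non-negative and sum to exactly 100."""
    if any(p < 0 for p in split.values()) or sum(split.values()) != 100:
        raise ValueError("traffic percentages must be non-negative and sum to 100")
    return split


def route(split, rng):
    """Pick a deployed model ID with probability proportional to its share."""
    ids = list(split)
    return rng.choices(ids, weights=[split[i] for i in ids], k=1)[0]


split = validate_traffic_split({"model-v1": 95, "model-v2": 5})
rng = random.Random(0)  # seeded so the simulation is repeatable
counts = {"model-v1": 0, "model-v2": 0}
for _ in range(10_000):
    counts[route(split, rng)] += 1
```

Over many requests, roughly 5% land on the candidate version, which is what makes a small-percentage test possible on a single endpoint.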

34
MCQmedium

A retail company uses a Vertex AI endpoint to serve product recommendations. The model is a TensorFlow model deployed with a custom container. Recently, users have reported that recommendations are stale. The model is retrained daily using Vertex AI Pipelines. The pipeline completes successfully, but the endpoint continues to serve the old model. The team checks the pipeline logs and sees that the new model is uploaded to the Vertex AI Model Registry. The endpoint has traffic split set to 100% for the old model. The team needs to update the endpoint to serve the new model version. What should they do?

A.Check the pipeline for errors in the deployment step
B.Re-upload the model with a different version ID
C.Redeploy the same model to the endpoint
D.Update the endpoint to deploy the new model version from the registry and adjust traffic split
AnswerD

Explicitly deploy new version to endpoint.

Why this answer

Option D is correct because the pipeline successfully uploaded the new model to the Vertex AI Model Registry, but the endpoint still has its traffic split configured to 100% for the old model. To serve the new model, the team must explicitly update the endpoint to deploy the new model version from the registry and adjust the traffic split to route 100% of traffic to it. This is a standard operational step in Vertex AI: uploading a model does not automatically update the endpoint's deployment or traffic allocation.

Exam trap

Google Cloud often tests the misconception that uploading a new model version to the registry automatically updates the endpoint's serving configuration, when in fact the traffic split must be explicitly adjusted to route requests to the new model.

How to eliminate wrong answers

Option A is wrong because the pipeline logs show no errors in the deployment step; the model was successfully uploaded to the registry, so checking for errors is unnecessary and misdiagnoses the issue. Option B is wrong because re-uploading the model with a different version ID does not change the endpoint's deployment or traffic split; the endpoint still points to the old model version. Option C is wrong because redeploying the same model (the old version) to the endpoint would not serve the new model; the team needs to deploy the new model version from the registry, not redeploy the old one.

35
Multi-Selecthard

Which THREE factors should be considered when designing a Vertex AI Pipeline for continuous training?

Select 3 answers
A.Cost of training and infrastructure
B.Debugging tools like Cloud Debugger
C.Trigger mechanism (time-based or event-based)
D.Number of model versions to keep
E.Data freshness and staleness tolerance
AnswersA, C, E

Budget impacts resource selection.

Why this answer

Cost of training and infrastructure (A) is correct because Vertex AI Pipelines incur compute costs for each pipeline run, including training, data processing, and orchestration. Continuous training amplifies these costs, so you must consider budget constraints, resource optimization (e.g., using preemptible VMs), and cost monitoring to avoid unexpected bills.

Exam trap

Google Cloud often tests the distinction between operational pipeline design factors (triggers, cost, data freshness) and peripheral management tasks (versioning, debugging tools), leading candidates to incorrectly select options like D or B that are valid but not core to pipeline design.

36
MCQmedium

A company uses Vertex AI to serve a model. They notice that some predictions are incorrect due to data drift. What is the best way to detect and retrain the model automatically?

A.Store predictions in BigQuery and run scheduled queries
B.Create a Cloud Monitoring dashboard
C.Set up Cloud Logging metrics to monitor predictions
D.Use Vertex AI Model Monitoring with alerts and retraining pipeline
AnswerD

Monitors drift and triggers retraining.

Why this answer

Option D is correct because Vertex AI Model Monitoring is specifically designed to detect data drift and feature skew in production models. It can be configured to send alerts and trigger an automated retraining pipeline via Cloud Functions or Vertex AI Pipelines, enabling continuous model improvement without manual intervention. This directly addresses the need for automatic detection and retraining in response to data drift.

Exam trap

The trap here is that candidates may confuse general monitoring tools (Cloud Monitoring, Cloud Logging) with the specialized drift detection and automated retraining capabilities of Vertex AI Model Monitoring, assuming any monitoring solution can trigger retraining without native integration.

How to eliminate wrong answers

Option A is wrong because storing predictions in BigQuery and running scheduled queries is a manual, batch-oriented approach that does not provide real-time drift detection or automated retraining; it requires custom code and lacks native integration with Vertex AI's monitoring capabilities. Option B is wrong because Cloud Monitoring dashboards visualize metrics but do not inherently detect data drift or trigger retraining pipelines; they are for observability, not automated action. Option C is wrong because Cloud Logging metrics can track prediction logs but are not designed for statistical drift analysis (e.g., distribution comparisons) and cannot directly initiate retraining workflows without additional custom logic.

37
MCQmedium

You have deployed a classification model on Vertex AI Endpoints. The model's training data had a balanced class distribution, but over time, the production data has shifted such that one class appears 90% of the time. The model's overall accuracy remains high, but the recall for the minority class has dropped significantly. What is the best approach to detect and address this issue?

A.Retrain the model daily on the entire historical dataset
B.Set up Vertex AI Model Monitoring to detect skew and drift, and retrain using a sliding window of recent data
C.Increase the number of replicas on the endpoint to reduce latency
D.Adjust the decision threshold to improve minority class recall
AnswerB

Model Monitoring detects skew/drift; retraining on recent data adapts to new distribution.

Why this answer

Vertex AI Model Monitoring is specifically designed to detect skew and drift between training and serving data. In this scenario, the production data has shifted to 90% of one class, which is a clear case of data drift. By setting up monitoring, you can be alerted to this drift and then retrain the model using a sliding window of recent data, which adapts to the new distribution without requiring full retraining on the entire historical dataset.

This approach directly addresses the root cause—the shift in class distribution—rather than just treating symptoms.

Exam trap

Google Cloud often tests the distinction between monitoring/detection (Model Monitoring) and reactive fixes (threshold tuning), where candidates mistakenly choose a quick fix like adjusting the decision threshold instead of addressing the root cause of data drift.

How to eliminate wrong answers

Option A is wrong because retraining daily on the entire historical dataset is computationally expensive and does not prioritize recent data; it would still include the old balanced distribution, potentially diluting the model's ability to adapt to the new skewed production data. Option C is wrong because increasing the number of replicas on the endpoint reduces latency and improves throughput, but it does not address data drift or the drop in minority class recall; it is a scaling solution, not a monitoring or retraining solution. Option D is wrong because adjusting the decision threshold can improve recall for the minority class in the short term, but it does not fix the underlying model's inability to generalize to the shifted data distribution; it is a band-aid that may hurt precision and overall model performance.
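The scenario's key symptom, high accuracy alongside poor minority-class recall, is easy to reproduce with arithmetic. The confusion-matrix counts below are hypothetical values chosen to match a 90/10 class split.

```python
def accuracy_and_recall(tp, fp, tn, fn):
    """Return (overall accuracy, recall for the positive class)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical sample of 1,000 requests where the minority class is "positive":
# 100 actual positives (40 caught, 60 missed) and 900 actual negatives.
acc, minority_recall = accuracy_and_recall(tp=40, fp=10, tn=890, fn=60)
print(f"accuracy={acc:.2f} minority_recall={minority_recall:.2f}")
```

Accuracy stays high because the majority class dominates the total, which is why per-class metrics and drift monitoring are needed to notice the problem.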

38
MCQmedium

You run batch predictions using Vertex AI Batch Prediction on a tabular dataset. The job processes 1 million rows and takes 6 hours to complete. You need to reduce the processing time to under 2 hours without increasing cost significantly. What should you do?

A.Switch to a machine type with more CPU cores and vCPUs
B.Increase the machine count (number of worker replicas) in the batch prediction job
C.Downsample the dataset to 500k rows
D.Use Online prediction instead of batch
AnswerB

More workers process data in parallel, so runtime drops roughly in proportion to the worker count while total compute cost stays about the same.

Why this answer

Correct: B. Increasing the number of worker replicas speeds up batch jobs through parallelism. Option A is wrong because a larger machine type may help but is usually less effective than parallelization.

Option D is wrong because online prediction is meant for low-latency, per-request serving, not bulk processing. Option C is wrong because reducing data size is not an option when all rows need predictions.
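Under the simplifying assumption that batch runtime scales inversely with the number of replicas (perfect parallelism, no startup overhead), the required replica count can be estimated as below. The question does not state the current replica count, so the value of 1 is an assumption, and `replicas_needed` is a hypothetical helper.

```python
import math


def replicas_needed(current_hours, current_replicas, target_hours):
    """Smallest replica count whose estimated runtime is strictly under target."""
    # Total work in replica-hours, assumed constant as replicas are added.
    work = current_hours * current_replicas
    return math.floor(work / target_hours) + 1


current_hours, current_replicas, target_hours = 6, 1, 2
n = replicas_needed(current_hours, current_replicas, target_hours)
estimated_hours = current_hours * current_replicas / n
replica_hours = n * estimated_hours  # proxy for compute cost
print(f"replicas={n} estimated_hours={estimated_hours} replica_hours={replica_hours}")
```

With these numbers, three replicas would land exactly on 2 hours, so four are needed to finish under the target, while total replica-hours stay at 6.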

39
MCQeasy

A startup is deploying a PyTorch model on Google Cloud. They need to serve predictions for a mobile app with bursty traffic. Which service is most cost-effective?

A.Vertex AI Endpoints with autoscaling and a minimum of 0 replicas to scale down to zero
B.Vertex AI Endpoints with a minimum number of replicas
C.App Engine with manual scaling
D.Cloud Run with CPU always allocated
AnswerA

Scaling to zero minimizes cost when idle, ideal for bursty traffic.

Why this answer

Option A is correct because Vertex AI Endpoints with autoscaling and minimum replicas set to 0 can scale down to zero when idle, reducing cost. Option B keeps a minimum number of replicas running, incurring cost even when idle. Options C and D may not scale to zero or lack ML optimization.

40
MCQmedium

Refer to the exhibit. This log entry was generated by Vertex AI Model Monitoring for a production model. What should the data engineer do to address this issue?

A.Increase the drift threshold to 0.9 to suppress alerts
B.Retrain the model with more recent data
C.Deploy a new model version trained on the original dataset
D.Disable monitoring for the 'age' feature
AnswerB

Addresses the root cause by adapting to data shift.

Why this answer

Option B is correct because Vertex AI Model Monitoring detected a drift in the 'age' feature, indicating that the production data distribution has shifted from the training data. Retraining the model with more recent data aligns the model with the current data distribution, mitigating the drift and maintaining prediction accuracy. This is the standard remediation for model drift in production ML systems.

Exam trap

Google Cloud often tests the misconception that adjusting thresholds or disabling monitoring is a valid fix for drift, when the expected remediation is to retrain the model with current data.

How to eliminate wrong answers

Option A is wrong because increasing the drift threshold to 0.9 would suppress alerts without addressing the underlying data drift, allowing the model to continue making inaccurate predictions. Option C is wrong because deploying a new model version trained on the original dataset would not resolve the drift; it would reuse the same outdated training data that no longer represents the current production distribution. Option D is wrong because disabling monitoring for the 'age' feature would hide the drift issue rather than fixing it, leaving the model vulnerable to degraded performance due to a drifted feature.

41
MCQhard

A financial services company uses a custom container on Vertex AI Prediction to serve a fraud detection model. The container runs a Flask app that loads a large feature engineering library (~2 GB) at startup. The model is updated weekly. For the past two weeks, the new model version has been failing health checks and showing 'Container failed to start' errors in the logs. The previous versions worked fine. You inspect the container image and confirm it is built correctly using Cloud Build. The only change in the latest build is an updated version of the feature engineering library. What is the most likely cause and how should you fix it?

A.The Cloud Build step that pushes the image is misconfigured. Rebuild using a different approach.
B.The Vertex AI endpoint machine type is too small for the new container. Upgrade to a larger machine type.
C.The new library version increased memory consumption during startup, exceeding the health check timeout. Increase the startup probe initial delay.
D.The new library has a dependency conflict that causes the Flask app to crash. Roll back to the previous library version.
AnswerC

A larger library could cause longer initialization; adjusting the health check timing accommodates that.

42
Matchingmedium

Match each Google Cloud IAM role to its description.

Concepts
Matches

Read access to BigQuery datasets and tables

Permission to run BigQuery jobs

Read access to Cloud Storage objects

Permissions for Dataflow worker nodes

Why these pairings

Predefined IAM roles relevant to data engineering.

43
MCQmedium

A data scientist is using Vertex AI to train a model and wants to ensure that the training code and environment are reproducible. Which approach should they take?

A.Use Jupyter notebooks on Vertex AI Workbench
B.Use Vertex AI Training with a pre-built container and specify the exact version of the framework
C.Use custom containers with fixed tags
D.Use Cloud Build to train the model
AnswerB

Pre-built containers with version pinning ensure consistent environment and code execution.

Why this answer

Option B is correct because pre-built containers with specific versions ensure the exact same environment across runs. Option C (custom containers with fixed tags) is good but less standard. Options A and D are not best practices for reproducibility.

44
MCQmedium

A company deploys a machine learning model to Vertex AI for real-time predictions. After deployment, they notice that prediction latency spikes during peak traffic hours. Which approach should they take to reduce latency without sacrificing accuracy?

A.Configure auto-scaling with higher min and max instances
B.Reduce the number of input features
C.Switch from online to batch prediction
D.Use a larger machine type for the model
AnswerA

Auto-scaling handles traffic spikes.

Why this answer

Option A is correct because configuring auto-scaling with higher min and max instances ensures that Vertex AI has sufficient pre-warmed replicas to handle traffic spikes without cold-start latency. This approach maintains model accuracy because it does not alter the model architecture or inference logic, only the infrastructure capacity.

Exam trap

Google Cloud often tests the misconception that reducing features or using batch prediction is the primary way to reduce latency, but the real exam trap is that candidates overlook the need to maintain real-time capability and accuracy, and instead choose a solution that changes the model or prediction mode rather than scaling infrastructure.

How to eliminate wrong answers

Option B is wrong because reducing the number of input features may degrade model accuracy, and the question explicitly requires not sacrificing accuracy. Option C is wrong because switching from online to batch prediction eliminates real-time capability, which contradicts the requirement for real-time predictions. Option D is wrong because a larger machine type raises per-replica capacity (and cost) but does not add replicas when traffic peaks; it leaves accuracy unchanged, yet it is not the most targeted fix for traffic-induced latency spikes.
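
As a rough illustration of how the autoscaling bounds in option A might be sized, the sketch below estimates replica counts from traffic figures. The function name, the QPS numbers, and the headroom factor are hypothetical assumptions, not values from the question; the two results would feed the minimum and maximum replica settings of the deployment.

```python
import math


def replica_bounds(baseline_qps, peak_qps, per_replica_qps, headroom=1.2):
    """Estimate (min, max) replica counts from traffic figures.

    All inputs are assumptions you would measure yourself (e.g. with a
    load test); `headroom` is a hypothetical safety margin.
    """
    if per_replica_qps <= 0:
        raise ValueError("per_replica_qps must be positive")
    # Enough warm replicas to absorb normal traffic without scaling up.
    min_replicas = max(1, math.ceil(baseline_qps * headroom / per_replica_qps))
    # Upper bound sized for the observed peak, never below the minimum.
    max_replicas = max(min_replicas, math.ceil(peak_qps * headroom / per_replica_qps))
    return min_replicas, max_replicas


if __name__ == "__main__":
    print(replica_bounds(baseline_qps=40, peak_qps=300, per_replica_qps=25))
```

Keeping the minimum above the baseline load is what avoids cold starts; the maximum only caps how far the autoscaler may go during a spike.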

45
MCQeasy

A data scientist has trained an XGBoost model on Vertex AI and wants to deploy it to an endpoint with automatic scaling based on traffic. What is the recommended deployment approach?

A.Export the model to a container and deploy on Cloud Run
B.Use AI Platform Prediction with batch prediction
C.Deploy the model as an API on App Engine
D.Use Vertex AI Endpoints with automatic scaling enabled
AnswerD

Vertex AI Endpoints support automatic scaling based on traffic, making it the recommended approach.

Why this answer

Vertex AI Endpoints with automatic scaling enabled is the recommended approach because it directly supports deploying trained models (including XGBoost) as online prediction endpoints with built-in autoscaling based on incoming traffic. This service manages the underlying infrastructure, load balancing, and scaling policies, aligning with the requirement for automatic scaling without additional containerization or serverless overhead.

Exam trap

Google Cloud often tests the distinction between online (real-time) and batch prediction services, and the trap here is that candidates may confuse Vertex AI Endpoints with generic serverless options like Cloud Run or App Engine, overlooking the fact that Vertex AI provides a purpose-built, managed endpoint service with native autoscaling for ML models.

How to eliminate wrong answers

Option A is wrong because exporting the model to a container and deploying on Cloud Run requires manual containerization and does not natively integrate with Vertex AI's model registry, versioning, or monitoring, and Cloud Run's scaling is based on request concurrency rather than the model-specific metrics Vertex AI provides. Option B is wrong because AI Platform Prediction with batch prediction is designed for offline, asynchronous predictions on large datasets, not for real-time online serving with automatic scaling based on live traffic. Option C is wrong because deploying the model as an API on App Engine introduces unnecessary complexity and lacks the optimized serving infrastructure, model versioning, and traffic splitting capabilities that Vertex AI Endpoints offer for ML models.

46
MCQhard

A model deployed on Vertex AI Endpoint is making predictions with high accuracy but the business team suspects bias against a certain demographic group. You need to analyze the model's predictions for fairness. What is the most effective approach?

A.Use Vertex AI Explainable AI to generate feature attributions for each prediction and analyze whether the demographic feature has disproportionate impact.
B.Compute overall fairness metrics by comparing prediction rates across demographic groups.
C.Collect more data for the under-represented group and retrain the model.
D.Use Vertex AI Model Monitoring to check for training-serving skew on the demographic feature.
AnswerA

Explanations help identify if a sensitive attribute is influencing predictions unfairly.

Why this answer

Vertex AI Explainable AI provides per-instance feature attributions, which allow you to examine how the model uses each feature—including sensitive demographic attributes—to arrive at a prediction. By analyzing these attributions across demographic groups, you can detect whether the model disproportionately relies on the demographic feature, indicating potential bias. This approach is more granular than aggregate metrics and directly addresses the business team's concern about bias in individual predictions.

Exam trap

Google Cloud often tests the distinction between bias detection (analysis) and bias mitigation (retraining), so candidates may incorrectly choose Option C as a quick fix instead of the correct analytical approach using Explainable AI.

How to eliminate wrong answers

Option B is wrong because computing overall fairness metrics (e.g., demographic parity) only compares aggregate prediction rates across groups, which can mask per-instance bias and does not reveal whether the model is using the demographic feature in a discriminatory way. Option C is wrong because collecting more data and retraining the model is a remediation step, not an analysis step; it does not help diagnose whether the current model exhibits bias. Option D is wrong because Vertex AI Model Monitoring checks for training-serving skew (distribution drift between training and serving data), not for bias or fairness in predictions against demographic groups.
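
To make the attribution analysis concrete, here is a minimal sketch that averages the absolute attribution of one feature per demographic group. The record layout, the group labels, and the attribution values are hypothetical; in practice they would come from exported explanation results.

```python
from collections import defaultdict
from statistics import mean


def mean_abs_attribution_by_group(records, feature):
    """Average |attribution| of `feature` for each demographic group."""
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec["group"]].append(abs(rec["attributions"][feature]))
    return {group: mean(values) for group, values in by_group.items()}


# Hypothetical per-prediction attributions (not real model output).
records = [
    {"group": "A", "attributions": {"income": 0.40, "age_band": 0.02}},
    {"group": "A", "attributions": {"income": 0.35, "age_band": -0.04}},
    {"group": "B", "attributions": {"income": 0.10, "age_band": -0.30}},
    {"group": "B", "attributions": {"income": 0.12, "age_band": -0.26}},
]

if __name__ == "__main__":
    print(mean_abs_attribution_by_group(records, "age_band"))
```

A much larger average for one group, as in this made-up data, is the kind of signal that would prompt a closer look at how the model uses the sensitive feature.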

47
MCQhard

A company has a model that requires GPU for inference and has strict latency requirements. They deployed on Vertex AI Endpoint with autoscaling but observe cold start latency when scaling up. What is the best solution?

A.Set a higher min_replica_count to keep instances warm
B.Pre-compile the model with TensorRT
C.Use a larger GPU instance
D.Switch to batch prediction
AnswerA

Keeping a minimum number of instances online avoids cold starts when traffic spikes.

Why this answer

Option A is correct: setting a higher min_replica_count ensures there are always some warm instances ready to serve traffic, reducing cold start latency. Option C is wrong because a larger GPU does not address the cold start issue. Option D is wrong because batch prediction is not suitable for online serving.

Option B is wrong because pre-compiling with TensorRT can improve inference speed but does not eliminate cold start delays from scaling.

48
Drag & Dropmedium

Drag and drop the steps to deploy a Cloud Dataflow pipeline from a template into the correct order.


Why this order

Templates simplify deployment of common Dataflow patterns without writing code.

49
MCQhard

You are responsible for deploying a PyTorch model for real-time inference. The model requires GPU acceleration. You want to minimize infrastructure management overhead. Which serving option should you choose?

A.Deploy the model as a Cloud Function with a GPU backend
B.Use Cloud Run with GPU enabled
C.Use AI Platform Training to host the model as a prediction service
D.Deploy the model on Vertex AI Endpoints using a custom container with GPU support
AnswerD

Vertex AI supports custom containers and GPUs for serving.

Why this answer

Vertex AI Endpoints with a custom container and GPU support is the correct choice because it is purpose-built for serving ML models at scale, fully managed, and supports GPU acceleration for low-latency inference. It minimizes infrastructure overhead by handling auto-scaling, health checks, and model versioning, unlike the other options, which lack GPU support, lack ML-specific serving features, or are designed for training rather than serving.

Exam trap

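Google Cloud often tests whether candidates can tell general-purpose serverless compute from a purpose-built ML serving platform. Cloud Functions offers no GPU backend, and although Cloud Run has since gained GPU support, it provides none of the model versioning, traffic splitting, or monitoring integration of Vertex AI Endpoints, which is the intended managed option for GPU inference in this question.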

How to eliminate wrong answers

Option A is wrong because Cloud Functions do not support GPU backends; they are serverless compute for lightweight, event-driven code and cannot accelerate PyTorch inference. Option B is wrong because Cloud Run is a general-purpose platform for containerized applications: even where GPU support is available for it, it lacks the model versioning, traffic splitting, and monitoring integration that Vertex AI Endpoints provides for ML serving. Option C is wrong because AI Platform Training is designed for model training jobs, not for hosting a real-time prediction service; it lacks the endpoint management, autoscaling, and low-latency serving features required for production inference.

50
MCQhard

Your company runs a real-time recommendation system for a popular e-commerce website using a machine learning model deployed on Vertex AI Endpoints. The model takes user features and product catalog data as input and returns top-10 product recommendations. The system uses a feature store to serve user embeddings and product embeddings. Recently, the recommender team retrained the model with a new algorithm and deployed it as a new version. Since the deployment, the latency for recommendation requests has increased from 100ms to 500ms on average, exceeding the 200ms SLO. The model accuracy is acceptable, and there are no errors. The endpoint uses an n1-standard-8 machine with a single GPU. The new model is larger but still fits on the GPU. You investigate and find that the GPU utilization remains low (<20%), but CPU utilization is high (90%). What should you do to reduce latency while maintaining accuracy?

A.Upgrade the machine type to one with more GPU memory (e.g., n1-standard-8 with a larger GPU) to reduce model inference time.
B.Change the batch size in the model serving code to process multiple requests together, improving GPU utilization.
C.Increase the number of replicas (nodes) to parallelize the CPU-bound preprocessing work.
D.Offload preprocessing to a dedicated Cloud Run service that runs asynchronously and returns precomputed feature vectors.
AnswerC

Adding more nodes will distribute the preprocessing load across multiple CPUs, reducing the overall latency per request if the load balancer dispatches requests efficiently. However, this increases cost.

Why this answer

Option C is correct because the high CPU utilization (90%) with low GPU utilization (<20%) indicates that the bottleneck is CPU-bound preprocessing, not GPU inference. Increasing the number of replicas (nodes) distributes the CPU preprocessing load across multiple instances, reducing per-request latency without affecting model accuracy. This directly addresses the root cause while keeping the existing GPU resources.

Exam trap

Google Cloud often tests the misconception that GPU utilization must be increased to reduce latency, but the trap here is that the bottleneck is CPU-bound preprocessing, not GPU inference, so scaling replicas (horizontal scaling) is the correct fix, not GPU upgrades or batching.

How to eliminate wrong answers

Option A is wrong because upgrading to a larger GPU does not address the CPU bottleneck; GPU memory is sufficient and GPU utilization is low, so more GPU memory would not reduce latency. Option B is wrong because increasing batch size would increase latency per request (as requests wait to be batched) and does not solve CPU-bound preprocessing; it may even worsen CPU contention. Option D is wrong because offloading preprocessing to a Cloud Run service asynchronously would add network round-trip latency and complexity, and the preprocessing is likely synchronous and required per request; it would not reduce the CPU bottleneck on the serving path.

51
Multi-Selecthard

Which TWO strategies help reduce prediction latency for a real-time model deployed on Vertex AI Endpoint?

Select 2 answers
A.Use batch prediction instead of online
B.Use Cloud CDN to cache predictions
C.Use a larger machine type (e.g., n1-highcpu-16)
D.Reduce model complexity (e.g., quantize or prune)
E.Enable autoscaling with a minimum replica count
AnswersD, E

Reduces inference time.

Why this answer

Enabling autoscaling with a minimum replica count reduces cold start latency. Reducing model complexity (e.g., quantization or pruning) speeds up inference. A larger machine type adds cost and does not reliably reduce latency unless the model is actually bound by that resource.

Batch prediction is offline. Cloud CDN is for static content.

52
MCQhard

You are a data engineer at a financial services company that uses Vertex AI to train and deploy models for credit risk assessment. The company has strict governance requirements: every model version must be approved by the risk committee before going to production. The approval process can take several days. Currently, the team trains a new model weekly and manually deploys it to a staging endpoint for review, then manually promotes to production after approval. This process is error-prone and slow. You want to automate the pipeline: training should trigger automatically when new data arrives, the model should be automatically deployed to a staging endpoint for review, and after manual approval, it should be promoted to production. Additionally, you need to ensure that if a model in staging performs poorly (e.g., low accuracy), it should not be promoted even if approved. What should you do?

A.Use Vertex AI Experiments to track model versions, then manually deploy from the Experiments UI.
B.Use Cloud Scheduler to run training weekly, then use Cloud Functions to deploy to staging, and after manual approval, use another Cloud Function to check performance and deploy to production.
C.Create a Vertex AI Pipeline that: (1) Triggers on new data, (2) Trains model, (3) Evaluates and stores metrics in the model registry, (4) Deploys to staging endpoint as a new model version. Then use a manual approval step (e.g., via Cloud Build approval or external system) to trigger a second pipeline that checks the stored metrics and, if acceptable, deploys to production endpoint.
D.Train models on Vertex AI Workbench and use a CI/CD tool like Cloud Build to deploy to staging. Use a Cloud Build approval step to promote to production after manual check.
AnswerC

This automates training and staging deployment, then separates approval gate, and uses metric check to conditionally promote to production.

Why this answer

The best approach uses Vertex AI Pipelines to automatically train and deploy to a staging endpoint. After manual approval, a separate pipeline step checks model performance metrics (which were stored during training/evaluation) and if they meet a threshold, promotes to production. This enforces governance and automation.
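
The two-part gate described above (manual approval plus a check of the stored metrics) could be sketched as follows. The function name, metric names, and thresholds are hypothetical; the real check would read the metrics that the first pipeline stored during evaluation.

```python
def may_promote(approved, metrics, thresholds):
    """Return True only if the model is approved AND every metric meets its floor.

    `metrics` and `thresholds` are plain dicts here; a missing metric
    counts as a failure rather than being skipped.
    """
    if not approved:
        return False
    return all(
        metrics.get(name, float("-inf")) >= floor
        for name, floor in thresholds.items()
    )


if __name__ == "__main__":
    thresholds = {"accuracy": 0.90}  # hypothetical governance threshold
    print(may_promote(True, {"accuracy": 0.93}, thresholds))  # approved and good
    print(may_promote(True, {"accuracy": 0.81}, thresholds))  # approved but poor
```

Treating approval and metric quality as independent conditions is what prevents an approved but underperforming model from reaching production.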

53
MCQeasy

A data scientist has iterated on a model and produced a new version. The organization requires the ability to roll back to the previous version quickly if the new version performs poorly in production. Which approach should be used?

A.Store each model version in a separate Cloud Storage bucket.
B.Keep the previous model in a container image and redeploy via Cloud Run.
C.Use Cloud Source Repositories to tag model versions.
D.Upload both versions to Vertex AI Model Registry and use endpoint traffic splitting to route 100% to the safe version if needed.
AnswerD

The registry keeps versions; endpoint traffic allows instant switch.

Why this answer

Vertex AI Model Registry allows you to deploy multiple model versions and use endpoint traffic splitting to gradually shift traffic or instantly route 100% to a specific version. This enables immediate rollback by setting the traffic split to 100% for the previous model version without redeploying or changing infrastructure.

Exam trap

Google Cloud often tests the misconception that version control tools (like Cloud Source Repositories) or storage buckets are sufficient for rollback, when in fact the key requirement is a managed model registry with traffic splitting capabilities for instant, no-downtime rollback.

How to eliminate wrong answers

Option A is wrong because storing each model version in a separate Cloud Storage bucket does not provide a mechanism for quick rollback; you would still need to redeploy the model from that bucket, which is not instantaneous. Option B is wrong because keeping the previous model in a container image and redeploying via Cloud Run is not a rollback strategy—it requires a new deployment, which takes time and does not leverage Vertex AI's managed traffic splitting. Option C is wrong because Cloud Source Repositories is a source code version control service, not a model registry; tagging model versions there does not affect production endpoint traffic.
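
A rollback through traffic splitting amounts to rewriting the endpoint's split so that one deployed model receives everything. The sketch below only computes the new mapping of deployed model IDs to percentages; the IDs are hypothetical, and applying the split to a real endpoint is left to whichever client or console you use.

```python
def rollback_split(current_split, safe_id):
    """Return a traffic split that sends 100% of traffic to `safe_id`.

    `current_split` maps deployed model IDs to integer percentages.
    """
    if safe_id not in current_split:
        raise KeyError(f"{safe_id!r} is not deployed on this endpoint")
    return {
        model_id: (100 if model_id == safe_id else 0)
        for model_id in current_split
    }


if __name__ == "__main__":
    # Hypothetical IDs: "v1" is the previous version, "v2" the new one.
    print(rollback_split({"v1": 10, "v2": 90}, safe_id="v1"))
```

Because both versions stay deployed, the switch needs no new deployment, which is why it is fast.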

54
MCQhard

You are a machine learning engineer at a FinTech company. Your team has developed a credit risk model using XGBoost and deployed it on Vertex AI Prediction using a custom container. The model is used for real-time credit decisions, and the endpoint is configured with a single machine type (n1-standard-4) and min_replica_count = 2, max_replica_count = 10. Recently, the team observed that during a promotional campaign, the endpoint's prediction latency increased from 200ms to over 2 seconds, and some requests resulted in 503 errors. You check the Cloud Monitoring metrics and see that CPU utilization reached 100% on the existing replicas, but the number of replicas never scaled beyond the initial 2. The deployment uses a custom container that runs a TensorFlow Serving-like model server. The container image is stored in Artifact Registry. The Vertex AI endpoint is configured with a traffic split of 100% to this model version. What is the most likely cause of the scaling failure, and what step should you take to resolve it?

A.Increase min_replica_count to 5 to handle the baseline load.
B.Change the endpoint configuration to use gRPC instead of HTTP to reduce latency.
C.Ensure the custom container exposes the correct metrics for CPU utilization so that Vertex AI autoscaling can trigger.
D.Set the max_replica_count to a higher value like 20.
AnswerC

Autoscaling relies on metrics; if the container doesn't expose them, scaling won't happen.

Why this answer

Option C is correct because Vertex AI's autoscaler can only act on the utilization metrics it receives for the deployed model. If the custom container does not surface those metrics in a form the autoscaler can use, it cannot detect the high CPU usage and will not scale beyond the initial replicas, leading to latency spikes and 503 errors under load.

Exam trap

The trap here is that candidates assume autoscaling is automatic based on CPU utilization alone, but Vertex AI requires explicit metric exposure from custom containers; otherwise, the autoscaler remains inactive.

How to eliminate wrong answers

Option A is wrong because increasing min_replica_count only sets a baseline number of replicas; it does not fix the autoscaling mechanism that failed to add replicas when CPU hit 100%. Option B is wrong because switching to gRPC can reduce network overhead and latency, but it does not address the root cause of scaling failure—the autoscaler not triggering due to missing metrics. Option D is wrong because raising max_replica_count only increases the upper limit; if the autoscaler never triggers scaling (due to missing metrics), the replicas will remain at the initial count regardless of the max setting.

55
MCQmedium

After deploying a model to Vertex AI Endpoints, the prediction responses include unexpected data. The model returns logits instead of probabilities. What is the most likely cause?

A.The model was trained with different loss
B.The input data is scaled incorrectly
C.The endpoint is not properly configured
D.The model output is not post-processed
AnswerD

Missing softmax or similar transformation leads to raw logits being returned.

Why this answer

The most likely cause is that the model output is not post-processed. In Vertex AI Endpoints, models often output raw logits (unnormalized scores) from the final layer, and a softmax or sigmoid activation must be applied as a post-processing step to convert these logits into probabilities. Without this post-processing, the endpoint returns the raw logits, which is why the prediction responses contain unexpected data.

Exam trap

Google Cloud often tests the distinction between model training configurations and serving/post-processing steps, and the trap here is that candidates assume the endpoint or deployment configuration controls output formatting, when in fact the model's exported graph or serving function determines whether logits or probabilities are returned.

How to eliminate wrong answers

Option A is wrong because training with a different loss function (e.g., cross-entropy vs. mean squared error) does not directly cause the model to output logits instead of probabilities; the output layer's activation function (or lack thereof) determines whether outputs are logits or probabilities. Option B is wrong because incorrect input scaling would affect the prediction values (e.g., shifting or scaling them), but it would not change the fundamental nature of the output from logits to probabilities; the model would still output whatever its final layer produces. Option C is wrong because the endpoint configuration (e.g., machine type, traffic splitting, or model version) does not alter the model's output format; the endpoint simply serves the model's raw predictions as-is.
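
For reference, the missing post-processing step for a multi-class model is typically a softmax over the logits. This is a minimal, dependency-free sketch; the example logits are made up.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the max first so large logits do not overflow exp().
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(softmax([2.0, 1.0, 0.1]))  # hypothetical logits for three classes
```

Whether this runs inside the exported model or in the serving container's handler is a design choice; either way the endpoint itself does not apply it.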

56
MCQhard

A data science team deploys a TensorFlow image classification model to Vertex AI Prediction. The model performs well in offline evaluation but shows a 15% drop in accuracy in production. The production data distribution has shifted compared to the training data. The team needs to continuously monitor and retrain the model. Which solution is most appropriate for detecting drift and triggering retraining?

A.Enable Vertex AI Model Monitoring for feature drift; configure alerts to trigger a Vertex AI Pipelines retraining run.
B.Export production predictions to Cloud Logging, then use Log Analytics to compare distributions.
C.Store predictions in BigQuery and run scheduled SQL queries to detect drift; trigger retraining via Cloud Functions.
D.Use Cloud Monitoring to track prediction latency and error rates; manually retrain when errors increase.
AnswerA

Vertex AI Model Monitoring detects drift and can trigger automated retraining.

Why this answer

Vertex AI Model Monitoring is purpose-built for detecting feature drift in production ML models by comparing live inference data against a baseline distribution. When drift is detected, its alerts can be wired to start a Vertex AI Pipelines retraining run, creating an automated, end-to-end MLOps loop that addresses the production accuracy drop without manual intervention.

Exam trap

Google Cloud often tests the distinction between operational monitoring (latency, errors) and data-quality monitoring (feature drift), leading candidates to mistakenly choose Cloud Monitoring (Option D) because they confuse production health metrics with model-specific distribution shifts.

How to eliminate wrong answers

Option B is wrong because exporting predictions to Cloud Logging and using Log Analytics for distribution comparison is a manual, ad-hoc approach that lacks native drift detection algorithms and automated retraining triggers, making it unsuitable for continuous monitoring. Option C is wrong because storing predictions in BigQuery and running scheduled SQL queries to detect drift requires custom statistical logic and does not leverage Vertex AI's built-in drift detection, alerting, or pipeline integration, leading to higher maintenance overhead. Option D is wrong because Cloud Monitoring tracks prediction latency and error rates, which are operational metrics, not feature distribution shifts; relying on error rates as a proxy for drift is indirect and unreliable, and manual retraining defeats the goal of continuous automation.
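
To show what "comparing live data against a baseline distribution" means in practice, the sketch below computes the Jensen-Shannon divergence between two histograms of one feature and flags drift above a threshold. The bin counts and the 0.1 threshold are hypothetical, and this is an illustration of the idea rather than the managed service's implementation.

```python
import math


def _normalize(counts):
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one observation")
    return [c / total for c in counts]


def _kl(p, q):
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(baseline_counts, serving_counts):
    """Jensen-Shannon divergence (base 2, range 0..1) between two histograms."""
    p, q = _normalize(baseline_counts), _normalize(serving_counts)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def drift_detected(baseline_counts, serving_counts, threshold=0.1):
    """True when the divergence exceeds the (hypothetical) alert threshold."""
    return js_divergence(baseline_counts, serving_counts) > threshold


if __name__ == "__main__":
    print(drift_detected([90, 10], [10, 90]))  # strongly shifted feature
```

A `True` result is the point at which an alert would fire and a retraining run could be started.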

57
Drag & Dropmedium

Drag and drop the steps to create a Cloud Bigtable instance and table using the CLI into the correct order.


Why this order

Bigtable instances contain clusters; tables are created within instances and must have column families.

58
MCQeasy

Your team wants to continuously monitor a deployed model's performance in production. They need to detect when the model's predictions become unreliable due to changes in the real world (e.g., new customer behavior). Which Vertex AI service should they use?

A.Vertex AI Explainable AI
B.Vertex AI Experiments
C.Vertex AI Model Monitoring
D.Vertex AI Prediction
AnswerC

Model Monitoring continuously checks for skew, drift, and performance issues.

Why this answer

Vertex AI Model Monitoring is the correct choice because it is specifically designed to continuously track a deployed model's prediction quality over time, detecting issues like data drift, feature drift, and prediction skew that indicate the model's reliability is degrading due to changes in the real world. It automatically compares incoming prediction data against a baseline training dataset and alerts when statistical distributions shift beyond configurable thresholds, enabling proactive retraining or intervention.

Exam trap

Google Cloud often tests the distinction between services that 'serve' predictions (Vertex AI Prediction) versus those that 'monitor' predictions (Vertex AI Model Monitoring), leading candidates to mistakenly choose the prediction service when the question asks about detecting unreliability.

How to eliminate wrong answers

Option A is wrong because Vertex AI Explainable AI provides feature attributions and explanations for individual predictions, but it does not continuously monitor model performance or detect drift in production. Option B is wrong because Vertex AI Experiments is used for tracking and comparing machine learning experiments during model development, not for monitoring deployed models in production. Option D is wrong because Vertex AI Prediction is the service that hosts and serves the model for online predictions, but it has no built-in monitoring capabilities for detecting performance degradation or data drift.

59
MCQhard

In the Vertex AI Pipeline component YAML exhibit, the component is designed to evaluate a model and produce metrics. If the threshold_accuracy is set to 0.85, what is the expected behavior of this component?

A.It will output the evaluation metrics, and the pipeline can use them for conditional decisions
B.It will deploy the model if the accuracy meets the threshold
C.It will ignore the threshold_accuracy input if not provided
D.It will fail if the model accuracy is below 0.85
AnswerA

The component outputs metrics for downstream use.

Why this answer

In Vertex AI Pipelines, a component's YAML definition specifies inputs, outputs, and implementation. Setting `threshold_accuracy` to 0.85 defines a parameter that the component can use internally, but by itself it does not trigger deployment or cause failure. The component's expected behavior is to output evaluation metrics, and the pipeline can then use those metrics in conditional logic (e.g., via `Condition` or `if/else` tasks) to decide subsequent steps, such as model deployment or retraining.

Exam trap

Google Cloud often tests the misconception that setting a threshold in a component's YAML automatically enforces that threshold (e.g., causing failure or deployment), when in reality the YAML only defines the interface and the component's code must explicitly implement such logic.

How to eliminate wrong answers

Option B is wrong because Vertex AI Pipeline components do not inherently deploy models; deployment is a separate step typically handled by a deployment component or a pipeline condition that triggers a deployment task. Option C is wrong because if `threshold_accuracy` is not provided, the component will either use a default value defined in the YAML or fail validation, depending on whether the input is required; it does not simply ignore it. Option D is wrong because the component does not fail when accuracy is below the threshold; it merely outputs the metrics, and the pipeline logic (e.g., a conditional branch) must be explicitly configured to handle such cases.

60
MCQmedium

Refer to the exhibit. A developer sees this log entry when trying to get a prediction. What is the most likely cause?

A.The model ID is incorrect
B.The model version is not deployed
C.The endpoint does not exist
D.The project ID is wrong
AnswerB

A model version must be deployed to an endpoint to serve predictions; 'not found' suggests it is not deployed.

Why this answer

The log entry indicates that the model version specified in the request is not currently deployed to the serving infrastructure. In Google Cloud's Vertex AI, a model version must be explicitly deployed to an endpoint before it can serve predictions; attempting to predict against a non-deployed version returns an error. This is the most likely cause because the error message directly references the model version's deployment status.

Exam trap

Google Cloud often tests the distinction between model registry operations (uploading, versioning) and serving operations (deploying, predicting), trapping candidates who assume any model version in the registry is automatically available for predictions.

How to eliminate wrong answers

Option A is wrong because an incorrect model ID would typically result in a 'Model not found' or 'Invalid model' error, not a deployment status error. Option C is wrong because a non-existent endpoint would produce a 'Endpoint not found' or 'Resource not found' error, not a version deployment issue. Option D is wrong because a wrong project ID would cause an authentication or permission error (e.g., 'Project not found' or 'Permission denied'), not a model version deployment error.

61
MCQhard

A team uses Vertex AI Feature Store for real-time features. They notice that features are frequently missing during prediction serving. What is the best practice to handle missing features?

A.Retrain the model to handle missing values
B.Impute missing values in the serving function
C.Use a default value in the feature store definition
D.Drop the prediction request
AnswerC

Feature store allows defining default values for missing features.

Why this answer

Option C is correct because Vertex AI Feature Store allows you to define a default value for each feature at the time of feature store creation or feature definition. When a feature value is missing during serving, the feature store automatically returns this default value instead of failing or returning null. This ensures that the serving function always receives a valid feature value without requiring custom imputation logic or model retraining.

Exam trap

Google Cloud often tests the misconception that missing values should be handled by the model or serving code, but the correct approach is to leverage the feature store's built-in default value capability to ensure consistency and low latency.

How to eliminate wrong answers

Option A is wrong because retraining the model to handle missing values does not address the root cause of missing features during serving; it only adapts the model to potentially missing inputs, but the feature store should guarantee a value is present. Option B is wrong because imputing missing values in the serving function introduces latency and custom logic that should be handled at the feature store level; Vertex AI Feature Store provides built-in default value support to avoid this. Option D is wrong because dropping the prediction request is a drastic measure that leads to poor user experience and loss of business value; the feature store should gracefully handle missing features with defaults.

62
MCQmedium

A machine learning team wants to deploy a new model version for canary testing, where only 5% of traffic is routed to the new version. Which Vertex AI endpoint configuration supports this?

A.Have the client application randomly select which model to call with 5% probability.
B.Deploy the new version to a separate endpoint and direct 5% of users via a load balancer.
C.Configure the endpoint with traffic split: 95% to old version, 5% to new version.
D.Use an A/B testing framework outside of Vertex AI to compare results.
AnswerC

Vertex AI endpoints allow splitting traffic between deployed models; the platform handles routing.

Why this answer

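Option C is correct because a Vertex AI endpoint can split traffic between the models deployed to it, so the platform itself routes 95% of requests to the old version and 5% to the new one. Option A is wrong because it moves routing into the client application, which is hard to control and audit.

Option B is wrong because it deploys the new version to a separate endpoint and depends on an external load balancer instead of the endpoint's own traffic split. Option D is wrong because an external A/B testing framework is a process for comparing results, not an endpoint configuration.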
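The effect of a 95/5 traffic split can be illustrated with a small simulation. This is only a sketch of weighted routing in plain Python, not the Vertex AI routing implementation; the model IDs `model-v1` and `model-v2` and the helper names are hypothetical.

```python
import random

def validate_traffic_split(split):
    """Check that percentages are non-negative integers that sum to 100."""
    if any((not isinstance(p, int)) or p < 0 for p in split.values()):
        raise ValueError("each percentage must be a non-negative integer")
    if sum(split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    return split

def route_request(split, rng):
    """Pick a deployed model ID with probability proportional to its share."""
    model_ids = list(split)
    weights = [split[model_id] for model_id in model_ids]
    return rng.choices(model_ids, weights=weights, k=1)[0]

# Hypothetical deployed model IDs (assumption, not real resource names).
TRAFFIC_SPLIT = validate_traffic_split({"model-v1": 95, "model-v2": 5})

def simulate(n_requests, seed=0):
    """Route n_requests and count how many land on each deployed model."""
    rng = random.Random(seed)
    counts = {model_id: 0 for model_id in TRAFFIC_SPLIT}
    for _ in range(n_requests):
        counts[route_request(TRAFFIC_SPLIT, rng)] += 1
    return counts
```

Calling `simulate(10000)` should send roughly 5% of requests to `model-v2`. The point of option C is that this routing lives in the endpoint configuration rather than in client code.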

63
Multi-Selecteasy

A data engineer is setting up CI/CD for a machine learning model using Cloud Build and Vertex AI. Which two components are essential? (Select 2)

Select 2 answers
A.Cloud Storage for datasets
B.Container Registry for model images
C.Cloud Source Repositories
D.Vertex AI Endpoints for deployment
E.Cloud Functions for triggers
AnswersB, D

Model images must be stored and versioned in a registry like Container Registry to deploy to Vertex AI.

Why this answer

Container Registry stores model images, and Vertex AI Endpoints hosts the deployed model. Both are essential in a CI/CD pipeline for ML.

64
Multi-Selectmedium

A company deploys a TensorFlow model on Vertex AI for online predictions. They want to monitor model performance in production to detect degradation. Which TWO practices should they implement? (Choose 2.)

Select 2 answers
A.Use a separate endpoint for shadow testing new model versions.
B.Log prediction requests and responses to Cloud Logging and analyze distribution metrics.
C.Set up Cloud Monitoring alerts for high prediction latency.
D.Schedule daily retraining of the model regardless of monitoring alerts.
E.Enable Vertex AI Model Monitoring for feature drift and skew detection on the deployed model.
AnswersB, E

Analyzing request distributions can detect changes in input data patterns that may affect model performance.

Why this answer

Option B is correct because logging prediction requests and responses to Cloud Logging allows you to analyze distribution metrics (e.g., mean, variance, quantiles) over time. This enables detection of data drift or performance degradation by comparing live distributions against baseline distributions, which is a standard monitoring practice for production ML models.

Exam trap

Google Cloud often tests the distinction between monitoring for model degradation (data drift/skew) versus monitoring for operational issues (latency, errors), leading candidates to confuse infrastructure alerts with model performance monitoring.
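As a rough illustration of comparing a live distribution against a baseline, the sketch below computes a population stability index (PSI) over logged numeric values. PSI is one common choice and is used here as an assumption; it is not claimed to be the metric Vertex AI computes. The bin edges and function names are hypothetical.

```python
import math

def histogram_shares(values, edges):
    """Share of values in each bin; values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    last_bin = len(edges) - 2
    for v in values:
        for i in range(len(edges) - 1):
            in_bin = edges[i] <= v < edges[i + 1]
            on_upper_edge = i == last_bin and v == edges[-1]
            if in_bin or on_upper_edge:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]

def psi(baseline, live, edges, eps=1e-6):
    """Population stability index between baseline and live samples."""
    base_shares = histogram_shares(baseline, edges)
    live_shares = histogram_shares(live, edges)
    return sum(
        (lv - bv) * math.log((lv + eps) / (bv + eps))
        for bv, lv in zip(base_shares, live_shares)
    )
```

A value near zero means the two samples fill the bins similarly; larger values indicate a shift worth investigating. Any alerting threshold is a judgment call for the team.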

65
MCQeasy

You are responsible for monitoring a production ML model on Vertex AI. The model predicts loan approval probability. The business team reports that the model's predictions are becoming less accurate over the last week. You check the model's monitoring dashboard and see that the prediction distribution has changed significantly. What is the most likely issue?

A.The model is suffering from overfitting to the training data.
B.There is a bug in the model's preprocessing code.
C.There is data drift in the input features.
D.The model is experiencing concept drift.
AnswerD

Concept drift means the underlying relationship between features and target has changed, causing prediction distribution to shift and accuracy to drop.

Why this answer

The correct answer is D because concept drift occurs when the underlying relationship between input features and the target variable changes over time, causing the model's predictions to become less accurate even if the input data distribution remains stable. In this scenario, the prediction distribution has changed significantly, which is a hallmark of concept drift, as the model's learned decision boundary no longer reflects the current real-world patterns. Vertex AI's monitoring dashboard can track prediction distribution shifts, and this symptom points to concept drift rather than data drift.

Exam trap

Google Cloud often tests the distinction between data drift and concept drift, and the trap here is that candidates see 'prediction distribution has changed' and incorrectly assume it must be data drift, when in fact a change in prediction distribution without a change in input features is a classic sign of concept drift.

How to eliminate wrong answers

Option A is wrong because overfitting to the training data is a static issue that would manifest as poor generalization from the start, not as a sudden degradation in accuracy over the last week; overfitting does not cause a change in prediction distribution over time. Option B is wrong because a bug in the model's preprocessing code would likely cause consistent, systematic errors or failures, not a gradual shift in prediction distribution over a week; preprocessing bugs are typically static and would be caught during deployment. Option C is wrong because data drift refers to changes in the input feature distribution, which would be detected by monitoring input feature statistics, not directly by a change in prediction distribution; the question states the prediction distribution has changed, which is more directly tied to concept drift.

66
MCQmedium

After deploying a model, the team notices that predictions are significantly different from training data distribution. What should they do?

A.Update the model endpoint
B.Review the training data pipeline
C.Set up Vertex AI Model Monitoring for skew detection
D.Retrain the model with new data
AnswerC

Model Monitoring provides continuous tracking of distribution differences.

Why this answer

Vertex AI Model Monitoring is specifically designed to detect skew between training data and serving data, including prediction drift. When predictions differ significantly from the training distribution, this indicates a skew or drift issue that Model Monitoring can alert on, enabling proactive investigation. Updating the endpoint or retraining without diagnosis would not address the root cause, and reviewing the pipeline alone does not provide ongoing detection.

Exam trap

Google Cloud often tests the distinction between reactive troubleshooting (reviewing pipelines, retraining) and proactive monitoring (skew detection), tempting candidates to choose a fix like retraining instead of the monitoring solution that detects the issue first.

How to eliminate wrong answers

Option A is wrong because updating the model endpoint does not diagnose or resolve the distribution mismatch; it only changes the serving target without addressing the underlying data or model behavior. Option B is wrong because reviewing the training data pipeline is a reactive, one-time investigation step, whereas the question describes a deployed model scenario where continuous monitoring is needed to detect and alert on skew in real time. Option D is wrong because retraining with new data without first understanding the cause of the skew may introduce new biases or fail to fix the issue; monitoring should be used to detect and diagnose before retraining.
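To make "skew detection" concrete, the sketch below compares the category shares of one categorical feature in training data and in serving data using the L-infinity distance (the largest absolute difference in share for any category). The choice of distance, the threshold of 0.3, and the function names are assumptions for illustration and are not taken from the question.

```python
from collections import Counter

def category_shares(values):
    """Fraction of samples in each category (values must be non-empty)."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

def l_infinity_distance(training_values, serving_values):
    """Largest absolute difference in category share between the two sets."""
    train = category_shares(training_values)
    serve = category_shares(serving_values)
    categories = set(train) | set(serve)
    return max(abs(train.get(c, 0.0) - serve.get(c, 0.0)) for c in categories)

def skew_alert(training_values, serving_values, threshold=0.3):
    """True when the distance exceeds the (arbitrary) alert threshold."""
    return l_infinity_distance(training_values, serving_values) > threshold
```

A managed monitoring service runs this kind of comparison continuously on sampled serving traffic, which is why option C fits better than a one-time pipeline review.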

67
MCQhard

A financial services company needs to explain predictions from a complex ensemble model for regulatory compliance. Which Vertex AI service should they use?

A.Vertex AI Explainable AI
B.Vertex AI Vizier
C.Vertex AI Feature Store
D.Vertex AI Prediction
AnswerA

Provides explanations via feature attributions.

Why this answer

Vertex AI Explainable AI is the correct service because it provides feature attributions and other explainability techniques (e.g., Shapley value approximations, integrated gradients) that help interpret predictions from complex ensemble models. This is essential for regulatory compliance, where the company must demonstrate how input features influence each prediction, ensuring transparency and auditability.

Exam trap

Google Cloud often tests the distinction between services that optimize or deploy models versus those that interpret them, so the trap here is assuming that Vertex AI Prediction includes built-in explainability, when in fact it only serves predictions and requires a separate Explainable AI request for attributions.

How to eliminate wrong answers

Option B (Vertex AI Vizier) is wrong because it is a hyperparameter tuning and optimization service, not designed for explaining model predictions. Option C (Vertex AI Feature Store) is wrong because it serves as a centralized repository for feature management and serving, not for generating post-hoc explanations of model outputs. Option D (Vertex AI Prediction) is wrong because it handles model deployment and online/batch inference requests, but does not natively provide interpretability or attribution explanations for individual predictions.
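Feature attribution can be illustrated with an exact Shapley computation on a tiny model. This is a sketch of the general technique, not the Explainable AI service's implementation; it enumerates every feature ordering, so it is only practical for a handful of features. The `predict` callable, `instance`, and `baseline` are hypothetical inputs.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions of predict(instance) relative to baseline.

    For every ordering of the features, switch features from their baseline
    value to the instance value one at a time and credit each feature with
    the resulting change in the prediction, then average over orderings.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = instance[i]
            updated = predict(current)
            totals[i] += updated - previous
            previous = updated
    return [t / len(orderings) for t in totals]
```

The attributions always sum to `predict(instance) - predict(baseline)`, which makes them convenient for audit-style explanations of a single prediction.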

68
MCQhard

Refer to the exhibit. A data scientist notices that the evaluation component rarely passes the threshold, causing the pipeline to fail often. What should they do to improve efficiency?

A.Reduce the training dataset size
B.Add a conditional component that only runs evaluation if training metrics are above a certain level
C.Remove the evaluation component
D.Increase the threshold value
AnswerB

Conditional execution saves cost and time by skipping evaluation on underperforming models.

Why this answer

Adding a conditional component that only runs evaluation when training metrics exceed a certain threshold prevents unnecessary evaluation runs on poorly performing models. This reduces pipeline failures by ensuring that evaluation, which may be resource-intensive or prone to failure with low-quality inputs, is only triggered when the model has demonstrated sufficient training performance. This approach optimizes resource usage and pipeline reliability without sacrificing the evaluation step entirely.

Exam trap

Google Cloud often tests the misconception that simply adjusting thresholds or removing components is the solution, when the correct approach is to add conditional logic to gate resource-intensive steps based on upstream quality metrics.

How to eliminate wrong answers

Option A is wrong because reducing the training dataset size would likely degrade model quality and does not address the root cause of evaluation failures; it may even increase variance and instability. Option C is wrong because removing the evaluation component entirely would eliminate the ability to validate model performance, which is critical for ensuring model quality and compliance in production pipelines. Option D is wrong because increasing the threshold value would make it even harder for the evaluation component to pass, exacerbating the failure rate rather than improving efficiency.
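The gating idea in option B can be sketched as plain control flow. In a real pipeline this would normally be expressed with the pipeline DSL's conditional construct; the version below is a framework-free sketch, and `train_fn`, `evaluate_fn`, and `gate_threshold` are hypothetical names.

```python
def run_pipeline(train_fn, evaluate_fn, gate_threshold):
    """Run training, then evaluation only if the training metric clears the gate.

    train_fn returns a training metric (higher is better).
    evaluate_fn returns True if the model passes the evaluation threshold.
    """
    executed = ["train"]
    training_metric = train_fn()
    if training_metric < gate_threshold:
        # Skip the expensive evaluation step for an obviously weak model.
        executed.append("skip_evaluation")
        return {"executed": executed, "passed": False}
    executed.append("evaluate")
    passed = evaluate_fn()
    return {"executed": executed, "passed": passed}
```

With this shape, a weak training run ends cleanly at the gate instead of failing the pipeline in the evaluation component.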

69
MCQmedium

A company uses BigQuery ML to create a classification model. The model is used for batch prediction on a weekly basis. After six months, the data distribution shifts, and model accuracy drops. Which approach should the company take to maintain model performance?

A.Use Cloud Dataflow to preprocess the data and then update the model with new features.
B.Perform hyperparameter tuning on the original training data.
C.Apply model quantization to reduce model size and improve inference speed.
D.Schedule automatic retraining of the model using the most recent three months of data.
AnswerD

Retraining on recent data adapts to distribution shift.

Why this answer

Option D is correct because the model's accuracy drop is caused by a shift in the data distribution. Scheduling automatic retraining on the most recent three months of data lets the model adapt to the new patterns without manual intervention. In BigQuery, a scheduled query can rerun a `CREATE OR REPLACE MODEL` statement, making this approach both practical and aligned with MLOps best practices for batch prediction pipelines.

Exam trap

Google Cloud often tests the misconception that hyperparameter tuning or feature engineering alone can fix data drift, when in fact only retraining on fresh data addresses the shift.

How to eliminate wrong answers

Option A is wrong because Cloud Dataflow is a data processing tool, not a solution for retraining; preprocessing and adding new features does not address the distribution shift unless the model is retrained on the new data. Option B is wrong because hyperparameter tuning on the original training data optimizes the model for the old distribution, not the shifted one, and will not recover accuracy. Option C is wrong because model quantization reduces model size and speeds up inference but does not improve accuracy or address data drift; it may even slightly degrade performance.
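A retraining statement with a rolling three-month window might look like the SQL produced by the helper below. This is a sketch: the model name, table name, column names, and the `LOGISTIC_REG` model type are assumptions, and the date column is assumed to be of type `DATE`. The helper only builds the SQL text; scheduling it is a separate step.

```python
def build_retrain_sql(model, source_table, label_col, date_col, months=3):
    """Build a BigQuery ML statement that retrains on the last `months` months."""
    if months < 1:
        raise ValueError("months must be at least 1")
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'LOGISTIC_REG', "
        f"input_label_cols = ['{label_col}']) AS\n"
        # Exclude the date column so it is used for filtering, not as a feature.
        f"SELECT * EXCEPT ({date_col})\n"
        f"FROM `{source_table}`\n"
        f"WHERE {date_col} >= DATE_SUB(CURRENT_DATE(), INTERVAL {months} MONTH)"
    )

# Hypothetical names, for illustration only.
RETRAIN_SQL = build_retrain_sql(
    model="my_dataset.churn_model",
    source_table="my_dataset.labeled_events",
    label_col="label",
    date_col="event_date",
)
```

Because the window is relative to `CURRENT_DATE()`, each scheduled run trains on fresh data without editing the query.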

70
Multi-Selectmedium

Which TWO actions should you take to ensure model reliability in a production Vertex AI Endpoint?

Select 2 answers
A.Use only batch predictions to avoid real-time issues
B.Monitor prediction accuracy in production with logging and alerts
C.Disable request/response logging to reduce latency
D.Use a single model endpoint for all traffic
E.Gradually shift traffic to new model versions (canary deployment)
AnswersB, E

Detects model degradation.

Why this answer

Monitoring prediction accuracy with logging and alerts (B) is essential for detecting model drift, data drift, and performance degradation in production. Vertex AI provides model monitoring features that automatically log prediction requests and responses, compute statistics, and trigger alerts when skew or drift thresholds are breached, enabling proactive remediation.

Exam trap

Google Cloud often tests the misconception that disabling logging improves reliability by reducing latency, when in fact it removes the observability needed to detect and diagnose failures, which is a core tenet of MLOps reliability.

71
MCQeasy

A data science team needs to ensure that a deployed Vertex AI model can handle varying traffic patterns with minimal latency and cost. What should they do?

A.Use Vertex AI Prediction with autoscaling
B.Use batch prediction instead of online
C.Pre-warm all instances
D.Deploy to a single large machine type
AnswerA

Autoscaling adjusts replicas based on traffic, balancing latency and cost.

Why this answer

Vertex AI Prediction with autoscaling dynamically adjusts the number of serving replicas based on incoming traffic, keeping latency low during spikes and cost down during lulls. This is the recommended approach for handling variable traffic patterns in production, as it relies on Google Cloud's managed infrastructure to scale between the configured minimum and maximum replica counts automatically.

Exam trap

Google Cloud often tests the misconception that batch prediction can substitute for online serving in variable traffic scenarios, but the key distinction is that batch prediction lacks real-time latency guarantees and cannot scale dynamically per request.

How to eliminate wrong answers

Option B is wrong because batch prediction is designed for asynchronous, large-scale offline inference on static datasets, not for real-time traffic with varying patterns; it cannot handle low-latency online requests. Option C is wrong because pre-warming all instances defeats the purpose of autoscaling, leading to constant high cost regardless of actual traffic, and is not a dynamic solution. Option D is wrong because deploying to a single large machine type creates a single point of failure and cannot scale horizontally to handle traffic spikes, resulting in either over-provisioning cost or latency under load.
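The intuition behind utilization-based autoscaling can be shown with the proportional rule used by many autoscalers. This is a generic sketch and is not claimed to be the exact algorithm Vertex AI uses; the parameter names and the example numbers are assumptions.

```python
import math

def desired_replicas(current_replicas, observed_utilization,
                     target_utilization, min_replicas, max_replicas):
    """Scale replicas so per-replica utilization moves toward the target.

    Utilization values are percentages (e.g., 90 for 90%). The result is
    clamped to the configured minimum and maximum replica counts.
    """
    raw = math.ceil(current_replicas * observed_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))
```

For example, two replicas running at 90% against a 60% target would grow to three, while four replicas at 15% would shrink toward the minimum. The maximum bounds cost and the minimum bounds cold-start latency.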

72
MCQhard

A financial services company must ensure that predictions from a deployed model do not become biased against protected groups. They have a monitoring system in place. Which metric should they track?

A.Prediction latency
B.Prediction distribution across demographic segments
C.Per-query input feature distribution
D.Model accuracy over time
AnswerB

Comparing prediction distributions across groups reveals potential bias in outcomes.

Why this answer

Tracking prediction distribution across demographic segments (option B) directly monitors for bias by comparing the model's output rates for different protected groups. If the distribution diverges significantly, it indicates potential disparate impact, which is the core concern for fairness in deployed models. This aligns with monitoring for algorithmic fairness, not just operational performance.

Exam trap

The trap here is that candidates confuse operational metrics (latency, accuracy) with fairness metrics and assume that high accuracy guarantees fairness. Google Cloud tests the point that bias can exist even with high accuracy if the model performs differently across demographic segments.

How to eliminate wrong answers

Option A is wrong because prediction latency measures the time taken to serve a prediction, which is a performance metric unrelated to bias or fairness against protected groups. Option C is wrong because per-query input feature distribution tracks individual input values, not the aggregated output predictions across demographic segments needed to detect bias. Option D is wrong because model accuracy over time measures overall predictive performance, which can remain high even when the model is biased against a specific group (e.g., accuracy may be high for a majority class while failing for a minority class).
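Tracking prediction distribution across segments can be as simple as comparing positive-prediction rates per group. The sketch below is one way to do that; the record layout `(group, score)`, the 0.5 decision threshold, and the function names are assumptions, and the ratio it reports is a screening signal rather than a legal test.

```python
from collections import defaultdict

def positive_rate_by_group(records, threshold=0.5):
    """Share of records per group whose score is at or above the threshold.

    records is an iterable of (group, score) pairs.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, score in records:
        totals[group] += 1
        if score >= threshold:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}

def disparity_ratio(rates):
    """Lowest group rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0
    return min(rates.values()) / highest
```

A ratio that drifts well below 1.0 over time indicates that one segment is receiving positive outcomes far less often than another, even if overall accuracy looks stable.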

73
MCQmedium

A retail company uses Vertex AI Pipelines to automate monthly retraining of a recommendation model. The pipeline consists of three steps: (1) extract data from BigQuery, (2) train a TensorFlow model on Vertex AI Training, (3) upload the model to Vertex AI Model Registry and deploy to an endpoint if performance metrics improve. Recently, the pipeline has been failing at step 2 with the error: 'The job was cancelled by the system because it exceeded the maximum training time of 3600 seconds.' You have confirmed that the training code is correct and the data size has not changed significantly. What should you do to fix this pipeline failure?

A.Reduce the training dataset size by sampling fewer rows.
B.Set the training timeout to 7200 seconds in the pipeline configuration.
C.Switch from TensorFlow to a simpler model framework.
D.Reconfigure the pipeline to use a larger machine type for training.
AnswerB

Increasing the timeout accommodates the training duration within the expected limits.

Why this answer

Option B is correct because the error shows the training job was cancelled after reaching its 3600-second limit; raising the timeout to 7200 seconds gives the job enough time to complete. Option A (reducing data) degrades model quality. Option C (changing framework) is drastic and unnecessary.

Option D (larger machine) may shorten training but is not a direct fix for the timeout.

74
MCQhard

A company uses Vertex AI Feature Store for serving features to both training and prediction. The team notices that predictions made shortly after training use different feature values, causing a training-serving skew. What is the most effective way to prevent this skew?

A.Configure the Feature Store to use point-in-time lookup using the training timestamp
B.Retrain the model more frequently to adapt to the new feature distributions
C.Use batch prediction instead of online prediction to ensure consistent features
D.Ensure that the training and prediction environments use identical compute resources
AnswerA

Point-in-time lookup ensures that the same feature values used during training are used during serving.

Why this answer

Option A is correct because a point-in-time lookup uses the training timestamp to return the feature values exactly as they were when the training data was built, which keeps training and serving consistent. Option B (retraining more often) does not prevent skew. Option C (batch prediction) still uses current features.

Option D (identical compute resources) does not affect feature values.
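A point-in-time lookup returns the latest feature value recorded at or before a given timestamp. The sketch below shows the idea on an in-memory history; it is not the Feature Store API, and the `(timestamp, value)` layout and function name are assumptions.

```python
from bisect import bisect_right

def point_in_time_value(history, as_of):
    """Return the most recent value with timestamp <= as_of, else None.

    history is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    if idx == 0:
        return None  # no value had been written yet at as_of
    return history[idx - 1][1]
```

Looking up features with the training timestamp avoids leaking values written after the training snapshot, which is the skew described in the question.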

75
MCQhard

A company runs large batch prediction jobs on Vertex AI every day. They want to minimize costs while ensuring the jobs complete within a 4-hour window. The model requires significant memory. What is the most cost-effective approach?

A.Use Cloud TPUs to accelerate predictions
B.Use a smaller machine type (e.g., n1-standard-4) to reduce cost
C.Use preemptible VMs with a machine type that meets memory requirements
D.Use standard VMs and reduce parallelization
AnswerC

Preemptible VMs are much cheaper and restartable, suitable for batch jobs.

Why this answer

Preemptible VMs (now called Spot VMs) are significantly cheaper than standard VMs (up to 60-80% discount) and are ideal for fault-tolerant batch prediction jobs that can handle interruptions. Since the job has a 4-hour window and the model requires significant memory, using preemptible VMs with a machine type that meets the memory requirements minimizes cost while allowing the job to complete if restarted within the time limit.

Exam trap

Google Cloud often tests the misconception that preemptible VMs are unreliable for any production workload, but the trap here is that batch prediction jobs are inherently fault-tolerant and can leverage preemptible VMs to drastically reduce costs without violating the completion window.

How to eliminate wrong answers

Option A is wrong because Cloud TPUs are specialized hardware for training and inference of large models, but they are more expensive and not necessary for batch prediction; they also do not directly address the memory requirement or cost minimization for a 4-hour window. Option B is wrong because using a smaller machine type (e.g., n1-standard-4) would likely cause out-of-memory errors or severe performance degradation since the model requires significant memory, making the job fail or exceed the 4-hour window. Option D is wrong because reducing parallelization would increase job duration, potentially exceeding the 4-hour window, and standard VMs are more expensive than preemptible VMs, so this approach does not minimize costs.
