Google Professional Machine Learning Engineer PMLE Questions 451–506 | Page 7/7

451

Multi-Select (easy)

Which TWO practices help ensure reproducible ML experiments?

Select 2 answers

A. Store all artifacts in a temporary bucket

B. Use a random seed for each run

C. Use Vertex AI Experiments to track parameters and metrics

D. Version control training code with Cloud Source Repositories

E. Use preemptible VMs

Answers: C, D

Experiments record the exact configuration and results.

Why this answer

Vertex AI Experiments records the parameters, metrics, and artifacts for each run (logged explicitly or through autologging), building a lineage that shows how each result was produced. Combined with version-controlled training code (option D), this lets you recreate the exact code and configuration behind a given outcome, which is essential for reproducibility.
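A minimal plain-Python sketch of the two practices together: a fixed seed makes the run repeatable, and a record of parameters, seed, and code version plays the role that Vertex AI Experiments and a versioned repository play in the cloud. `train`, the parameter names, and the commit string are hypothetical, and no Vertex AI SDK calls are made:

```python
import hashlib
import json
import random


def train(params: dict, seed: int) -> float:
    """Hypothetical stand-in for a training job; returns a 'metric'."""
    rng = random.Random(seed)  # isolated RNG, seeded per run
    # Simulated noisy training whose result depends on the seed and params.
    noise = sum(rng.random() for _ in range(100)) / 100
    return round(params["learning_rate"] * 10 + noise, 6)


def run_experiment(params: dict, seed: int, code_version: str) -> dict:
    """Record everything needed to reproduce the run alongside its result."""
    metric = train(params, seed)
    record = {
        "params": params,
        "seed": seed,
        "code_version": code_version,  # e.g. a Git commit SHA (hypothetical)
        "metric": metric,
    }
    # Stable fingerprint of the configuration only (not the result).
    config = json.dumps(
        {k: record[k] for k in ("params", "seed", "code_version")}, sort_keys=True
    )
    record["config_hash"] = hashlib.sha256(config.encode()).hexdigest()[:12]
    return record


if __name__ == "__main__":
    params = {"learning_rate": 0.01, "batch_size": 32}
    first = run_experiment(params, seed=42, code_version="abc1234")
    second = run_experiment(params, seed=42, code_version="abc1234")
    print(first["metric"] == second["metric"])  # True: same config, same result
```

By contrast, drawing a fresh random seed for every run (option B as worded) would make each run's result differ.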

Exam trap

Google Cloud often tests the distinction between practices that improve reproducibility (like tracking parameters and versioning code) versus practices that improve cost efficiency or speed (like using preemptible VMs or temporary storage), leading candidates to conflate operational convenience with scientific reproducibility.

452

MCQ (hard)

A team uses custom training and deploys a TensorFlow model using Vertex AI Endpoints. They set up Cloud Monitoring alerts for online prediction latency. However, they notice the latency metric shows a spike every hour, but the actual user experience is fine. What could be the cause?

A. The metric includes prediction time plus log writing time

B. The alert threshold is too low

C. The metric is being sampled every hour

D. A monitoring agent on the VM is causing additional latency

Answer: A

Periodic log dumping can cause hourly spikes in measured latency.

Why this answer

Option A is correct because the reported latency can include time spent writing prediction logs, not only model inference time. If logs are buffered and flushed on a fixed schedule (for example, on an hourly rotation), the requests in flight during a flush are measured as slower, producing a regular spike in the metric while the typical user-facing latency stays unaffected. The hourly cadence points to a scheduled logging operation rather than a degradation in prediction performance.
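To check whether a spike is periodic rather than load-driven, you can group the latency series by minute of the hour. The sketch below uses simulated data with a hypothetical flush cost at minute 0 of each hour; in practice the samples would come from the endpoint's latency metric in Cloud Monitoring:

```python
from statistics import median


def simulate_latencies(hours: int, base_ms: float, flush_cost_ms: float) -> list:
    """Return (minute_index, latency_ms) samples, one per minute.

    Hypothetical model: every minute costs base_ms, and the minute in which a
    log buffer is flushed (minute 0 of each hour) pays an extra flush_cost_ms.
    """
    samples = []
    for minute in range(hours * 60):
        extra = flush_cost_ms if minute % 60 == 0 else 0.0
        samples.append((minute, base_ms + extra))
    return samples


def spike_minutes_of_hour(samples, factor: float = 2.0) -> set:
    """Minutes of the hour whose latency exceeds factor x median in every hour."""
    overall = median(lat for _, lat in samples)
    by_minute = {}
    for minute, lat in samples:
        by_minute.setdefault(minute % 60, []).append(lat)
    # Periodic = the same minute of the hour spikes in every observed hour.
    return {
        m for m, lats in by_minute.items()
        if all(lat > factor * overall for lat in lats)
    }


if __name__ == "__main__":
    data = simulate_latencies(hours=24, base_ms=40.0, flush_cost_ms=400.0)
    print(spike_minutes_of_hour(data))  # {0}
```

A traffic-driven slowdown would not line up on the same minute every hour, so an empty result points away from a scheduled cause.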

Exam trap

Google Cloud often tests the misconception that latency metrics reflect only model inference time, when in reality they may include ancillary operations like logging, causing candidates to overlook the logging overhead as the source of periodic spikes.

How to eliminate wrong answers

Option B is wrong because the alert threshold being too low would cause continuous or frequent alerts, not a predictable hourly spike in the latency metric itself. Option C is wrong because sampling every hour would produce a single data point per hour, not a spike within the metric; the metric is reported continuously, and sampling frequency does not create spikes. Option D is wrong because a monitoring agent on the VM would add consistent overhead, not a periodic hourly spike, and Vertex AI Endpoints are managed services where customers do not manage VMs directly for prediction serving.

453

MCQ (medium)

A company is training a large neural network on Vertex AI and training jobs keep failing with 'Out of memory' errors. The VM uses a standard n1-standard-4 machine with 15 GB RAM. Which action should they take first?

A. Use a larger machine type like n1-standard-16

B. Reduce the batch size in the training script

C. Enable distributed training across multiple VMs

D. Switch the training to CPU only

Answer: B

Smaller batch size reduces peak memory usage.

Why this answer

The 'Out of memory' error on an n1-standard-4 VM (15 GB RAM) indicates that the training process needs more memory than the VM provides. Reducing the batch size directly decreases the memory needed to hold each batch and its intermediate activations during training, which makes it the most immediate and cost-effective fix without changing the underlying infrastructure.
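A back-of-the-envelope estimate shows why batch size is the first lever. The sketch below assumes float32 values, an Adam-style optimizer (weights, gradients, and two moment buffers), and hypothetical model sizes; real usage also depends on the framework and input pipeline:

```python
def training_memory_gb(batch_size: int,
                       params: int,
                       activations_per_sample: int,
                       bytes_per_value: int = 4) -> float:
    """Rough memory estimate for one training step (illustrative only).

    Assumptions (hypothetical):
    - weights, gradients, and two optimizer moment buffers: 4 copies of params
    - activations kept for backprop scale linearly with batch_size
    """
    fixed = 4 * params * bytes_per_value
    activations = batch_size * activations_per_sample * bytes_per_value
    return (fixed + activations) / 1e9


if __name__ == "__main__":
    # Hypothetical model: 50M parameters, 50M activation values per sample.
    for batch in (128, 64, 32, 16):
        gb = training_memory_gb(batch, params=50_000_000,
                                activations_per_sample=50_000_000)
        print(f"batch {batch:>3}: ~{gb:.1f} GB")
```

Under these assumed sizes only the activation term depends on the batch, so halving the batch halves that term while the fixed cost of weights and optimizer state stays the same.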

Exam trap

The trap here is that candidates often jump to scaling up infrastructure (larger machine or distributed training) instead of first tuning the training hyperparameter (batch size) that directly controls memory consumption, which is the simplest and most cost-effective fix.

How to eliminate wrong answers

Option A is wrong as a first step because upgrading to a larger machine type (e.g., n1-standard-16) increases cost without addressing the root cause; if the batch is simply too large, it may still fail. Option C is wrong because distributed training adds network overhead and synchronization complexity (e.g., all-reduce), and with data parallelism each worker still holds a full copy of the model, so it is not the simplest first fix. Option D is wrong because an n1-standard-4 VM without attached GPUs is already training on CPU, so switching to CPU only changes nothing and does not solve the OOM issue.

454

MCQ (hard)

A financial services firm deploys a binary classification model for fraud detection. The model's precision is 0.95 and recall is 0.60 on the test set. After deployment, the fraud rate in production is 0.5% compared to 5% in the test set. The model shows good calibration on the test set (Brier score 0.02) but poor calibration in production (Brier score 0.15). What is the most likely explanation for the calibration degradation?

A. The distribution of input features has shifted significantly, causing the model to produce incorrect probabilities.

B. The model overfits to noise in the training data, leading to poor generalization.

C. The production data has a different class imbalance than the training data, causing the model to be biased toward the majority class.

D. The relationship between features and the target has changed (concept drift), causing the model's probability estimates to be misaligned with the true probabilities.

Answer: D

Concept drift changes the conditional distribution P(Y|X), which directly affects calibration.

Why this answer

The test set had a 5% fraud rate, but production has a 0.5% fraud rate. This tenfold drop in the base rate changes the true posterior P(Y|X): even if the feature distribution within each class stays the same, a given transaction is now far less likely to be fraud. The model still outputs probabilities calibrated to the 5% prior, so its estimates no longer match the true production probabilities, and the Brier score rises from 0.02 to 0.15. Option D is correct because it describes this misalignment: the relationship between the features and the probability of fraud has changed, and calibration measures how well predicted probabilities match true probabilities.
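Under the label-shift assumption (the feature distribution within each class is unchanged and only the fraud base rate moves), Bayes' rule gives a standard correction from the training prior pi to the production prior pi': p' = p(pi'/pi) / (p(pi'/pi) + (1 - p)(1 - pi')/(1 - pi)). The sketch below applies that correction and computes the Brier score; it illustrates the calibration effect and is not a step the question prescribes:

```python
def adjust_for_prior_shift(p: float, train_prior: float, new_prior: float) -> float:
    """Re-weight a calibrated probability for a new positive-class base rate.

    Assumes label shift: P(X|Y) is unchanged, only P(Y) changes.
    """
    pos = p * (new_prior / train_prior)
    neg = (1.0 - p) * ((1.0 - new_prior) / (1.0 - train_prior))
    return pos / (pos + neg)


def brier_score(probs, labels) -> float:
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


if __name__ == "__main__":
    # A score equal to the training base rate maps to the production base rate.
    print(round(adjust_for_prior_shift(0.05, 0.05, 0.005), 6))  # 0.005
    # A 0.5 score under the 5% prior is much less likely to be fraud at 0.5%.
    print(round(adjust_for_prior_shift(0.5, 0.05, 0.005), 3))   # 0.087
```

Without such an adjustment, the model's unchanged scores overstate fraud probability in production, which is what the higher Brier score reflects.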

Exam trap

The trap here is that candidates confuse covariate shift (feature distribution change) with prior probability shift (class imbalance change), and incorrectly attribute calibration degradation to feature drift rather than the direct effect of base rate change on probability estimates.

How to eliminate wrong answers

Option A is wrong because input feature distribution shift (covariate shift) would primarily affect the model's feature space and could degrade calibration, but the core issue here is the change in class imbalance (prior probability shift), not feature distribution. Option B is wrong because overfitting to noise would manifest as poor performance on both test and production sets, but the model shows good calibration on the test set (Brier score 0.02), indicating it generalizes well to the test distribution. Option C is wrong because while the production data has a different class imbalance, the model is not necessarily biased toward the majority class; the degradation is due to the mismatch between the training prior and production prior, which directly skews probability estimates regardless of majority class bias.

455

Multi-Select (medium)

Which TWO options are recommended practices for managing model versions across teams in Google Cloud?

Select 2 answers

A. Store all model files in a GitHub repository

B. Maintain a custom database to map model names to artifact locations

C. Use AI Platform (Unified) Models as the primary model registry

D. Use Vertex AI Model Registry to track model versions and their deployment history

E. Use Cloud Storage buckets with object versioning enabled to store model artifacts

Answers: D, E

Model Registry is the recommended service for managing model versions.

Why this answer

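Vertex AI Model Registry is the recommended service for managing model versions across teams because it provides a centralized repository to track model versions, their associated metadata, and deployment history. It integrates natively with Vertex AI endpoints and pipelines, enabling consistent governance and lineage tracking across the ML lifecycle. Cloud Storage buckets with object versioning (option E) complement the registry by retaining earlier versions of the underlying artifact files, so an overwritten model file can still be recovered.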

Exam trap

Google Cloud often tests the distinction between legacy AI Platform (Unified) Models and the current Vertex AI Model Registry, expecting candidates to recognize that the registry is the recommended service for version management and deployment history, not just a generic model storage location.

456

MCQ (medium)

A retail company wants to build a customer churn prediction model using BigQuery ML. The data is stored in BigQuery tables and includes customer demographics, purchase history, and support interactions. The data scientist wants to experiment with different model types quickly without moving data to another environment. Which approach should they use?

A. Use Cloud Composer to orchestrate a custom training pipeline on Vertex AI.

B. Use AI Platform Notebooks with pandas and scikit-learn.

C. Use BigQuery ML to create and evaluate models directly in BigQuery.

D. Export the data to Cloud Storage and use Vertex AI AutoML Tables.

Answer: C

Why C is correct: BigQuery ML is a low-code solution that works directly on BigQuery data.

Why this answer

BigQuery ML (BQML) allows data scientists to create, train, and evaluate machine learning models directly in BigQuery using SQL, without moving data to another environment. This approach supports rapid experimentation with various model types (e.g., logistic regression, boosted trees, deep neural networks) and is ideal for the stated requirement of quick iteration while keeping data in place.
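A sketch of what quick experimentation looks like: one `CREATE OR REPLACE MODEL` statement per candidate model type, each followed by `ML.EVALUATE`. The dataset `retail`, table `customer_features`, and label column `churned` are hypothetical names; the `model_type` values, `input_label_cols`, and `ML.EVALUATE` are standard BigQuery ML syntax. The Python only builds the SQL strings; you would run them in the BigQuery console or with a BigQuery client:

```python
MODEL_TYPES = [
    "logistic_reg",
    "boosted_tree_classifier",
    "random_forest_classifier",
    "dnn_classifier",
]


def create_model_sql(dataset: str, table: str, label: str, model_type: str) -> str:
    """Build a BigQuery ML CREATE MODEL statement for one candidate model."""
    model_name = f"{dataset}.churn_{model_type}"
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{dataset}.{table}`"
    )


def evaluate_sql(dataset: str, model_type: str) -> str:
    """Build the matching ML.EVALUATE query for a trained candidate."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{dataset}.churn_{model_type}`)"


if __name__ == "__main__":
    for mt in MODEL_TYPES:
        print(create_model_sql("retail", "customer_features", "churned", mt))
        print(evaluate_sql("retail", mt))
        print()
```

Because training and evaluation both run inside BigQuery, switching model types only changes one option in the statement; the data never leaves the warehouse.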

Exam trap

Google Cloud often tests the candidate's ability to recognize that BigQuery ML is purpose-built for low-code, in-database ML experimentation, and the trap here is assuming that more complex or external tools (like Vertex AI or Cloud Composer) are necessary when the simpler, integrated solution suffices.

How to eliminate wrong answers

Option A is wrong because Cloud Composer is an orchestration tool for workflows, not a direct model training environment; using it to build a custom pipeline on Vertex AI would require moving data and add unnecessary complexity. Option B is wrong because AI Platform Notebooks with pandas and scikit-learn require exporting data from BigQuery to a Python environment, violating the requirement to keep data in BigQuery. Option D is wrong because exporting data to Cloud Storage for Vertex AI AutoML Tables introduces data movement and latency, contradicting the need for quick experimentation without moving data.

457

MCQ (hard)

An ML engineer is trying to upload a TensorFlow model to Vertex AI using the gcloud command shown. The model was trained using TensorFlow 2.11 and saved with model.save('model/'). The engineer sees the error. What is the most likely cause?

A. The container port should be 8080 instead of 8501.

B. The service account does not have permission to access the bucket.

C. The container image is for TensorFlow 2.11 but the model was saved with an older version.

D. The model was saved in a format other than SavedModel (e.g., HDF5) or the artifact path does not contain the expected directory structure.

Answer: D

The error explicitly states no saved_model.pb found, indicating the model is not in SavedModel format.

Why this answer

Option D is correct because the error indicates that Vertex AI cannot find the expected SavedModel artifacts (saved_model.pb and variables/ directory) at the specified path. When using model.save('model/') with TensorFlow 2.11, the default format is the SavedModel format, but the artifact path must point to the directory containing the saved_model.pb file, not a parent directory or a model saved in HDF5 format. The gcloud command likely references a path that does not contain the required SavedModel structure, causing the upload to fail.
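Before uploading, you can check that the directory you plan to pass as the artifact path actually has the SavedModel layout. A minimal local sketch using only the standard library; the export folder name `1` is hypothetical, and the check runs on a local copy before it is copied to Cloud Storage:

```python
import os


def check_savedmodel_dir(path: str) -> list:
    """Return problems found in a local SavedModel export directory.

    An empty list means the directory has the usual SavedModel layout:
    a saved_model.pb (or .pbtxt) file plus a variables/ folder.
    """
    if os.path.isfile(path):
        return [f"{path} is a single file (e.g. HDF5), not a SavedModel directory"]
    if not os.path.isdir(path):
        return [f"{path} does not exist"]

    problems = []
    has_pb = any(
        os.path.isfile(os.path.join(path, name))
        for name in ("saved_model.pb", "saved_model.pbtxt")
    )
    if not has_pb:
        problems.append("no saved_model.pb at the top level")
        # Common mistake: pointing at a parent of the real export directory.
        nested = sorted(
            d for d in os.listdir(path)
            if os.path.isfile(os.path.join(path, d, "saved_model.pb"))
        )
        if nested:
            problems.append(f"saved_model.pb found in subdirectory '{nested[0]}/'; "
                            "point the artifact path there instead")
    if not os.path.isdir(os.path.join(path, "variables")):
        problems.append("no variables/ subdirectory")
    return problems


if __name__ == "__main__":
    import tempfile

    root = tempfile.mkdtemp()
    export = os.path.join(root, "1")  # hypothetical versioned export folder
    os.makedirs(os.path.join(export, "variables"))
    open(os.path.join(export, "saved_model.pb"), "wb").close()

    print(check_savedmodel_dir(export))  # []
    print(check_savedmodel_dir(root))    # parent dir: points to '1/'
```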

Exam trap

Google Cloud often tests the distinction between SavedModel and HDF5 formats, and candidates mistakenly assume that any model.save() call produces a valid SavedModel, overlooking that the artifact path must point to the correct directory structure with saved_model.pb.

How to eliminate wrong answers

Option A is wrong because the container port 8501 is the default for TensorFlow Serving's REST API, and Vertex AI's prediction container for TensorFlow models typically uses port 8501 for HTTP requests; port 8080 is used for custom containers, not for standard TensorFlow Serving images. Option B is wrong because the error message in the question does not mention permissions or access to a bucket; a bucket permission issue would produce a 403 or 401 error, not a model format error. Option C is wrong because TensorFlow 2.11 is fully backward-compatible with SavedModels saved by older versions, and the container image for TensorFlow 2.11 can serve models saved with any earlier TensorFlow 2.x version without issue.

458

MCQ (hard)

An organization has multiple ML pipelines running on Vertex AI. They want to centralize monitoring and alerting for pipeline failures, including root cause analysis. Which combination of services should they use?

A. Cloud Trace + Cloud Debugger

B. Cloud Logging + Cloud Monitoring + Error Reporting

C. Cloud Operations for GKE + Stackdriver

D. Cloud Audit Logs + Cloud Functions

Answer: B

These services provide log aggregation, metrics, and error analysis for failures.

Why this answer

Option B is correct because Cloud Logging captures pipeline execution logs, Cloud Monitoring provides metrics and alerting on pipeline failures, and Error Reporting aggregates and analyzes errors with stack traces for root cause analysis. Together, they form a centralized observability stack that meets the requirement for monitoring, alerting, and root cause analysis of ML pipeline failures on Vertex AI.

Exam trap

The trap here is that candidates confuse Cloud Trace and Cloud Debugger (debugging tools) with the monitoring and logging services needed for failure detection and root cause analysis, or mistakenly think Cloud Audit Logs (compliance logs) are sufficient for pipeline error monitoring.

How to eliminate wrong answers

Option A is wrong because Cloud Trace is designed for latency analysis of distributed systems, not for monitoring pipeline failures or root cause analysis of errors, and Cloud Debugger inspects live application state without capturing historical failure data. Option C is wrong because Cloud Operations for GKE is specific to Google Kubernetes Engine workloads, not Vertex AI pipelines, and Stackdriver is the legacy name for what is now Cloud Operations, making this option outdated and misaligned with Vertex AI. Option D is wrong because Cloud Audit Logs record administrative actions and access logs, not pipeline execution errors or failures, and Cloud Functions alone cannot provide the centralized monitoring, alerting, and error analysis required.

459

MCQ (hard)

A team uses Vertex AI Feature Store to serve features for real-time predictions. They notice that feature values are frequently updated from multiple source systems, leading to inconsistencies. They need to ensure that feature values are consistent across all serving endpoints. What should they do?

A. Use batch ingestion with weekly updates to reduce update frequency

B. Increase the offline storage TTL to retain historical feature values

C. Implement a manual approval process for feature updates

D. Use a streaming ingestion pipeline with exactly-once semantics

Answer: D

Exactly-once streaming ensures each update is applied exactly once, maintaining consistency.

Why this answer

Option D is correct because streaming ingestion with exactly-once semantics ensures that each feature update is applied precisely once, preventing duplicates or missed updates that cause inconsistencies. This approach synchronizes feature values across all serving endpoints in near real-time, directly addressing the problem of frequent updates from multiple source systems.
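In practice, exactly-once application of updates comes down to two rules: ignore redelivered messages, and never let an older event overwrite a newer one. The sketch below shows those rules on an in-memory table; `FeatureTable`, `FeatureUpdate`, and the message IDs are hypothetical and are not the Vertex AI Feature Store API. A streaming runner such as Dataflow can provide this deduplication at scale:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureUpdate:
    message_id: str   # unique ID assigned by the source (dedup key)
    entity_id: str
    feature: str
    value: float
    event_time: int   # source event timestamp, e.g. epoch seconds


class FeatureTable:
    """In-memory stand-in for an online feature store (hypothetical)."""

    def __init__(self) -> None:
        self._values = {}  # (entity_id, feature) -> (event_time, value)
        self._seen = set()  # message IDs already applied

    def apply(self, update: FeatureUpdate) -> bool:
        """Apply an update at most once; return True if state changed."""
        if update.message_id in self._seen:
            return False  # redelivered duplicate: ignore
        self._seen.add(update.message_id)
        key = (update.entity_id, update.feature)
        current = self._values.get(key)
        if current is not None and current[0] >= update.event_time:
            return False  # older or same-time event arrived late: keep newer value
        self._values[key] = (update.event_time, update.value)
        return True

    def get(self, entity_id: str, feature: str) -> Optional[float]:
        entry = self._values.get((entity_id, feature))
        return None if entry is None else entry[1]


if __name__ == "__main__":
    table = FeatureTable()
    u1 = FeatureUpdate("m1", "cust-1", "spend_7d", 120.0, event_time=100)
    print(table.apply(u1), table.apply(u1))  # True False (duplicate ignored)
```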

Exam trap

The trap here is that candidates may confuse consistency with data freshness or retention, leading them to choose batch ingestion or TTL adjustments, when the core issue is update semantics in a distributed streaming context.

How to eliminate wrong answers

Option A is wrong because reducing update frequency with batch ingestion does not resolve inconsistencies from frequent updates; it merely delays them and can lead to stale features. Option B is wrong because increasing offline storage TTL retains historical values but does not affect consistency of current feature values across serving endpoints. Option C is wrong because a manual approval process introduces latency and is impractical for real-time predictions, and it does not guarantee consistency across distributed endpoints.

460

Multi-Select (hard)

A team is troubleshooting a Vertex AI Pipelines run that keeps failing at the model evaluation step. The pipeline includes steps: data preprocessing, training, evaluation, and deployment. Which THREE actions should they take to diagnose the issue?

Select 3 answers

A. Verify that the training step output is correctly linked as input to evaluation.

B. Run the evaluation code locally with the same input data.

C. Increase the memory of the evaluation step's machine.

D. Check the logs of the evaluation step in Cloud Logging.

E. Replace the evaluation step with a Vertex AI Model Evaluation service.

Answers: A, B, D

Mismatched outputs are a common pipeline failure cause.

Why this answer

Option A is correct because Vertex AI Pipelines relies on precise input/output artifact linking between steps. If the training step's output (e.g., the model artifact) is not correctly wired as the input to the evaluation step, the evaluation fails on missing or mismatched data. This is a common misconfiguration in the Kubeflow Pipelines DSL, where step outputs must be explicitly passed as arguments to downstream components. Checking the evaluation step's logs in Cloud Logging (option D) shows the actual error, and running the evaluation code locally on the same input (option B) separates code bugs from pipeline wiring problems.

Exam trap

Google Cloud often tests the misconception that resource scaling (Option C) is the first diagnostic step for pipeline failures, when in reality, most failures in Vertex AI Pipelines stem from misconfigured artifact passing or code errors, not hardware limits.

461

Matching (medium)

Match each ML pipeline component to its description.

Concepts

Matches

Production ML pipeline framework by Google

ML toolkit for Kubernetes-based workflows

Unified stream and batch data processing service

Managed Apache Airflow workflow orchestration

Serverless ML pipeline orchestration on Vertex AI

Why these pairings

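Pipeline components are key for MLOps in Google Cloud. The pairings are: TFX (TensorFlow Extended), the production ML pipeline framework by Google; Kubeflow, the ML toolkit for Kubernetes-based workflows; Dataflow, the unified stream and batch data processing service; Cloud Composer, managed Apache Airflow workflow orchestration; and Vertex AI Pipelines, serverless ML pipeline orchestration on Vertex AI.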

462

MCQ (easy)

A data scientist wants to share a trained model with colleagues for evaluation. The model is stored as a Vertex AI Model resource. What is the recommended way to share the model without exposing the underlying project?

A. Share the model ID and grant colleagues the 'aiplatform.models.get' permission.

B. Create a new project and copy the model.

C. Upload the model to a public Cloud Storage bucket.

D. Export the model artifact and email it.

Answer: A

This provides secure, traceable access without exposing the project.

Why this answer

Option A is correct because Vertex AI Model resources are managed within a single Google Cloud project, and the recommended way to share a model without exposing the underlying project is to grant the colleagues' Google accounts an IAM role that includes the 'aiplatform.models.get' permission, such as 'roles/aiplatform.user' or a narrower custom role. They can then access the model through its fully qualified resource name (e.g., 'projects/{project}/locations/{region}/models/{model}') without copying the model or gaining access to the project's other infrastructure or credentials.

Exam trap

Google Cloud often tests the misconception that sharing a model requires copying or exporting the artifact, when in fact IAM-based access control on the managed resource is the secure and recommended approach.

How to eliminate wrong answers

Option B is wrong because creating a new project and copying the model is unnecessary overhead and produces a duplicate resource that must be kept in sync with the original. Option C is wrong because uploading the model to a public Cloud Storage bucket exposes the artifact to the entire internet, violating security best practices and potentially leaking proprietary data; it also bypasses Vertex AI's access control mechanisms. Option D is wrong because emailing an exported artifact moves it outside Google Cloud's access controls, where it can be forwarded or intercepted; it also loses the managed model resource's metadata and versioning.

463

MCQ (easy)

Refer to the exhibit. A team deploys a model using Cloud Run. They notice that after scaling up, the new instances take about 90 seconds to become ready and serve requests. They want to reduce this startup time. Which configuration change is most likely to help?

A. Reduce the startupProbe initialDelaySeconds to 30

B. Change the container image to use a smaller base image

C. Reduce the memory limit to 4Gi

D. Increase the containerConcurrency to 100

Answer: B

A smaller base image reduces download and extraction time, speeding up startup.

Why this answer

Option B is correct. Using a smaller container image (e.g., a minimal base image) reduces image pull and initialization time, directly lowering startup latency. Option D increases concurrency per instance but does not affect how long an instance takes to start.

Option A shortens the probe delay, but it does not make the application itself start any faster. Option C reduces memory but could cause out-of-memory errors if the model requires more.

464

MCQ (medium)

An e-commerce company uses a recommendation model deployed on Vertex AI Endpoints. The model's latency increases gradually over two weeks, causing timeouts. The model is served using a custom container. What is the most likely root cause and corrective action?

A. The model is receiving more traffic; scale the number of replicas.

B. The custom container has a memory leak; implement memory monitoring and set container resource limits.

C. The Vertex AI endpoint has changed its URL; update the client application.

D. The model file size has grown due to feature engineering; reduce feature count.

Answer: B

Memory leaks are a common cause of gradual performance degradation in long-running containers.

Why this answer

A gradual increase in latency over two weeks, without a sudden spike, strongly indicates a memory leak in the custom container. As the leak accumulates, the container's garbage collection becomes less effective, leading to increased GC pauses and eventual timeouts. Setting resource limits and monitoring memory usage can prevent the container from exhausting host memory and causing performance degradation.

Exam trap

The trap here is that candidates confuse a gradual latency increase with a traffic scaling issue (Option A), but the slow, steady degradation over weeks is the hallmark of a resource leak, not a demand spike.

How to eliminate wrong answers

Option A is wrong because a gradual latency increase over two weeks is not characteristic of a sudden traffic surge; traffic spikes would cause immediate latency jumps, not a slow creep. Option C is wrong because Vertex AI endpoint URLs are stable and do not change over time; a URL change would cause immediate 404 errors, not gradual latency increases. Option D is wrong because model file size does not change dynamically during serving; feature engineering changes would require a new model deployment, not cause a gradual latency increase in an already deployed model.

465

Matching (medium)

Match each Google Cloud storage option to its best use case.

Concepts

Matches

Unstructured object storage for any type of data

NoSQL wide-column database for low-latency, high-throughput

Serverless data warehouse for analytics at scale

Relational database for OLTP workloads

NoSQL document database for mobile/web apps

Why these pairings

Choosing the right storage is crucial for ML data pipelines.

466

MCQ (easy)

A data science team is deploying a large NLP model to Vertex AI for real-time inference. They notice high latency per request. Which action should they take first to reduce latency?

A. Use Cloud Functions for inference.

B. Use model optimization techniques like quantization or pruning.

C. Use Vertex AI Model Optimization to quantize the model and deploy on a smaller machine.

D. Enable autoscaling and set min replicas to 5.

E. Implement batch prediction instead of online prediction.

Answer: C

Quantization reduces model size and latency directly.

Why this answer

Option C is correct because it addresses the main driver of per-request latency in real-time inference: the model's size and compute cost. Quantizing (or pruning) the model reduces its memory footprint and the work done per request, so it serves faster and can fit on a smaller, cheaper machine while keeping acceptable accuracy. Optimizing the model is a natural first step for latency-sensitive deployments because it reduces per-request processing time without requiring architectural changes.
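Quantization, named in options B and C, stores weights in fewer bits. As a minimal sketch in plain Python (not a Vertex AI or TensorFlow API), symmetric per-tensor int8 post-training quantization maps each float32 weight to an integer in [-127, 127] with a single scale factor, cutting weight storage by 4x at the cost of a small rounding error:

```python
def quantize_int8(weights: list) -> tuple:
    """Symmetric per-tensor post-training quantization to int8."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list, scale: float) -> list:
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.82, -1.27, 0.003, 0.5, -0.25]  # hypothetical weights
    q, scale = quantize_int8(w)
    approx = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(w, approx))
    print(q)
    print(f"max error {max_err:.4f} <= scale/2 {scale / 2:.4f}")
```

Production toolkits refine this with per-channel scales and calibration data, but the memory saving and bounded rounding error shown here are the core of why quantization reduces latency on smaller hardware.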

Exam trap

Google Cloud often tests the misconception that scaling out (autoscaling) or switching to batch processing is the first step to reduce latency, when in fact model optimization and hardware matching are the primary levers for per-request performance in real-time inference.

How to eliminate wrong answers

Option A is wrong because Cloud Functions are stateless, short-lived compute units with execution-time limits and limited GPU support, making them unsuitable for hosting large NLP models for real-time inference; cold starts would add latency rather than remove it. Option B is wrong because it names the right techniques but stops short of the complete action in option C, which also redeploys the optimized model on right-sized hardware. Option D is wrong because enabling autoscaling with a minimum of 5 replicas increases capacity but does not reduce per-request latency; it adds cost without addressing the model's inference speed.

Option E is wrong because batch prediction is designed for asynchronous, high-throughput processing of large datasets, not for real-time inference; it introduces higher latency per request due to queuing and batching overhead, making it counterproductive for reducing latency in a real-time scenario.

467

MCQ (hard)

A company is using Vertex AI Pipelines with reusable components. They observe that a component that performs hyperparameter tuning is failing intermittently with a 'ResourceExhausted' error. The component is configured with a small custom service account. What is the most likely cause?

A. The component code has a bug causing infinite recursion

B. The KFP executor is not properly configured

C. The service account does not have sufficient quotas or permissions to create the required number of trials or workers

D. The pipeline system memory is insufficient for the component

Answer: C

Hyperparameter tuning often spawns multiple trial jobs; quota limits on Vertex AI training jobs or compute resources can cause this error.

Why this answer

A 'ResourceExhausted' error (gRPC status RESOURCE_EXHAUSTED, HTTP 429) typically means a quota was exceeded. Hyperparameter tuning launches many trials, each with its own workers, so a tuning component can exceed Vertex AI quotas, such as concurrent training jobs or accelerators per region, especially when many trials run in parallel. These quotas are enforced per project and region; the component's service account determines which resources it is permitted to create, and missing permissions would normally surface as 'PermissionDenied' instead. Option C is the only choice that points to this resource limit.
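As a rough sketch with hypothetical quota numbers (real limits are shown on the project's Quotas page), the number of trials that can run concurrently is bounded by the accelerator quota divided by what each trial consumes; keeping the tuning job's parallel trial count at or below this bound avoids hitting the quota:

```python
def max_parallel_trials(quota_gpus: int,
                        gpus_per_worker: int,
                        workers_per_trial: int,
                        gpus_in_use: int = 0) -> int:
    """How many tuning trials can run at once without exceeding a GPU quota."""
    gpus_per_trial = gpus_per_worker * workers_per_trial
    if gpus_per_trial <= 0:
        raise ValueError("each trial must use at least one GPU")
    available = max(0, quota_gpus - gpus_in_use)
    return available // gpus_per_trial


if __name__ == "__main__":
    # Hypothetical: 16-GPU regional quota, 4 GPUs already used by other jobs,
    # each trial runs 2 workers with 2 GPUs each.
    print(max_parallel_trials(16, gpus_per_worker=2, workers_per_trial=2,
                              gpus_in_use=4))  # 3
```

Requesting, say, 5 parallel trials in this scenario would exceed the quota, which is the kind of intermittent failure the question describes when other jobs happen to be using part of the quota.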

Exam trap

Google Cloud often tests the misconception that 'ResourceExhausted' errors are always due to memory or code bugs, rather than recognizing that they usually signal an exceeded quota, which is easy to hit when hyperparameter tuning launches many trials in parallel.

How to eliminate wrong answers

Option A is wrong because infinite recursion would raise a recursion (stack overflow) error or time out, not a quota-related 'ResourceExhausted' error. Option B is wrong because the KFP executor is a generic component runner; its configuration does not affect resource-creation quotas for hyperparameter tuning jobs. Option D is wrong because a lack of memory in the component itself would appear as an out-of-memory failure of its container, not as an error tied to quotas for creating trials or workers.

468

Multi-Select (easy)

Refer to the exhibit. A data scientist is evaluating a binary classification model trained with BigQuery ML on an imbalanced dataset. The exhibit shows the output of ML.EVALUATE run on two different thresholds. Which TWO actions should the data scientist take to improve model performance? (Choose two.)

Select 2 answers

A. Add more features from the source data.

B. Use AUC-ROC as the evaluation metric instead of accuracy.

C. Apply SMOTE oversampling in the preprocessing pipeline.

D. Use class weights in the CREATE MODEL statement.

E. Increase the number of training iterations.

Answers: B, D

AUC-ROC is robust to class imbalance and provides a better measure of model discrimination.

Why this answer

Option B is correct because AUC-ROC is insensitive to class imbalance and evaluates the model's ability to rank positive instances higher than negative ones across all thresholds, unlike accuracy, which can be misleading when the majority class dominates. In BigQuery ML, ML.EVALUATE returns metrics such as accuracy, precision, recall, and AUC-ROC; for imbalanced datasets, AUC-ROC gives a more reliable measure of discriminative power. Option D addresses the imbalance during training: the class-weight options in CREATE MODEL (AUTO_CLASS_WEIGHTS or CLASS_WEIGHTS) make errors on the minority class cost more, which typically improves its recall.
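The sketch below, in plain Python with a hypothetical 95/5 label set, shows why accuracy misleads on imbalanced data and how AUC and inverse-frequency class weights behave. The weighting formula is the common "balanced" scheme and is an assumption here; BigQuery ML's AUTO_CLASS_WEIGHTS follows the same inverse-frequency idea, but its exact formula is not reproduced:

```python
def accuracy(labels, preds) -> float:
    """Fraction of predictions that match the labels."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def roc_auc(labels, scores) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            # Count a win for correct ordering and half a win for a tie.
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def inverse_frequency_weights(labels) -> dict:
    """Class weights proportional to 1 / class frequency (balanced weighting)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


if __name__ == "__main__":
    labels = [0] * 95 + [1] * 5
    print(accuracy(labels, [0] * 100))       # 0.95 despite never detecting class 1
    print(roc_auc(labels, [0.0] * 100))      # 0.5: constant scores rank nothing
    print(inverse_frequency_weights(labels))  # minority class weighted 10.0
```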

Exam trap

Google Cloud often tests the misconception that adding more data or features is a universal fix for imbalanced datasets, when in fact the core issue requires adjustments to the evaluation metric or the loss function (e.g., class weights) rather than simply increasing data volume or iterations.

469

MCQ (medium)

A team uses Vertex AI Prediction with a custom container. They want to perform canary deployments by sending 5% of traffic to a new model version. Which method should they use?

A. Create a new endpoint with manual traffic splitting

B. Deploy two separate endpoints and use a load balancer

C. Use Cloud Run for serving with gradual rollout

D. Use the Vertex AI Model Registry and configure traffic splitting on the endpoint

AnswerD

This is correct because you can deploy the new model version to the same endpoint with a small traffic split (e.g., 5%) using the traffic splitting feature.

Why this answer

Option D is correct because Vertex AI endpoints support traffic splitting between deployed models. The new version, registered in the Vertex AI Model Registry, is deployed to the existing endpoint and given 5% of the traffic, which is a controlled canary rollout. Option A is wrong because traffic splitting applies among models deployed to the same endpoint, so a new endpoint would not share traffic with the current deployment. Option B is wrong because two endpoints behind a load balancer rebuild, with extra operational overhead, what endpoint traffic splitting already provides natively.

Option C is wrong because Cloud Run is a separate serving platform that is not integrated with Vertex AI Prediction.
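Conceptually, an endpoint traffic split is weighted routing across the models deployed behind one endpoint. The simulation below is an illustrative sketch, not Vertex AI code. The deployed-model IDs are hypothetical, and the sketch only mirrors the rule that split percentages must sum to 100. In the Vertex AI Python SDK the equivalent setting is supplied when deploying to an endpoint (for example, through a traffic_percentage or traffic_split argument); check the current SDK reference for the exact parameters.

```python
import random
from collections import Counter
from typing import Dict


def validate_split(split: Dict[str, int]) -> None:
    """Reject splits that do not add up to 100 percent."""
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic split must sum to 100, got {total}")


def route(split: Dict[str, int], rng: random.Random) -> str:
    """Pick a deployed model for one request, weighted by its traffic percentage."""
    ids = list(split)
    return rng.choices(ids, weights=[split[i] for i in ids], k=1)[0]


# Hypothetical deployed-model IDs: the current model keeps 95%, the canary gets 5%.
canary_split = {"model-v1": 95, "model-v2-canary": 5}
validate_split(canary_split)

rng = random.Random(42)  # fixed seed so the simulation is repeatable
counts = Counter(route(canary_split, rng) for _ in range(100_000))
print(counts["model-v2-canary"] / 100_000)  # close to 0.05
```

Promoting the canary then amounts to changing the weights (for example, to 50/50 and then 0/100) on the same endpoint, with no client-side changes.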

470

MCQmedium

An ML engineer is scaling a prototype to production using Vertex AI Pipelines. The pipeline includes data validation, preprocessing, training, and deployment steps. They want to ensure that the pipeline can be reproduced and audited. What is the best practice?

A.Define the pipeline using Kubeflow Pipelines SDK and run it on Vertex AI Pipelines.

B.Use a Docker container with fixed tags and manually record runs.

C.Store all data and models in a single Cloud Storage bucket with no versioning.

D.Pin all library versions in a requirements.txt file.

AnswerA

Vertex AI Pipelines automatically tracks artifacts, parameters, and lineage.

Why this answer

A managed pipeline service like Vertex AI Pipelines automatically tracks artifacts, parameters, and lineage, which makes runs reproducible and auditable. Option B gives environment consistency through fixed image tags, but recording runs manually provides no built-in tracking. Option C keeps no versions, so artifacts can be overwritten and earlier runs cannot be audited. Option D pins dependencies but neither orchestrates the pipeline nor tracks its runs.

471

MCQhard

Refer to the exhibit. A data scientist trained a BigQuery ML classification model to detect fraudulent transactions. The dataset has 95% non-fraud (class 0) and 5% fraud (class 1). The evaluation metrics show high accuracy (0.91) but low recall (0.60) for fraud detection. Which low-code approach should the data scientist take to improve recall without significantly sacrificing precision?

A.Use the ML.PREDICT function with a lower classification threshold (e.g., 0.3 instead of 0.5) to capture more positive cases.

B.Apply feature selection to reduce the number of features and focus on the most predictive ones.

C.Increase the number of training iterations by setting the MAX_ITERATIONS option to a higher value.

D.Re-train the model using AutoML Tables with class weights to penalize false negatives more heavily.

AnswerA

Lowering the threshold increases recall by classifying more instances as positive.

Why this answer

Option A is correct because lowering the classification threshold in ML.PREDICT (e.g., from 0.5 to 0.3) causes the model to classify more transactions as fraud, directly increasing recall. This is a low-code adjustment that does not require retraining or complex feature engineering, and it allows the data scientist to trade off precision for recall as needed.
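You can check the effect of lowering the threshold on a handful of scored examples. The labels and probabilities below are made up for illustration. In BigQuery ML the same cut-off is passed to ML.PREDICT (or ML.EVALUATE) through a threshold option, while this sketch simply recomputes precision and recall in Python.

```python
from typing import List, Tuple


def precision_recall(labels: List[int], probs: List[float], threshold: float) -> Tuple[float, float]:
    """Precision and recall when probability >= threshold is labeled positive (fraud)."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical scores: 5 fraud cases (label 1) among 20 transactions.
labels = [1, 1, 1, 1, 1] + [0] * 15
probs = [0.9, 0.7, 0.45, 0.35, 0.2] + [0.6, 0.4, 0.3] + [0.1] * 12

for t in (0.5, 0.3):
    p, r = precision_recall(labels, probs, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.5: precision=0.67, recall=0.40
# threshold=0.3: precision=0.57, recall=0.80
```

On this toy data, recall doubles from 0.40 to 0.80 while precision falls from 0.67 to 0.57, which is the trade-off the explanation describes.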

Exam trap

Google Cloud often tests the misconception that improving recall always requires retraining or complex model changes, when in fact a simple threshold adjustment in ML.PREDICT is a valid low-code technique to shift the precision-recall balance.

How to eliminate wrong answers

Option B is wrong because feature selection reduces the number of input features, which may improve training speed or reduce overfitting but does not directly increase recall for a specific class; it can even harm recall if important fraud-indicative features are removed. Option C is wrong because increasing MAX_ITERATIONS only affects the convergence of the training algorithm; if the model is already converged, more iterations will not improve recall and may lead to overfitting. Option D is wrong because AutoML Tables is a separate service, not a low-code approach within BigQuery ML; while class weights can help, this option requires moving to a different platform and is not the simplest low-code fix described in the question.

472

MCQhard

A large financial company uses a complex ML pipeline to detect fraudulent transactions. The pipeline consists of multiple steps: data ingestion from Pub/Sub, feature engineering using Dataflow, model training with Vertex AI, and deployment to an endpoint. They currently use Cloud Composer to orchestrate the pipeline with separate DAGs for each step. Recently, they have been experiencing failures in the Dataflow job due to schema changes in the incoming transactions, causing the pipeline to stall. The team manually fixes the schema and re-runs the pipeline, which is time-consuming. They want to improve the robustness of the pipeline. The pipeline is run on a schedule but also triggered by the arrival of new data. The team is considering moving to Vertex AI Pipelines to unify the workflow. They also want to automatically detect schema changes and handle them without manual intervention. Which approach should they take?

A.Keep using Cloud Composer but add retries with exponential backoff to the Dataflow task, and set up a Cloud Monitoring alert to notify the team if the task fails repeatedly

B.Migrate to Vertex AI Pipelines and add a pre-processing step that validates incoming data schema against a schema registry; if schema change is detected, the pipeline sends an alert and uses a default schema to continue processing

C.Use Cloud Scheduler to trigger the pipeline more frequently to reduce the impact of failures

D.Create a separate Dataflow pipeline to handle schema detection and run it before the main pipeline; if schema changes, send an email to the team

AnswerB

This provides automated handling of schema changes.

Why this answer

Option B is correct because it directly addresses the need for automated schema change detection and handling within a unified orchestration framework. By migrating to Vertex AI Pipelines, the team gains a managed, end-to-end ML workflow service that can include a pre-processing step to validate incoming data against a schema registry. When a schema change is detected, the pipeline can automatically apply a default schema and continue, eliminating manual intervention and reducing downtime.
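A schema-validation step of this kind can be sketched in plain Python. The expected schema, field names, and default values below are hypothetical, and a real pipeline component would read them from the team's schema registry. The point is the control flow: detect missing, unexpected, or mistyped fields, emit an alert signal, and coerce or fill defaults so processing continues.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical expected schema: field name -> (type, default used when the field is unusable).
EXPECTED_SCHEMA: Dict[str, Tuple[type, Any]] = {
    "transaction_id": (str, ""),
    "amount": (float, 0.0),
    "merchant_category": (str, "unknown"),
}


def validate_and_fill(record: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return a record that conforms to EXPECTED_SCHEMA plus a list of detected issues."""
    issues: List[str] = []
    fixed: Dict[str, Any] = {}
    for field, (ftype, default) in EXPECTED_SCHEMA.items():
        if field not in record:
            issues.append(f"missing field: {field}")
            fixed[field] = default
        elif not isinstance(record[field], ftype):
            try:
                fixed[field] = ftype(record[field])  # e.g. "12.50" -> 12.5
                issues.append(f"coerced type: {field}")
            except (TypeError, ValueError):
                issues.append(f"type mismatch: {field}")
                fixed[field] = default
        else:
            fixed[field] = record[field]
    for field in record.keys() - EXPECTED_SCHEMA.keys():
        issues.append(f"unexpected field: {field}")  # dropped from the output, but reported
    return fixed, issues


# An incoming record whose upstream schema changed: one field renamed, one type changed.
incoming = {"transaction_id": "t-1", "amount": "12.50", "mcc": "5411"}
clean, problems = validate_and_fill(incoming)
if problems:
    print("ALERT:", sorted(problems))  # a real step might publish this to a monitoring channel
print(clean)
```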

Exam trap

The trap here is that candidates often think retries or alerts (Option A) are sufficient for handling failures, but the question explicitly requires automatic handling without manual intervention, which only a schema validation and fallback step can provide.

How to eliminate wrong answers

Option A is wrong because adding retries with exponential backoff does not solve the root cause of schema changes; it only retries the same failing operation, which will continue to fail until the schema is manually fixed, and Cloud Monitoring alerts still require manual intervention. Option C is wrong because increasing the frequency of pipeline runs via Cloud Scheduler does not address schema change failures; it would only cause more frequent failures and waste resources. Option D is wrong because creating a separate Dataflow pipeline for schema detection still requires manual email notification and manual re-run, and it does not integrate automated handling or a unified workflow like Vertex AI Pipelines provides.

473

Multi-Selecteasy

A company wants to use pre-built Google Cloud APIs for text analysis. Which TWO APIs can they use? (Choose TWO.)

Select 2 answers

A.Cloud Natural Language API

B.Cloud Translation API

C.Cloud Vision API

D.Video Intelligence API

E.Document AI

AnswersA, B

Both APIs apply pre-trained models directly to text.

Why this answer

The Cloud Natural Language API provides pre-built machine learning models for text analysis tasks such as entity recognition, sentiment analysis, and syntax analysis. The Cloud Translation API can translate text between languages, which is a form of text analysis. Both are pre-built Google Cloud APIs that directly address the company's need for text analysis without requiring custom model training.

Exam trap

The trap here is that candidates may confuse Document AI with a general text analysis API, but Document AI is specifically for document parsing and OCR, not for core NLP tasks like sentiment or entity extraction, which are the focus of the Cloud Natural Language API.

474

MCQhard

A healthcare startup is building a diagnostic tool that uses a deep learning model to classify medical images. The model is trained on TensorFlow and deployed on Vertex AI Prediction. The startup has strict latency requirements: predictions must return within 200 ms for 95% of requests. Current performance shows p95 latency of 350 ms. The team has already tried using a smaller model, but accuracy dropped below acceptable levels. The traffic pattern is spiky: low load during nights but bursts of 1000 requests per second during business hours. Currently, they use a single n1-highmem-8 VM with a GPU attached. They have a budget for additional resources but need to optimize cost. The model is about 500 MB and requires GPU for inference. Which course of action should they take to meet the latency requirement while managing costs?

A.Upgrade to an n1-highmem-16 VM with a more powerful GPU

B.Switch to batch prediction using Vertex AI Batch Prediction and store results in a database for retrieval

C.Create a Vertex AI Prediction endpoint with an accelerator (GPU) and enable autoscaling (min 1, max 5 nodes)

D.Deploy the model as a Cloud Function using TensorFlow Serving

AnswerC

Autoscaling with GPU provides low latency during bursts and cost efficiency by scaling down during low load.

Why this answer

Option C is correct because it leverages Vertex AI Prediction's autoscaling to handle spiky traffic efficiently, using GPU-accelerated endpoints that can scale from 1 to 5 nodes to meet the 200 ms p95 latency requirement. This approach minimizes cost during low-load periods while providing burst capacity for the 1000 requests per second peak, addressing both the latency and budget constraints without compromising model accuracy.
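Whether a 1-to-5 replica range can absorb the peak is a capacity calculation. The per-replica throughput below is a hypothetical figure that the team would have to measure with a load test at the 200 ms p95 target. The helper applies ceil(peak / per-replica) and clamps the result to the autoscaling bounds. In the Vertex AI Python SDK, those bounds correspond to the min_replica_count and max_replica_count arguments used when deploying a model.

```python
import math


def replicas_needed(peak_qps: float, qps_per_replica: float, min_replicas: int, max_replicas: int) -> int:
    """Replicas required to serve peak_qps, clamped to the autoscaling range."""
    raw = math.ceil(peak_qps / qps_per_replica)
    return max(min_replicas, min(raw, max_replicas))


def fits_in_range(peak_qps: float, qps_per_replica: float, max_replicas: int) -> bool:
    """True if max_replicas replicas can carry the peak without exceeding per-replica capacity."""
    return peak_qps <= qps_per_replica * max_replicas


PEAK_QPS = 1000        # business-hours burst from the scenario
NIGHT_QPS = 20         # hypothetical overnight load
QPS_PER_REPLICA = 250  # hypothetical: measured throughput per GPU replica at p95 <= 200 ms

print(replicas_needed(PEAK_QPS, QPS_PER_REPLICA, 1, 5))   # 4 replicas at peak
print(replicas_needed(NIGHT_QPS, QPS_PER_REPLICA, 1, 5))  # 1 replica overnight
print(fits_in_range(PEAK_QPS, QPS_PER_REPLICA, 5))        # True
```

If load testing instead showed about 150 QPS per replica, the peak would need 7 replicas, and the maximum of 5 would have to be raised.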

Exam trap

The trap here is that candidates often choose a single-node upgrade (Option A), thinking more power solves latency, and overlook the need for horizontal scaling to handle spiky traffic. Option B seems cost-effective but ignores the real-time requirement, and Option D appears serverless but is not built for GPU-backed model serving.

How to eliminate wrong answers

Option A is wrong because upgrading to a more powerful VM (n1-highmem-16 with a better GPU) does not solve the spiky traffic pattern; it increases cost during low-load periods and still risks latency spikes during bursts due to a single-node bottleneck. Option B is wrong because batch prediction is asynchronous and not suitable for real-time diagnostic tools requiring sub-200 ms responses; storing results in a database for retrieval introduces additional latency and cannot meet the strict p95 latency requirement. Option D is wrong because Cloud Functions is designed for lightweight, event-driven code rather than GPU-accelerated model serving, and loading a 500 MB model on each cold start would add latency far beyond the 200 ms target.

475

Multi-Selectmedium

Which TWO of the following are benefits of using BigQuery ML for low-code model development?

Select 2 answers

A.Train models directly on data in BigQuery without moving it

B.Automatic feature engineering and hyperparameter tuning

C.Automatic scaling to petabytes of data

D.Built-in model explainability for all model types

E.Support for image classification tasks

AnswersA, C

Data stays in BigQuery, eliminating ETL.

Why this answer

Option A is correct because BigQuery ML lets you train machine learning models using SQL directly on data stored in BigQuery, with no need to export or move the data to a separate environment. This reduces data transfer latency, simplifies security governance, and takes advantage of BigQuery's separation of storage and compute. Option C is correct because training runs on BigQuery's serverless infrastructure, which scales with the size of the data without you provisioning clusters.

Exam trap

Google Cloud often tests the misconception that 'low-code' means 'fully automated'. BigQuery ML does apply some automatic preprocessing (for example, one-hot encoding of string columns) and can tune hyperparameters when you set the NUM_TRIALS option, but it does not engineer features or tune hyperparameters automatically by default. Its main benefit is reducing the code needed to create models.

476

Multi-Selectmedium

A model serving team is experiencing high latency in production. Which TWO actions should they take to diagnose the root cause? (Choose 2.)

Select 2 answers

A.Convert the model to a different framework that is faster.

B.Enable Cloud Trace to analyze request latency across services.

C.Check the endpoint's autoscaling metrics and cold start frequency.

D.Increase the number of replicas to reduce load per replica.

E.Set the logging verbosity to DEBUG in the container.

AnswersB, C

Cloud Trace provides detailed latency breakdowns.

Why this answer

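Options B and C are correct. Cloud Trace breaks request latency down across services, and autoscaling metrics and cold start frequency show whether slow requests coincide with replicas starting up or with too few replicas. Option D is wrong because increasing replicas may mask the issue without diagnosing it. Option A is wrong because converting the framework is a remediation, not a diagnostic step, and may not address the actual cause of the latency.

Option E is wrong because changing the log level does not provide granular latency analysis.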
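Once per-request latencies have been exported (for example, from Cloud Trace spans or the endpoint's request logs), a quick check is to compare percentiles for requests served by a freshly started replica against all other requests. The samples and the cold-start flag below are hypothetical, and how you tag a request as cold depends on what your serving container records.

```python
import math
from typing import List, Tuple


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile (pct in (0, 100])."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical (latency_ms, served_by_cold_replica) pairs from a trace export.
samples: List[Tuple[float, bool]] = (
    [(40.0 + i, False) for i in range(18)] + [(900.0, True), (1200.0, True)]
)

warm = [ms for ms, cold in samples if not cold]
cold = [ms for ms, cold in samples if cold]
overall_p95 = percentile([ms for ms, _ in samples], 95)

print(f"overall p95={overall_p95} ms")
print(f"warm p95={percentile(warm, 95)} ms, cold share={len(cold) / len(samples):.0%}")
# overall p95=900.0 ms
# warm p95=57.0 ms, cold share=10%
```

Here the 900 ms overall p95 comes entirely from the two cold-start requests, while warm requests stay under 60 ms, which would point at scaling behavior rather than the model itself.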

477

MCQmedium

A model deployed on Vertex AI Prediction is returning high latency for real-time requests. The model is a small TensorFlow model. Which troubleshooting step should the team take first?

A.Retrain the model with a larger batch size

B.Check if the machine type is too small and enable autoscaling

C.Use a custom container with optimized runtime

D.Enable Cloud Armor to reduce traffic

AnswerB

For a small model, under-provisioned compute is the most common cause of high serving latency.

Why this answer

Option B is correct because high latency for real-time predictions from a small TensorFlow model often indicates that the serving infrastructure is under-provisioned. Checking the machine type and enabling autoscaling directly addresses whether the instance is too small to handle the request volume, which is the most common first step in diagnosing latency issues on Vertex AI Prediction.

Exam trap

Google Cloud often tests the principle of 'start with the simplest infrastructure fix before optimizing the model or container,' so candidates mistakenly jump to retraining or custom containers without first checking if the instance type and scaling settings are appropriate.

How to eliminate wrong answers

Option A is wrong because retraining with a larger batch size affects training throughput, not inference latency for real-time requests; inference batch size is set at serving time, not during training. Option C is wrong because using a custom container with an optimized runtime is a more advanced optimization step that should be considered only after verifying that the base infrastructure (machine type and scaling) is adequate. Option D is wrong because Cloud Armor is a security service for DDoS protection and traffic filtering, not a tool for reducing latency caused by insufficient compute resources.

478

MCQmedium

A financial services company uses a custom container to serve a fraud detection model on Vertex AI Endpoints. The model requires a feature store lookup for each prediction. Recently, the feature store (Cloud Bigtable) experienced a brief outage, causing some predictions to fail. After the outage resolved, the endpoint's CPU utilization dropped significantly, and prediction latency improved. However, the model's false positive rate increased sharply. The ML engineer suspects the model is using stale features because the feature store outage caused missing lookups. Cloud Monitoring for the endpoint shows no errors after the outage, but the number of feature store read requests per prediction decreased by 30%. Which metric should the engineer use to confirm the hypothesis of stale features?

A.Monitor the prediction request latency to see if it remains low.

B.Use Vertex AI Model Monitoring to compare the prediction distribution before and after the outage; significant drift indicates stale features.

C.Verify the feature store's read throughput and latency metrics to ensure it is healthy.

D.Check the error rate for the endpoint; if no errors, then features were retrieved correctly.

AnswerB

Drift detection directly reveals changes in model behavior due to input changes.

Why this answer

Option B is correct because Vertex AI Model Monitoring can detect prediction distribution drift, which directly indicates that the model is receiving different input features than expected. A significant drift after the outage, combined with the 30% drop in feature store read requests, confirms that stale or default features were substituted for missing lookups, causing the false positive rate to spike.

Exam trap

The trap here is that candidates assume no errors means no problem, but the question explicitly describes a silent failure where the model uses stale features without raising any error, so metrics like latency or error rate are irrelevant for detecting feature staleness.

How to eliminate wrong answers

Option A is wrong because low prediction latency does not confirm stale features; it only indicates that the endpoint is processing requests faster, which could be due to fewer feature store reads (as observed) but does not prove that the features used are stale. Option C is wrong because verifying the feature store's health metrics (read throughput, latency) only confirms that Bigtable is operational now, not whether the model used stale features during the outage or after. Option D is wrong because the absence of endpoint errors does not guarantee correct feature retrieval; the model can silently use default or cached values without raising errors, which is exactly what happened here.

479

MCQhard

A data science team has trained a TensorFlow model on-premises using a large dataset. When they try to deploy the model to Vertex AI for online predictions, the deployed model fails to start with a ‘MemoryError’. The model artifact is 2 GB, and the machine type is n1-standard-4 (15 GB RAM). What is the most likely cause?

A.The model is stored in a regional bucket and the Vertex AI endpoint is in a different region.

B.The machine type does not support TensorFlow models larger than 1 GB.

C.The model is too large for the machine's memory, causing an out-of-memory (OOM) error during loading.

D.The model file is corrupted or missing dependencies, causing a crash.

AnswerC

The 2 GB model may require more than 15 GB RAM during loading due to overhead and intermediate structures.

Why this answer

Option C is correct because the model artifact is 2 GB, and loading it into memory on an n1-standard-4 machine (15 GB RAM) can still cause a MemoryError. TensorFlow models often require additional memory for graph construction, intermediate tensors, and framework overhead, which can easily exceed the available RAM, especially when the model is loaded entirely into memory before serving.

Exam trap

Google Cloud often tests the misconception that model file size must be less than total machine RAM to avoid OOM errors, but the trap here is that TensorFlow's memory footprint during loading and serving is significantly larger than the artifact size due to framework overhead and graph construction.

How to eliminate wrong answers

Option A is wrong because a regional bucket mismatch would cause a permission or access error, not a MemoryError; Vertex AI can access models from any regional bucket as long as the service account has proper permissions. Option B is wrong because there is no inherent machine type limitation that restricts TensorFlow models to 1 GB; the n1-standard-4 can handle larger models if sufficient memory is available. Option D is wrong because a corrupted file or missing dependencies would typically result in an ImportError or a crash with a different error message, not a MemoryError.

480

Multi-Selectmedium

A data science team uses Vertex AI Model Monitoring to detect data quality issues in a production model. Which TWO metrics should they enable to identify problems with missing values in predictions? (Select TWO.)

Select 2 answers

A.Feature value distribution skew (distance metrics).

B.Training-serving skew detection for all features.

C.Total count of missing values across all features.

D.Prediction confidence score.

E.Missing value ratio per feature.

AnswersA, E

Distribution skew catches shifts caused by missing values, and a per-feature missing-value ratio measures them directly.

Why this answer

Option A is correct because Vertex AI Model Monitoring's feature value distribution skew detection uses distance metrics (e.g., Jensen-Shannon divergence for numerical features and L-infinity distance for categorical features) to compare the distribution of feature values in the serving data against the training data. A sudden increase in missing values in a feature shifts its distribution and can trigger a skew alert, so missing-value problems surface indirectly as distributional drift. Option E is correct because a missing-value ratio per feature measures the problem directly and shows which feature is affected.
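A per-feature missing-value ratio is simple to compute, and the computation shows why it localizes a problem better than a single total count. The feature names, rows, and the 0.05 alert threshold below are hypothetical. The sketch treats an absent key or None as missing.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def missing_ratio_per_feature(rows: List[Row], features: List[str]) -> Dict[str, float]:
    """Fraction of rows in which each feature is absent or None."""
    return {
        f: sum(1 for r in rows if r.get(f) is None) / len(rows)
        for f in features
    }


features = ["age", "income", "tenure"]
# Hypothetical serving batch where an upstream change stopped populating "income".
serving_rows: List[Row] = [
    {"age": 34.0, "income": None, "tenure": 2.0},
    {"age": 51.0, "income": None, "tenure": 7.0},
    {"age": 29.0, "income": 48000.0, "tenure": None},
    {"age": 45.0, "income": None, "tenure": 4.0},
]

ratios = missing_ratio_per_feature(serving_rows, features)
ALERT_THRESHOLD = 0.05  # hypothetical tolerance
print({f: r for f, r in ratios.items() if r > ALERT_THRESHOLD})
# {'income': 0.75, 'tenure': 0.25}
```

A total count of 4 missing values for this batch would not reveal that income accounts for three of them.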

Exam trap

Google Cloud often tests the distinction between aggregate metrics (like a total count) and per-feature metrics (like a ratio). Candidates are tempted by 'total count of missing values across all features' because it mentions missing values directly, but a single total hides which feature is affected and grows with traffic volume. A per-feature ratio pinpoints the source and stays comparable across time windows.

481

MCQmedium

An ML engineer notices that predictions are taking longer than expected under moderate traffic. Reviewing the endpoint configuration, what is the most likely cause of the high latency?

A.Container logging is disabled, slowing down request processing.

B.The accelerator count is 0, meaning no GPU is used.

C.The machine type n1-standard-4 is underpowered for the model's compute needs.

D.Automatic scaling is set with a maxReplicaCount of 10, which creates overhead.

AnswerB

BERT models are computationally intensive and benefit greatly from GPU acceleration; without it, inference is CPU-bound and slow.

Why this answer

When the accelerator count is set to 0, the endpoint runs inference on the CPU only, which is significantly slower than GPU-accelerated inference for deep learning models. This is the most direct cause of high latency under moderate traffic, as the model's compute demands exceed CPU throughput.

Exam trap

Google Cloud often tests the misconception that CPU machine type is the primary cause of latency, when in fact the accelerator count being zero is the more direct and common misconfiguration for deep learning models.

How to eliminate wrong answers

Option A is wrong because disabling container logging reduces I/O overhead and would speed up request processing, not slow it down. Option C is wrong because n1-standard-4 (4 vCPUs, 15 GB RAM) is a general-purpose machine type that is usually sufficient for moderate traffic; the primary bottleneck is the lack of GPU acceleration, not an underpowered CPU. Option D is wrong because a maxReplicaCount of 10 does not create overhead; a higher maxReplicaCount lets autoscaling add instances to handle load, reducing latency under traffic.

482

MCQhard

A company has deployed a machine learning model that uses a large input tensor. They notice that the prediction latency varies significantly between requests of the same size. Cloud Monitoring shows that the serving endpoint's CPU utilization is consistently below 50%, but memory utilization fluctuates between 70% and 95%. What is the most likely cause?

A.The model is performing garbage collection cycles

B.The model is using excessive memory due to a memory leak

C.The prediction latency is being affected by CPU throttling

D.The model is hitting a cold start due to autoscaling

AnswerA

Garbage collection pauses can cause latency spikes without high CPU usage, as memory utilization fluctuates during GC.

Why this answer

The correct answer is A because the described symptoms—low CPU utilization (below 50%) and high, fluctuating memory utilization (70%–95%) with variable latency—are classic indicators of garbage collection (GC) pauses in a managed runtime like Python or Java. When the model processes large input tensors, it allocates significant memory; as memory pressure builds, the garbage collector runs more frequently, causing stop-the-world pauses that increase latency unpredictably, even though CPU is not fully utilized.

Exam trap

Google Cloud often tests the misconception that high memory utilization always indicates a memory leak, but the key differentiator is the pattern of fluctuation versus monotonic increase, and the fact that GC pauses cause latency spikes without high CPU usage.

How to eliminate wrong answers

Option B is wrong because a memory leak would cause memory utilization to steadily increase over time (monotonically) rather than fluctuate between 70% and 95%, and it would eventually lead to an out-of-memory crash, not just variable latency. Option C is wrong because CPU throttling (e.g., due to thermal limits or cloud provider CPU credits exhaustion) would manifest as sustained high CPU utilization or a hard cap on CPU speed, not consistently below 50% utilization. Option D is wrong because cold starts due to autoscaling occur when new instances are spun up to handle increased load, which would show a correlation with request volume spikes and initial high latency on the first request, not persistent latency variation across all requests of the same size.

483

MCQeasy

For a low-latency real-time serving requirement, which type of Vertex AI Endpoint is appropriate?

A.Regional endpoint

B.Public endpoint

C.Private endpoint with VPC network

D.Global endpoint

AnswerA

Regional endpoints are deployed in a specific region, allowing proximity to clients for low latency.

Why this answer

Option A is correct because a regional endpoint can be deployed in the same region as its clients to minimize network latency. Option C (private endpoint with VPC) is chosen mainly for network isolation and security, not as the primary latency lever. Option B (public endpoint) routes traffic over the public internet, which can add latency.

Option D (global endpoint) is aimed at multi-region traffic but may add routing overhead.

484

Multi-Selectmedium

A company uses Vertex AI Model Monitoring. Which two configuration options can be set to reduce false positive drift alerts?

Select 2 answers

A.Use a sample percentage of predictions

B.Set a shorter alerting window

C.Increase the drift threshold

D.Decrease the drift threshold

E.Enable feature attribution monitoring

AnswersA, C

Sampling reduces the volume of data compared, potentially reducing noise-induced false alarms.

Why this answer

Option A is correct because using a sample percentage of predictions reduces the volume of data analyzed for drift, which lowers the chance of detecting statistically insignificant fluctuations that could trigger false positive alerts. This is a common technique to filter out noise in high-throughput production systems. Option C is correct because raising the drift threshold means only larger distribution shifts produce an alert, so small, benign fluctuations are ignored.

Exam trap

Google Cloud often tests the misconception that increasing sensitivity (lowering thresholds or shortening windows) reduces false positives, when in fact the opposite is true—these actions increase alert volume and false positives.

485

Matchingmedium

Match each MLOps practice to its description.

Concepts

Matches

Continuous integration and deployment for ML pipelines

Track and manage different model iterations

Monitor for changes in data or model performance over time

Schedule or trigger model retraining based on conditions

Compare model versions in production with traffic splitting

Why these pairings

MLOps ensures reliable and maintainable ML systems.

486

Multi-Selecthard

Which TWO actions are recommended to detect and mitigate data drift in a production ML system on Vertex AI?

Select 2 answers

A.Deploy multiple models and use an ensemble to average predictions

B.Manually review model predictions daily

C.Automatically retrain the model when drift exceeds thresholds

D.Set up Vertex AI Model Monitoring to alert on feature distribution changes

E.Monitor prediction errors and flag when confidence is low

AnswersC, D

Automated retraining mitigates drift.

Why this answer

Option C is correct because a retraining pipeline (for example, a Vertex AI Pipelines run started when a monitoring alert fires) can retrain the model once drift exceeds a predefined threshold, so the model adapts to distribution changes without manual intervention. Option D is correct because Vertex AI Model Monitoring continuously tracks feature distribution statistics (e.g., using Jensen-Shannon divergence or L-infinity distance) and sends alerts when drift is detected, enabling proactive mitigation.
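The detect-then-retrain loop can be sketched for a categorical feature using the L-infinity distance, which is the largest absolute difference in category frequencies between a baseline window and the current window. The categories, counts, and the 0.1 threshold are hypothetical. trigger_retraining is a placeholder for whatever starts the team's retraining pipeline (for example, submitting a Vertex AI Pipelines run).

```python
from collections import Counter
from typing import Dict, Iterable, List


def frequencies(values: Iterable[str]) -> Dict[str, float]:
    """Normalized frequency of each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def l_infinity(baseline: Dict[str, float], current: Dict[str, float]) -> float:
    """Largest absolute frequency difference across all categories seen in either window."""
    keys = baseline.keys() | current.keys()
    return max(abs(baseline.get(k, 0.0) - current.get(k, 0.0)) for k in keys)


retrain_requests: List[str] = []


def trigger_retraining(reason: str) -> None:
    """Hypothetical hook; a real system might submit a pipeline run here."""
    retrain_requests.append(reason)


DRIFT_THRESHOLD = 0.1  # hypothetical alerting threshold

baseline = frequencies(["web"] * 60 + ["mobile"] * 30 + ["store"] * 10)
current = frequencies(["web"] * 35 + ["mobile"] * 55 + ["store"] * 10)

distance = l_infinity(baseline, current)
print(f"L-infinity distance: {distance:.2f}")  # 0.25
if distance > DRIFT_THRESHOLD:
    trigger_retraining(f"channel drift {distance:.2f}")
```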

Exam trap

Google Cloud often tests the distinction between drift detection (monitoring input distributions) and model performance monitoring (tracking prediction errors or confidence), leading candidates to confuse E with a valid drift mitigation technique.

487

MCQmedium

An ML engineer is using Vertex AI Pipelines and wants to reuse a trained model across multiple pipeline runs without retraining each time. Which artifact management strategy should be used?

A.Store the model in BigQuery as a ML model

B.Use Cloud Functions to cache the model

C.Save the model to a Cloud Storage bucket and reference by path

D.Use Vertex AI ML Metadata to track and retrieve model artifacts

AnswerD

ML Metadata provides lineage and artifact tracking, enabling efficient reuse across pipelines.

Why this answer

Vertex AI ML Metadata is the correct artifact management strategy because it is purpose-built for tracking and retrieving model artifacts across pipeline runs. It stores metadata about models, datasets, and other artifacts in a lineage graph, enabling you to query and reuse a specific model version without retraining. This integrates natively with Vertex AI Pipelines, allowing you to pass model artifacts between components and retrieve them by ID or custom properties.
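The reuse pattern behind metadata tracking can be sketched as a lookup keyed on what produced the model: the training code version, the dataset, and the hyperparameters. The in-memory dictionary, the gs:// URI, and the helper names below are hypothetical stand-ins for ML Metadata. Vertex AI Pipelines applies a similar idea in its built-in execution caching, which can be enabled or disabled per pipeline run.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a metadata store: lineage key -> model artifact URI.
metadata_store: Dict[str, str] = {}


def lineage_key(code_version: str, dataset_uri: str, params: Dict[str, Any]) -> str:
    """Stable hash of everything that determines the trained model."""
    payload = json.dumps(
        {"code": code_version, "data": dataset_uri, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def get_or_train(code_version: str, dataset_uri: str, params: Dict[str, Any],
                 train_fn: Callable[[], str]) -> str:
    """Return a recorded model URI for these inputs, training only if none exists."""
    key = lineage_key(code_version, dataset_uri, params)
    if key not in metadata_store:
        metadata_store[key] = train_fn()
    return metadata_store[key]


train_calls: List[int] = []


def fake_train() -> str:
    """Hypothetical training step that just returns a new artifact URI."""
    train_calls.append(1)
    return f"gs://example-bucket/models/run-{len(train_calls)}"


params = {"learning_rate": 0.01, "epochs": 10}
first = get_or_train("git:abc123", "bq://project.dataset.table", params, fake_train)
second = get_or_train("git:abc123", "bq://project.dataset.table", params, fake_train)
print(first == second, len(train_calls))  # True 1: the second run reuses the model
```

Changing any input, such as the learning rate, produces a new key and triggers training, so reuse never silently returns a model built from different inputs.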

Exam trap

Google Cloud often tests the misconception that simply saving a model to Cloud Storage (Option C) is sufficient for artifact management, but the trap is that it ignores the need for metadata tracking, version lineage, and automated retrieval—features that Vertex AI ML Metadata provides as a managed service.

How to eliminate wrong answers

Option A is wrong because BigQuery is a data warehouse for structured data, not an artifact store for ML models; storing a model in BigQuery as an ML model (e.g., CREATE MODEL) is for in-database inference, not for retrieving a trained model artifact across pipelines. Option B is wrong because Cloud Functions are event-driven compute services, not a caching mechanism for model artifacts; they lack persistent storage and artifact versioning, and using them to cache models would be inefficient and unscalable. Option C is wrong because while saving a model to Cloud Storage and referencing by path is a common pattern, it is not a managed artifact management strategy—it lacks metadata tracking, version lineage, and automatic retrieval capabilities that Vertex AI ML Metadata provides, making it error-prone for reuse across multiple pipeline runs.

488

MCQhard

A data engineer is troubleshooting a Vertex AI Endpoint that serves a large BERT model. After deployment, many prediction requests fail with 'Out of Memory' errors. The machine type is n1-standard-8 (30 GB memory) with no accelerator. Which action will most likely resolve the issue?

A.Change the machine type to n1-highmem-16 (104 GB memory).

B.Use batch prediction instead of online prediction.

C.Add a GPU accelerator (e.g., NVIDIA T4) to offload computation.

D.Quantize the model from FP32 to INT8.

AnswerA

Increasing memory directly resolves out-of-memory errors.

Why this answer

Option A is correct because n1-standard-8 has only 30 GB of memory, which may be insufficient for a large BERT model: the weights alone take well over a gigabyte in FP32, and intermediate tensors for concurrent requests can push total usage past 30 GB. Upgrading to a high-memory machine type provides more memory. Option C is wrong because adding a GPU does not increase system memory.

Option D is wrong because model quantization reduces model size but not necessarily the memory spikes during inference. Option B is wrong because batch prediction does not serve real-time requests, and OOM errors might still occur.
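To see why the explanation points at inference-time memory rather than the weights, a back-of-the-envelope estimate helps. The sketch below assumes roughly 340 million parameters (the commonly cited size of BERT-large; the question itself gives no parameter count) and computes only the memory needed to hold the weights:

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# Assumption: ~340 million parameters (BERT-large); activations are NOT modeled.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory needed to hold the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


bert_large_params = 340_000_000
for dtype in ("fp32", "int8"):
    print(dtype, round(weight_memory_gb(bert_large_params, dtype), 2), "GB")
```

This prints about 1.36 GB for FP32 and 0.34 GB for INT8. Both are far below 30 GB, which is consistent with the explanation: the out-of-memory errors come from what happens during inference (intermediate tensors, concurrent requests), and shrinking the weights does not directly address that.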

489

MCQmedium

A data scientist deployed a classification model on Vertex AI Endpoints. After a week, the model's accuracy drops significantly from 92% to 78%. The data scientist suspects training-serving skew. What is the first step to confirm this?

A.Look for data leakage in the training pipeline

B.Compare feature distributions between training and serving data using Vertex AI Model Monitoring

C.Examine the feature importance of the model

D.Check the prediction confidence over time

AnswerB

Model Monitoring can detect skew by comparing distributions.

Why this answer

Option B is correct because Vertex AI Model Monitoring provides a built-in capability to automatically detect training-serving skew by comparing feature distributions between the training data and the live serving data. This is the most direct and efficient first step to confirm whether the accuracy drop is due to a shift in the input data distribution, which is the hallmark of training-serving skew. The data scientist can set up monitoring jobs that compute statistical distance metrics (e.g., Jensen-Shannon divergence) and alert when significant deviations occur.
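Vertex AI Model Monitoring computes distribution distances for you; the sketch below only illustrates what a Jensen-Shannon divergence between a training histogram and a serving histogram measures. The bin counts and the 0.1 alert threshold are hypothetical, and the feature is assumed to be pre-binned into the same buckets on both sides:

```python
import math


def _kl_bits(p, q):
    """Kullback-Leibler divergence in bits; bins where p is 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(train_counts, serve_counts):
    """Jensen-Shannon divergence of two histograms over the same bins (0 = identical, 1 = disjoint)."""
    train_total, serve_total = sum(train_counts), sum(serve_counts)
    p = [c / train_total for c in train_counts]
    q = [c / serve_total for c in serve_counts]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture is > 0 wherever p or q is > 0
    return 0.5 * _kl_bits(p, m) + 0.5 * _kl_bits(q, m)


# Hypothetical bin counts for one numeric feature.
training_counts = [120, 300, 400, 150, 30]
serving_counts = [20, 90, 250, 380, 260]
THRESHOLD = 0.1  # hypothetical alert threshold

score = js_divergence(training_counts, serving_counts)
print(f"JS divergence = {score:.3f}")
print("possible training-serving skew" if score > THRESHOLD else "distributions look consistent")
```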

Exam trap

Google Cloud often tests the distinction between diagnosing the root cause of a performance drop versus investigating a specific type of issue; the trap here is that candidates may jump to data leakage (Option A) because it sounds similar to skew, but leakage is a pre-deployment problem, not a post-deployment distribution shift.

How to eliminate wrong answers

Option A is wrong because looking for data leakage in the training pipeline addresses a different problem—where the model inadvertently uses information from the future or target during training—not a post-deployment distribution shift between training and serving data. Option C is wrong because examining feature importance helps understand which features drive predictions but does not directly compare training and serving distributions to confirm skew. Option D is wrong because checking prediction confidence over time can indicate model uncertainty but does not isolate whether the cause is a change in input data distribution versus model drift or other issues.

490

Multi-Selecthard

A team is monitoring a batch prediction job on Vertex AI. Which two metrics should they monitor to ensure the job completes successfully without errors?

Select 2 answers

A.Data size of input

B.Prediction requests per second

C.Job failure rate

D.Model endpoint latency

E.Number of preempted workers

AnswersC, E

Failure rate directly indicates job success.

Why this answer

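Option C is correct because the job failure rate directly indicates whether the batch prediction job is completing successfully or encountering errors. Monitoring this metric allows the team to detect and respond to failures in the prediction pipeline, ensuring the job finishes without errors. Option E is correct because preempted workers interrupt distributed batch processing; a rising preemption count can delay the job or cause it to fail, so it is an early warning sign.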

Exam trap

Google Cloud often tests the distinction between batch and online prediction metrics, and the trap here is that candidates mistakenly apply online serving metrics (like latency or requests per second) to batch jobs, or overlook worker preemption as a critical failure indicator in distributed batch processing.

491

MCQeasy

An ML team is moving from a prototype Jupyter notebook to a production training pipeline. They want to ensure reproducibility. Which approach should they take?

A.Use interactive parameter tuning.

B.Use a container with fixed dependencies and record hyperparameters.

C.Export the notebook's output model directly.

D.Save the notebook as a .py file.

AnswerB

Captures environment and configuration for reproducibility.

Why this answer

Option B is correct because using a container with fixed dependencies and recording hyperparameters captures both the training environment and the configuration, enabling exact reproduction. Option D is wrong because a .py file does not capture the full environment. Option C is wrong because exporting the notebook's output model directly lacks environment tracking.

Option A is wrong because interactive tuning is not reproducible.
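The container half of this answer is usually a Dockerfile with pinned dependency versions; the record-keeping half can be as simple as writing a manifest next to the model. A minimal Python sketch, assuming a hypothetical image URI pinned by digest and hypothetical hyperparameter values:

```python
import json
import platform
import random


def build_run_manifest(hyperparams: dict, image_uri: str, seed: int) -> dict:
    """Collect what is needed to rerun training: image, interpreter, seed, hyperparameters."""
    return {
        "container_image": image_uri,  # pin by digest so the environment cannot drift
        "python_version": platform.python_version(),
        "seed": seed,
        "hyperparameters": hyperparams,
    }


seed = 42
random.seed(seed)  # seed every RNG the training code uses (only Python's here)
manifest = build_run_manifest(
    {"learning_rate": 0.001, "batch_size": 64, "epochs": 10},
    "us-docker.pkg.dev/example-project/train/trainer@sha256:<digest>",  # hypothetical image
    seed,
)
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Storing this manifest with the model artifact (or logging the same values to Vertex AI Experiments) lets a later run rebuild the exact configuration.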

492

Multi-Selecthard

You are designing an ML pipeline for a large-scale recommendation system that runs weekly retraining on historical user interaction data. The pipeline uses TensorFlow and is deployed on Google Cloud. The pipeline must be orchestrated and automated with minimal manual intervention. Which THREE options should you include in your design? (Choose three.)

Select 3 answers

A.Use BigQuery scheduled queries to run the training script on a schedule.

B.Use Vertex AI Pipelines to define the ML pipeline as a Directed Acyclic Graph (DAG) of components.

C.Use AI Platform Notebooks to schedule the training job on a recurring basis.

D.Use Cloud Build and Cloud Functions to trigger the pipeline when new training data arrives in Cloud Storage.

E.Use Cloud Composer to orchestrate the pipeline steps, including data extraction, preprocessing, training, and deployment.

AnswersB, D, E

Vertex AI Pipelines is purpose-built for ML pipelines.

Why this answer

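Vertex AI Pipelines (option B) is correct because it provides a managed, serverless orchestration service for building, testing, and deploying ML pipelines as Directed Acyclic Graphs (DAGs). This directly supports the requirement for automated, minimal-intervention weekly retraining by allowing you to define reusable components and schedule pipeline runs via Cloud Scheduler or event triggers, integrating natively with TensorFlow and Google Cloud services. Cloud Composer (option E) is correct because it orchestrates the end-to-end steps, including data extraction, preprocessing, training, and deployment. Cloud Build and Cloud Functions (option D) are correct because they can trigger the pipeline automatically when new training data arrives in Cloud Storage, so no one has to start runs by hand.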
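Both Vertex AI Pipelines and Cloud Composer model the workflow as a DAG and run each step only after its upstream steps finish. The stdlib sketch below (Python 3.9+ `graphlib`) shows that idea on a hypothetical set of weekly retraining steps; it is not the API of either service. Steps grouped in the same stage do not depend on each other and could run in parallel:

```python
from graphlib import TopologicalSorter

# Hypothetical retraining steps; each key maps to the steps it depends on.
pipeline = {
    "extract": set(),
    "preprocess": {"extract"},
    "validate": {"extract"},
    "train": {"preprocess", "validate"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

sorter = TopologicalSorter(pipeline)
sorter.prepare()  # also raises CycleError if the graph is not a DAG
stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # every step whose dependencies are done
    stages.append(ready)
    sorter.done(*ready)

for i, stage in enumerate(stages, start=1):
    print(f"stage {i}: {stage}")
```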

Exam trap

The trap here is confusing development tools (like Notebooks) or data-query services (like BigQuery scheduled queries) with production-grade orchestration services, leading candidates to select options that cannot handle multi-step pipeline dependencies or automated scheduling in a managed, scalable way.

493

MCQhard

Refer to the exhibit. A user is trying to upload a Vertex AI pipeline definition. The error indicates an invalid dependency order. What should the user do to fix this?

A.Reorder the tasks in the YAML so that task1 is defined before task2.

B.Rename task1 to a name that comes alphabetically before task2.

C.Change the dependency of task2 to be independent of task1.

D.Remove the dependentTasks field from task2 and rely on implicit ordering.

AnswerA

Each task must be declared after the tasks it depends on.

Why this answer

Option A is correct because Vertex AI pipeline definitions require that tasks be declared in the order they appear in the dependency graph. The YAML parser validates the `dependentTasks` field by checking that referenced tasks are already defined. Defining `task1` before `task2` ensures that when `task2` declares a dependency on `task1`, `task1` is already in scope, resolving the invalid dependency order error.
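The rule the explanation describes, that every task named in `dependentTasks` must already be declared, can be expressed as a single pass over the tasks in file order. This is a minimal sketch of that rule as stated above, not Vertex AI's actual validator; the list of `(name, dependencies)` pairs is an assumed stand-in for the parsed YAML:

```python
def check_declaration_order(tasks):
    """Return an error for each dependency that is not declared before the task using it.

    `tasks` is an ordered list of (name, dependent_tasks) pairs, in the order
    the tasks appear in the pipeline definition.
    """
    declared = set()
    errors = []
    for name, deps in tasks:
        for dep in deps:
            if dep not in declared:
                errors.append(f"{name} depends on {dep}, which is not declared before it")
        declared.add(name)
    return errors


broken = [("task2", ["task1"]), ("task1", [])]
fixed = [("task1", []), ("task2", ["task1"])]
print(check_declaration_order(broken))
print(check_declaration_order(fixed))
```

The first call reports one error for `task2`; the reordered version reports none, which is the fix Option A describes.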

Exam trap

Google Cloud often tests the misconception that alphabetical naming or implicit ordering can resolve dependency declaration errors, when in fact the YAML parser strictly requires tasks to be defined in topological order.

How to eliminate wrong answers

Option B is wrong because renaming tasks alphabetically does not affect the order of definition in the YAML file; Vertex AI pipelines rely on the sequence of task declarations, not lexical ordering of names. Option C is wrong because removing the dependency between task2 and task1 would change the pipeline logic, potentially breaking the intended workflow, and the error is about declaration order, not about whether the dependency is valid. Option D is wrong because implicit ordering is not supported in Vertex AI pipelines; the `dependentTasks` field is required to explicitly define dependencies, and removing it would cause the pipeline to run tasks in an undefined order, likely leading to runtime failures.

494

Multi-Selectmedium

A company wants to deploy a model for real-time inference with high availability across multiple Google Cloud regions. The model is small and stateless. Which two steps should they take? (Choose two.)

Select 2 answers

A.Deploy the model to Vertex AI Prediction endpoints in multiple regions and use a global external HTTP(S) load balancer to route traffic to the nearest region.

B.Use Cloud Run with multi-region deployment and a global HTTP(S) load balancer.

C.Use Cloud Functions with a global HTTP(S) load balancer.

D.Use a single Vertex AI Prediction endpoint with multiple replicas across zones in the same region.

E.Deploy the model to a Vertex AI Prediction endpoint in a single region and use a global external HTTP(S) load balancer.

AnswersA, B

Multi-region endpoints with global load balancer provide HA and low latency.

Why this answer

Options A and B are correct. Option A deploys the model to Vertex AI Prediction endpoints in multiple regions behind a global load balancer, providing regional failover. Option B uses Cloud Run with a multi-region deployment and a global load balancer, which also provides multi-region high availability.

Option D is insufficient because replicas spread across zones in a single region do not survive a regional outage. Option C is wrong because Cloud Functions is region-specific and not designed for latency-sensitive inference across regions. Option E is wrong because a single-region endpoint does not provide cross-region HA, even behind a global load balancer.

495

MCQeasy

A team wants to ensure that only approved models are deployed to production. Which Vertex AI feature should they use?

A.Vertex AI Experiments.

B.Cloud DLP.

C.Vertex AI Pipelines.

D.Vertex AI Feature Store.

E.Vertex AI Model Registry with versioning and alias.

AnswerE

Model Registry provides version control and alias-based deployment gates.

Why this answer

Vertex AI Model Registry with versioning and alias (Option E) is the correct feature because it allows teams to manage model lifecycle, track approved versions, and assign aliases (e.g., 'champion' or 'production') to designate which model is approved for deployment. This ensures only vetted models are promoted to production, aligning with governance and compliance requirements.
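Conceptually, an alias-based gate means the deployment step resolves an approval alias to a version instead of taking whatever was trained last. In practice the registry is queried through the Vertex AI SDK or console; the dictionary below is a hypothetical in-memory stand-in, and the `production` alias name is an assumption (the explanation gives 'champion' or 'production' as examples):

```python
# Hypothetical stand-in for registry metadata: version ID -> set of aliases.
registry = {
    "1": {"default"},
    "2": {"production"},
    "3": set(),  # trained but not yet approved
}

APPROVED_ALIAS = "production"  # hypothetical alias reserved for approved versions


def version_to_deploy(versions: dict, alias: str = APPROVED_ALIAS) -> str:
    """Return the single version carrying the approval alias; refuse to guess otherwise."""
    matches = [v for v, aliases in versions.items() if alias in aliases]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one version with alias {alias!r}, found {matches}")
    return matches[0]


print(version_to_deploy(registry))
```

Version 3 is never deployed until someone moves the approval alias onto it, which is the governance property the answer relies on.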

Exam trap

Google Cloud often tests the distinction between model tracking (Experiments) and model governance (Registry), so the trap here is assuming that any 'management' feature (like Pipelines or Experiments) can enforce deployment approvals, when only the Registry with aliases provides explicit version control and approval semantics.

How to eliminate wrong answers

Option A is wrong because Vertex AI Experiments is designed for tracking and comparing ML training runs, not for managing model deployment approvals. Option B is wrong because Cloud DLP (Data Loss Prevention) is a service for inspecting and masking sensitive data, not for model governance or deployment control. Option C is wrong because Vertex AI Pipelines orchestrates ML workflows (e.g., training, evaluation) but does not inherently enforce approval gates for production deployment.

Option D is wrong because Vertex AI Feature Store is used for storing, serving, and sharing feature data, not for model versioning or deployment approval.

496

MCQmedium

A data scientist trains an XGBoost model on Vertex AI with a custom container. The model performs well on a held-out test set but fails to generalize in production. They suspect data leakage between training and validation. What is the best practice to prevent this?

A.Store and serve features using Vertex AI Feature Store with point-in-time correctness

B.Implement feature engineering in Vertex AI Pipelines to ensure temporal ordering

C.Store all features in BigQuery and join on timestamp during training and serving

D.Use Vertex AI AutoML instead of custom training

AnswerA

Feature Store provides consistent feature values for each timestamp, preventing leakage.

Why this answer

Option A is correct because Vertex AI Feature Store with point-in-time correctness ensures that for each training example, only feature values that were known at the time of the prediction (i.e., before the label occurred) are used. This prevents future data from leaking into the training set, which is the most common cause of poor generalization when temporal ordering matters. The Feature Store automatically retrieves the latest feature value as of a specified timestamp, eliminating the need for manual joins and windowing logic.
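The point-in-time rule can be sketched in a few lines: for each label event, take the latest feature value whose timestamp is at or before the event, never a later one. The feature history and label events below are hypothetical, and treating a value stamped exactly at the event time as already known is an assumption you would adjust to match your data:

```python
from bisect import bisect_right

# Hypothetical feature history for one customer: (timestamp, value), sorted by time.
feature_history = [(1, 0.10), (5, 0.40), (9, 0.90)]


def value_as_of(history, ts):
    """Latest feature value with timestamp <= ts; None if nothing was known yet."""
    times = [t for t, _ in history]
    i = bisect_right(times, ts)  # number of entries at or before ts
    return history[i - 1][1] if i else None


# Hypothetical label events: (event_timestamp, label).
labels = [(0, 0), (6, 1), (9, 1)]
training_rows = [(ts, value_as_of(feature_history, ts), y) for ts, y in labels]
print(training_rows)  # [(0, None, 0), (6, 0.4, 1), (9, 0.9, 1)]
```

A naive join that always takes the most recent value would attach 0.9 to the event at time 6, leaking a value from the future into training; Feature Store's point-in-time lookups apply this as-of rule for you.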

Exam trap

Google Cloud often tests the misconception that simply using a pipeline or a data warehouse with timestamps is sufficient to prevent leakage, but the key is the automated enforcement of point-in-time correctness, which only a dedicated feature store with time-travel capabilities provides.

How to eliminate wrong answers

Option B is wrong because implementing feature engineering in Vertex AI Pipelines ensures reproducible workflows but does not inherently enforce temporal ordering or prevent data leakage; pipelines can still join future features if the data is not time-aware. Option C is wrong because storing all features in BigQuery and joining on timestamp during training and serving is a manual approach that is error-prone and does not guarantee point-in-time correctness; it requires careful windowing logic and can still leak future data if the join is not correctly scoped. Option D is wrong because using Vertex AI AutoML does not automatically solve data leakage; AutoML models are equally susceptible to leakage if the training data contains future information, and the user still needs to ensure temporal integrity of the input features.

497

MCQmedium

A model deployed on Vertex AI Prediction repeatedly exits with code 137. What is the most likely cause?

A.The model has a disk I/O bottleneck.

B.The model is using too much CPU.

C.The container image is incompatible with the machine type.

D.The model is using more memory than allocated (4GB).

AnswerD

Memory limit reached, OOM killer terminates process.

Why this answer

Exit code 137 is 128 + 9, meaning the process was terminated with SIGKILL; in a serving container, that signal most often comes from the Linux kernel's Out-Of-Memory (OOM) killer. In Vertex AI Prediction, each deployed model's container can use only the memory available to it (4 GB in this scenario). When the model's inference process exceeds this limit, the OOM killer terminates the container, resulting in exit code 137.

This is the most direct and common cause for this specific exit code in Vertex AI.
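The arithmetic behind the exit code is easy to check. Container runtimes report a process killed by signal N as exit code 128 + N, so 137 decodes to signal 9 (SIGKILL) and 139 to signal 11 (SIGSEGV). A small sketch using Python's standard `signal` module, assuming Linux signal numbering:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a container exit code using the 128 + N convention for signal deaths."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            pass  # not a signal number this platform knows about
    return f"exit status {code} (not a signal)"


print(describe_exit_code(137))  # SIGKILL: what the OOM killer sends
print(describe_exit_code(139))  # SIGSEGV: a segmentation fault
print(describe_exit_code(1))    # a generic application error
```

Note that 137 only says the process received SIGKILL; the OOM killer is the usual sender for a serving container near its memory limit, which is why memory is the first thing to check.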

Exam trap

Google Cloud often tests the distinction between exit codes: candidates may confuse exit code 137 (OOM kill) with exit code 1 (generic error) or exit code 139 (segmentation fault), leading them to incorrectly attribute the issue to CPU or disk problems.

How to eliminate wrong answers

Option A is wrong because disk I/O bottlenecks typically cause slow performance or timeouts, not exit code 137 (SIGKILL from OOM). Option B is wrong because high CPU usage may cause throttling or latency, but does not trigger the OOM killer; exit code 137 is specifically memory-related. Option C is wrong because an incompatible container image would result in a different error, such as a crash loop with exit code 1 or 139 (segfault), not the OOM-specific exit code 137.

498

MCQmedium

A data scientist trained a model on a single GPU but needs to train on multiple GPUs for a larger dataset. They observe that training time does not decrease linearly with additional GPUs. Which common issue is most likely?

A.Overfitting.

B.Model architecture too simple.

C.Learning rate too high.

D.Data pipeline bottleneck.

AnswerD

I/O or preprocessing bottleneck limits GPU utilization.

Why this answer

Option D is correct because a data pipeline bottleneck (slow I/O or preprocessing) can starve the GPUs, so adding GPUs does not produce a linear speedup. Option A is wrong because overfitting relates to model quality, not training speed. Option C is wrong because the learning rate affects convergence, not parallel efficiency.

Option B is wrong because the model architecture's size is not the direct cause of the sub-linear speedup.
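A simple throughput model shows why the speedup flattens. It assumes, hypothetically, that one GPU processes 10 steps per second and that the input pipeline can deliver at most 25 steps' worth of data per second, and it ignores inter-GPU communication overhead:

```python
def steps_per_second(num_gpus, gpu_steps_per_sec, input_steps_per_sec):
    """Training can go no faster than the slower of GPU compute and input supply."""
    return min(num_gpus * gpu_steps_per_sec, input_steps_per_sec)


SINGLE_GPU = 10.0   # hypothetical steps/s for one GPU
INPUT_LIMIT = 25.0  # hypothetical steps/s the input pipeline can feed

for n in (1, 2, 4, 8):
    rate = steps_per_second(n, SINGLE_GPU, INPUT_LIMIT)
    print(f"{n} GPU(s): {rate:.0f} steps/s, speedup {rate / SINGLE_GPU:.1f}x")
```

The speedup goes 1.0x, 2.0x, then stalls at 2.5x for 4 and 8 GPUs. Adding GPUs does nothing until the input pipeline gets faster; in TensorFlow that typically means parallelizing and prefetching the `tf.data` pipeline (for example with `num_parallel_calls` and `prefetch`).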

499

MCQmedium

A team is using Cloud Composer to orchestrate ML workflows. They have a DAG that triggers a Vertex AI Training job, then a prediction deployment. The deployment step occasionally fails due to quota limits. What is the best way to handle this?

A.Increase the quota manually

B.Use Vertex AI Pipelines instead of Cloud Composer

C.Create a custom sensor to wait for quota to be available

D.Catch the exception in the DAG and send an alert

E.Implement exponential backoff retry in the DAG task

AnswerE

Retries with backoff handle transient failures.

Why this answer

Option E is correct because Cloud Composer (Apache Airflow) provides built-in retry mechanisms via task parameters such as `retries`, `retry_delay`, and `retry_exponential_backoff`. Implementing exponential backoff in the DAG task is the best practice for handling transient quota errors, as it automatically retries the deployment step with increasing delays, reducing load on the quota system and increasing the chance of success without manual intervention. This approach aligns with Airflow's native error-handling capabilities and avoids unnecessary complexity or resource waste.
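In Airflow, this retry behaviour is set with standard `BaseOperator` arguments, passed either in the DAG's `default_args` or directly on the task. The sketch below shows those arguments with hypothetical values, plus a simplified doubling schedule to illustrate how the waits grow; Airflow's own backoff formula differs in detail, so the computed delays are illustrative only:

```python
from datetime import timedelta

# Standard Airflow BaseOperator retry arguments (hypothetical values), e.g. passed
# as default_args to the DAG or as keyword arguments to the deployment task.
deploy_retry_args = {
    "retries": 5,                              # attempts after the first failure
    "retry_delay": timedelta(minutes=1),       # starting delay
    "retry_exponential_backoff": True,         # grow the delay between attempts
    "max_retry_delay": timedelta(minutes=30),  # upper bound on any single wait
}


def doubling_schedule(base: timedelta, retries: int, cap: timedelta) -> list:
    """Simplified illustration: base, 2*base, 4*base, ... never exceeding cap."""
    return [min(base * 2 ** attempt, cap) for attempt in range(retries)]


for wait in doubling_schedule(
    deploy_retry_args["retry_delay"],
    deploy_retry_args["retries"],
    deploy_retry_args["max_retry_delay"],
):
    print(wait)
```

With these values the simplified waits are 1, 2, 4, 8 and 16 minutes, all under the 30-minute cap, which gives a transient quota shortage time to clear before the task is marked failed.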

Exam trap

The trap here is that candidates often confuse manual quota increases or switching tools as the primary solution, when the exam expects knowledge of Airflow's native retry mechanisms and the principle of handling transient errors automatically within the orchestration layer.

How to eliminate wrong answers

Option A is wrong because manually increasing quota is a reactive, non-scalable solution that does not address transient quota limits and may incur additional costs or require approval processes. Option B is wrong because switching to Vertex AI Pipelines does not inherently solve quota limit issues; it changes the orchestration tool but still relies on the same underlying Vertex AI services and quota constraints. Option C is wrong because creating a custom sensor to wait for quota availability is overly complex, introduces polling overhead, and does not leverage Airflow's built-in retry mechanisms; sensors are better suited for waiting on external conditions like file arrival, not for handling transient API errors.

Option D is wrong because catching the exception and sending an alert only notifies the team of failure without automatically recovering the task, leading to manual intervention and potential delays; it does not handle the transient nature of quota errors.

500

MCQmedium

A team deploys a model using Vertex AI Endpoint with automatic scaling. They observe that during traffic spikes, new instances take a long time to become ready, causing high latency for some requests. What should they configure to reduce this startup time?

A.Increase the max replicas

B.Use a custom container with a smaller footprint

C.Enable predictive autoscaling

D.Set a higher target CPU utilization

AnswerB

Smaller containers pull and initialize faster, reducing the time to become ready.

Why this answer

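Option B is correct because a custom container with a smaller footprint (e.g., a smaller base image and fewer dependencies) reduces the time needed to pull and initialize each new replica. Option A is wrong because raising max replicas allows more replicas but does not make any of them start faster. Option D is wrong because the target CPU utilization only changes when scaling is triggered (a higher target actually delays scale-out); each new replica still takes just as long to start.

Option C is wrong because predictive autoscaling is not a standard setting for Vertex AI endpoints.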

501

MCQeasy

A retail company wants to predict customer churn using their transaction history and customer demographics. They have limited ML expertise and want to use a managed service on Google Cloud. Which service should they use?

A.AI Platform Notebooks

B.Vertex AI AutoML (Tables)

C.Cloud TPU

D.BigQuery ML

AnswerB

AutoML Tables provides end-to-end automated model building for tabular data, ideal for limited ML expertise.

Why this answer

Vertex AI AutoML (Tables) is the correct choice because it is a managed service specifically designed for tabular data that requires little ML expertise. It automates model training, hyperparameter tuning, and deployment for classification tasks like churn prediction, directly handling transaction history and demographic features.

Exam trap

The trap here is that candidates may confuse BigQuery ML as a fully managed no-code solution, but it still requires SQL proficiency and manual model selection, whereas Vertex AI AutoML is the true zero-code managed service for tabular data.

How to eliminate wrong answers

Option A is wrong because AI Platform Notebooks provides a Jupyter-based development environment for custom ML coding, not a managed no-code solution, and requires ML expertise to build and train models. Option C is wrong because Cloud TPU is a hardware accelerator for training large deep learning models (e.g., NLP or vision), not a managed service for tabular churn prediction, and is overkill for this use case. Option D is wrong because BigQuery ML enables SQL-based model creation directly in BigQuery, but it requires some ML knowledge to write queries and tune models, and is less automated than AutoML for users with limited ML expertise.

502

Multi-Selectmedium

A financial services company has deployed a credit risk ML model on Vertex AI. They want to monitor the model for fairness across demographic groups to ensure no biased outcomes. Which TWO actions should they take as best practices? (Choose TWO.)

Select 2 answers

A.Eliminate all features that are correlated with protected attributes from the model input to ensure fairness.

B.Use Vertex Explainable AI to understand feature attributions and compare their distributions across demographic groups.

C.Periodically compare the model's performance metrics (e.g., AUC) on the overall population versus the holdout test set.

D.Store all model predictions in BigQuery but do not capture ground truth labels to avoid privacy issues.

E.Set up alerts on the Vertex AI Model Monitoring fairness metrics, such as equal opportunity difference, and configure a Slack channel for notifications.

AnswersB, E

Feature attribution analysis helps identify if the model relies disproportionately on sensitive attributes.

Why this answer

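Option B is correct because Vertex Explainable AI provides feature attribution scores that can be compared across demographic groups to detect if the model relies on sensitive attributes or proxies. This enables fairness auditing by revealing whether the model's decision logic differs systematically for protected groups, which is a best practice for monitoring bias. Option E is correct because alerts on fairness metrics such as equal opportunity difference turn fairness monitoring into a continuous check, and routing notifications to a Slack channel ensures the team sees regressions promptly.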
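Equal opportunity difference compares true positive rates across groups: among applicants who truly qualify, how often does the model approve each group? A minimal sketch with hypothetical labels (1 = creditworthy in `y_true`, 1 = approved in `y_pred`) and a hypothetical alert threshold; it illustrates the metric only, not how Vertex AI computes or names it:

```python
def true_positive_rate(y_true, y_pred):
    """Share of actual positives that the model predicted as positive."""
    predicted_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(predicted_for_positives) / len(predicted_for_positives)


def equal_opportunity_difference(y_true, y_pred, groups, group_a, group_b):
    """TPR(group_a) - TPR(group_b); 0 means qualified members of both groups are approved equally often."""
    def tpr_for(group):
        idx = [i for i, g in enumerate(groups) if g == group]
        return true_positive_rate([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return tpr_for(group_a) - tpr_for(group_b)


# Hypothetical data for two demographic groups, A and B.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 1]
groups = ["A"] * 5 + ["B"] * 5
ALERT_THRESHOLD = 0.1  # hypothetical tolerance

eod = equal_opportunity_difference(y_true, y_pred, groups, "A", "B")
print(f"equal opportunity difference = {eod:.2f}")
if abs(eod) > ALERT_THRESHOLD:
    print("alert: qualified applicants in one group are approved less often")
```

Here group A's true positive rate is 3/4 and group B's is 1/4, so the difference of 0.50 would trigger the alert.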

Exam trap

Google Cloud often tests the misconception that removing protected attributes or correlated features is sufficient for fairness, when in reality proxy features and complex interactions can still cause bias, making monitoring with explainability and fairness metrics essential.

503

MCQmedium

A company uses BigQuery to store feature data for ML training. A data engineer notices that a Vertex AI Training job is failing with 'Access Denied' errors when reading from a BigQuery table. The training job uses a custom service account that has been granted the 'bigquery.dataViewer' role on the dataset. What is the most likely cause of the failure?

A.The service account is not in the same project as the BigQuery dataset.

B.The BigQuery table is partitioned and requires row-level access.

C.The service account lacks the 'bigquery.jobs.create' permission in the project.

D.The training job does not have the required network access to BigQuery.

AnswerC

Reading from BigQuery via Vertex AI Training requires the ability to submit a query job, which requires 'bigquery.jobs.create'.

Why this answer

The 'bigquery.dataViewer' role grants permissions to read BigQuery data (e.g., bigquery.tables.getData), but it does not include the 'bigquery.jobs.create' permission. When a Vertex AI training job reads from BigQuery, it must first create a BigQuery job (a query job) to retrieve the data. Without 'bigquery.jobs.create' at the project level, the service account cannot initiate the read operation, resulting in an 'Access Denied' error even though it has data-level access.
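The gap can be made concrete by comparing the permissions the two roles grant. The role-to-permission table below is an abridged, illustrative subset (the full lists are in the IAM documentation), and `REQUIRED_TO_QUERY` is a simplification covering only the two permissions discussed here:

```python
# Abridged, illustrative subset of two predefined BigQuery roles.
ROLE_PERMISSIONS = {
    "roles/bigquery.dataViewer": {
        "bigquery.datasets.get",
        "bigquery.tables.get",
        "bigquery.tables.getData",
        "bigquery.tables.list",
    },
    "roles/bigquery.jobUser": {"bigquery.jobs.create"},
}

# Simplification: the two permissions this question turns on.
REQUIRED_TO_QUERY = {"bigquery.jobs.create", "bigquery.tables.getData"}


def missing_permissions(granted_roles):
    """Permissions needed to run a query job that the granted roles do not cover."""
    granted = set().union(*(ROLE_PERMISSIONS[role] for role in granted_roles))
    return REQUIRED_TO_QUERY - granted


print(missing_permissions(["roles/bigquery.dataViewer"]))
print(missing_permissions(["roles/bigquery.dataViewer", "roles/bigquery.jobUser"]))
```

The usual fix is to grant the service account `roles/bigquery.jobUser` on the project that runs the query jobs, alongside `roles/bigquery.dataViewer` on the dataset.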

Exam trap

The trap here is that candidates often assume 'bigquery.dataViewer' is sufficient for all read operations, overlooking the requirement for 'bigquery.jobs.create' to initiate the query job that actually reads the data.

How to eliminate wrong answers

Option A is wrong because the service account does not need to be in the same project as the BigQuery dataset; cross-project access is supported as long as IAM permissions are granted at the dataset or table level. Option B is wrong because partitioned tables do not require row-level access by default; row-level access is controlled via BigQuery row-level security policies, which are not automatically required for partitioned tables. Option D is wrong because Vertex AI training jobs run within Google Cloud's internal network and have built-in access to BigQuery via the Cloud API; network access is not a common cause of 'Access Denied' errors for BigQuery reads.

504

Multi-Selecthard

Which TWO statements are true about canary deployments for Vertex AI endpoints?

Select 2 answers

A.Canary deployments are only supported for custom containers, not prebuilt frameworks.

B.You can roll back a canary by resetting traffic to 0% for the new version.

C.You can use traffic splitting to gradually shift 1-100% of traffic to a new version.

D.Canary deployments require the use of Vertex AI Model Registry.

E.Once a canary receives 50% traffic, you cannot increase it further.

AnswersB, C

Traffic can be shifted back to old version easily.

Why this answer

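Option C is correct because Vertex AI endpoints support traffic splitting, so you can shift a growing share of traffic (anywhere from 1% to 100%) to the new version as confidence builds. Option B is correct because rolling back a canary only requires setting the new version's share to 0% and returning that traffic to the previous version. Option E is wrong because there is no 50% ceiling; the split can be adjusted to any value up to 100%. Option A is wrong because traffic splitting applies to any model deployed to the endpoint, whether it uses a custom or a prebuilt container. Option D is wrong because the canary is controlled through the endpoint's traffic split, not through a separate Model Registry feature. Monitoring the canary's metrics during the rollout tells you when to promote it, and those metrics can also drive an automated rollback.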
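On a Vertex AI endpoint, the traffic split maps each deployed model to a percentage, and the percentages must add up to 100. The sketch below models only that arithmetic with hypothetical deployed-model IDs, walking through a canary start, a step past 50%, and a rollback; it does not call the Vertex AI API:

```python
def validate_split(split: dict) -> None:
    """A traffic split maps deployed-model IDs to integer percentages summing to 100."""
    if any(not isinstance(p, int) or p < 0 or p > 100 for p in split.values()):
        raise ValueError("percentages must be integers between 0 and 100")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"percentages sum to {total}, not 100")


def shift_traffic(split, canary_id, stable_id, canary_percent):
    """Give the canary `canary_percent` and the stable model the rest (two-model case)."""
    new_split = dict(split)
    new_split[canary_id] = canary_percent
    new_split[stable_id] = 100 - canary_percent
    validate_split(new_split)
    return new_split


# Hypothetical deployed-model IDs on one endpoint.
split = {"stable-model": 100, "canary-model": 0}
split = shift_traffic(split, "canary-model", "stable-model", 10)  # start the canary
split = shift_traffic(split, "canary-model", "stable-model", 60)  # going past 50% is allowed
split = shift_traffic(split, "canary-model", "stable-model", 0)   # roll back
print(split)  # {'stable-model': 100, 'canary-model': 0}
```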

505

MCQmedium

A financial services company deploys a fraud detection model on Vertex AI. The model must make predictions in under 100ms. After deployment, latency spikes to 300ms during peak hours. The model is a large ensemble with 500MB size. Which action is most likely to reduce latency?

A.Optimize the model using TensorFlow Lite and convert to a smaller format.

B.Switch to batch prediction to process requests asynchronously.

C.Reduce the machine type to a smaller instance.

D.Increase the number of replicas on the endpoint.

AnswerA

Reduces model size and inference time.

Why this answer

The primary cause of latency is the large model size (500MB) combined with real-time inference constraints. Optimizing the model with TensorFlow Lite reduces the model size and computational overhead, directly decreasing inference time. This addresses the root cause—model complexity—rather than scaling infrastructure around an inefficient model.

Exam trap

The trap here is that candidates often confuse scaling (replicas or instance size) with optimization, failing to recognize that model size and inference efficiency are the primary drivers of latency in real-time serving.

How to eliminate wrong answers

Option B is wrong because batch prediction processes requests asynchronously, which does not meet the sub-100ms real-time requirement; it is designed for offline, high-throughput scenarios, not low-latency serving. Option C is wrong because reducing the machine type to a smaller instance would decrease available CPU/memory, likely increasing latency further due to resource contention. Option D is wrong because increasing the number of replicas improves throughput and availability but does not reduce per-request latency; it may even add network overhead from load balancing.

506

MCQhard

You are the ML engineer for a financial services company. You have deployed a fraud detection model on Vertex AI Endpoints using a custom container. The model is a gradient boosting model trained on transactional data. Over the past week, the model's precision has dropped from 95% to 80%, while recall has remained stable. The input data volume and distribution have not changed significantly. The model is served on a single endpoint with autoscaling enabled (min replicas=2, max replicas=10). You notice that the average CPU utilization of the serving containers has increased from 40% to 90%, and the p99 latency has increased from 50ms to 200ms. The model is retrained weekly using the latest data, and the last retraining was 3 days ago. The logs show no errors, and the model version is unchanged. Given these symptoms, what is the most likely cause of the precision drop?

A.The autoscaling policy is not scaling up fast enough, causing increased latency and prediction errors.

B.The model is overfitting to recent transaction patterns due to weekly retraining.

C.A recent change in the preprocessing code in the container transformed features differently than what the model expects, causing incorrect predictions.

D.The model was replaced with a different version without updating the endpoint.

AnswerC

Feature transformation mismatch can cause precision drop without affecting recall.

Why this answer

Option C is correct because the precision drop without a change in input distribution or recall strongly indicates a systematic error in predictions, not a data shift. A preprocessing code change in the custom container would cause the model to receive features transformed differently than during training, leading to incorrect probability estimates. The increased CPU utilization and latency are consistent with the container performing additional or different preprocessing steps, not with autoscaling issues or model version changes.
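The arithmetic shows how precision can fall from 95% to 80% while recall holds. The confusion-matrix counts below are hypothetical: the model still catches the same fraud cases (true positives and false negatives unchanged) but flags more legitimate transactions as fraud (false positives grow):

```python
def precision(tp, fp):
    """Fraction of flagged transactions that really are fraud."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of actual fraud that gets flagged."""
    return tp / (tp + fn)


# Hypothetical confusion-matrix counts for the fraud class.
before = {"tp": 76, "fp": 4, "fn": 4}
after = {"tp": 76, "fp": 19, "fn": 4}  # only false positives grew

for name, c in (("before", before), ("after", after)):
    print(name, f"precision={precision(c['tp'], c['fp']):.2f}", f"recall={recall(c['tp'], c['fn']):.2f}")
# before precision=0.95 recall=0.95
# after precision=0.80 recall=0.95
```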

Exam trap

The trap here is that candidates often attribute latency increases and precision drops to autoscaling or model drift, but the key clue is that recall remains stable, which points to a systematic prediction error (preprocessing mismatch) rather than a data distribution shift or infrastructure scaling problem.

How to eliminate wrong answers

Option A is wrong because autoscaling delays cause increased latency and potential timeouts, but they do not directly cause a precision drop; precision depends on the correctness of predictions, not on response time. Option B is wrong because overfitting to recent patterns would typically cause a drop in recall as well, not just precision, and the input distribution has not changed significantly. Option D is wrong because the logs show no errors and the model version is unchanged, so the endpoint is still serving the same model; a version replacement would require an explicit update and would likely trigger a deployment event.
