Knowledge + Practice

Google Professional Machine Learning Engineer (PMLE) — Questions 526–600

1000 questions total · 14 pages · All types, answers revealed

526

MCQeasy

A company wants to classify support ticket text into categories. They have labeled historical tickets. Which Google Cloud service allows them to train a custom classification model with no code?

A.Vertex AI Matching Engine

B.AutoML Natural Language

C.Cloud Natural Language API

D.Document AI

AnswerB

Correct: No-code custom text classification.

Why this answer

AutoML Natural Language (now part of Vertex AI) is the correct service because it enables users to train custom text classification models using labeled data without writing any code. It provides a no-code interface for uploading datasets, training models, and evaluating performance, making it ideal for classifying support ticket text into custom categories.

Exam trap

The trap here is that candidates confuse the pre-trained Cloud Natural Language API (which requires no training but cannot be customized) with AutoML Natural Language (which requires labeled data but allows custom categories), leading them to select Option C incorrectly.

How to eliminate wrong answers

Option A is wrong because Vertex AI Matching Engine is designed for vector similarity search and embeddings, not for training custom classification models with labeled text data. Option C is wrong because Cloud Natural Language API is a pre-trained API that offers sentiment analysis, entity extraction, and syntax analysis, but it cannot be trained on custom labeled data for custom categories. Option D is wrong because Document AI is specialized for document processing (e.g., OCR, form parsing, invoice extraction) and is not intended for general text classification from labeled ticket data.

527

MCQhard

Refer to the exhibit. An alert policy is configured to trigger when prediction latency exceeds 500 ms for 5 consecutive minutes. The team is experiencing many false positive alerts during brief latency spikes. Which adjustment would most effectively reduce false positives while still detecting prolonged latency issues?

A.Change the comparison to less than

B.Add a condition that CPU utilization is also high

C.Increase the duration to 30 minutes

D.Increase the threshold to 1000 ms

AnswerC

A longer duration means the condition must persist for 30 minutes, filtering out brief spikes while still catching sustained high latency.

Why this answer

Increasing the duration from 5 to 30 minutes (Option C) directly addresses the problem of false positives from brief latency spikes by requiring the latency to exceed 500 ms for a longer continuous period before triggering an alert. This ensures that only sustained, prolonged latency issues—not transient spikes—activate the policy, aligning with the goal of detecting genuine degradation while ignoring noise.

Exam trap

Google Cloud often tests the distinction between threshold and duration adjustments, trapping candidates who think raising the threshold (Option D) is the only way to reduce false positives, when in fact increasing the evaluation window is more precise for filtering out transient spikes without compromising detection of sustained issues.

How to eliminate wrong answers

Option A is wrong because changing the comparison to 'less than' would invert the logic, triggering alerts when latency is below 500 ms, which is the opposite of detecting high latency and would generate false positives for normal or low-latency conditions. Option B is wrong because adding a condition that CPU utilization is also high introduces an unnecessary dependency that may miss prolonged latency issues caused by other factors (e.g., network bottlenecks, memory pressure, or I/O wait), and it does not address the core problem of brief latency spikes. Option D is wrong because increasing the threshold to 1000 ms would allow sustained latency between 500 ms and 1000 ms to go undetected, failing to capture prolonged issues that still violate the original 500 ms requirement, and it does not filter out brief spikes.
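
The difference between raising the threshold and lengthening the duration can be illustrated with a small simulation. This is a minimal sketch in plain Python, not the Cloud Monitoring API: the function `alert_fires` and the per-minute latency series are hypothetical, and it assumes one latency sample per minute.

```python
from typing import List


def alert_fires(latencies_ms: List[float], threshold_ms: float, duration_min: int) -> bool:
    """Return True if latency exceeds threshold_ms for duration_min consecutive samples.

    Assumes one sample per minute (an illustrative simplification).
    """
    consecutive = 0
    for value in latencies_ms:
        if value > threshold_ms:
            consecutive += 1
            if consecutive >= duration_min:
                return True
        else:
            # Any sample at or below the threshold resets the streak.
            consecutive = 0
    return False


# Hypothetical series: a 6-minute spike versus a 35-minute sustained incident at 800 ms.
brief_spike = [200] * 10 + [800] * 6 + [200] * 10
sustained = [200] * 10 + [800] * 35

print(alert_fires(brief_spike, 500, 5))    # original policy on the brief spike
print(alert_fires(brief_spike, 500, 30))   # longer duration on the brief spike
print(alert_fires(sustained, 500, 30))     # longer duration on the sustained incident
print(alert_fires(sustained, 1000, 5))     # higher threshold on the sustained incident
```

Under these assumptions, the longer duration ignores the brief spike yet still catches the sustained incident, while the 1000 ms threshold misses an 800 ms incident entirely.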

528

MCQeasy

A company needs to serve a model for real-time predictions with a strict latency SLA of 100ms at the 99th percentile. The model is lightweight and traffic patterns are highly variable with occasional spikes. Which deployment strategy best meets the SLA while controlling cost?

A.Deploy the model as a Cloud Run service with autoscaling to zero.

B.Deploy to Vertex AI Endpoint with manual scaling and a fixed number of replicas.

C.Use Vertex AI Batch Prediction.

D.Deploy to Vertex AI Endpoint with min_replica_count=3 and autoscaling enabled.

AnswerD

Min replicas provide baseline capacity to absorb spikes, and autoscaling adds replicas as needed.

Why this answer

Option D is correct because setting a minimum number of replicas ensures baseline capacity to handle initial spikes without cold start delays, while autoscaling handles larger spikes. Option A is wrong because a Cloud Run service that scales to zero incurs cold starts, which can break a 100 ms p99 SLA. Option B is wrong because a fixed replica count cannot follow variable traffic, causing over-provisioning or under-provisioning. Option C is wrong because batch prediction is not real-time.

529

MCQmedium

Your team is using Vertex AI Feature Store for online predictions. You notice that feature values for some entities are missing in production, leading to failed predictions. Upon investigation, you find that the ingestion pipeline has been failing intermittently. What is the best immediate course of action to prevent prediction failures?

A.Configure default values for missing features in the feature store so that the model can fall back on them.

B.Set up monitoring alerts on the ingestion pipeline to get notified of failures.

C.Change the prediction request to ignore missing features.

D.Manually re-ingest all missing features by running the ingestion pipeline again.

AnswerA

Ensures predictions can be made even when features are not available.

Why this answer

Option A is correct because default values ensure predictions can still be made when features are missing. Option B is wrong because monitoring alone does not prevent failures. Option C is wrong because ignoring missing features does not supply the values the model needs. Option D is wrong because re-ingesting features takes time and does not fix the ingestion issue.
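
The fallback idea can be sketched as a small helper that fills gaps before the request reaches the model. This is a hedged illustration in plain Python, not the Feature Store API: `FEATURE_DEFAULTS`, `with_defaults`, and the feature names are all hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults; in practice these would be chosen per feature
# (for example a training-set median or an explicit "unknown" category).
FEATURE_DEFAULTS: Dict[str, Any] = {
    "avg_order_value": 0.0,
    "days_since_last_login": 30,
    "plan_tier": "unknown",
}


def with_defaults(fetched: Dict[str, Optional[Any]], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a complete feature dict, using defaults for absent or None values."""
    filled: Dict[str, Any] = {}
    for name, default in defaults.items():
        value = fetched.get(name)
        filled[name] = default if value is None else value
    return filled


# An entity whose ingestion partially failed: one feature absent, one null.
partial = {"avg_order_value": 12.5, "plan_tier": None}
print(with_defaults(partial, FEATURE_DEFAULTS))
```

Defaults keep the endpoint serving, but they only mask the gap; the ingestion pipeline still needs to be repaired.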

530

MCQhard

A company wants to use BigQuery ML to train a DNN_CLASSIFIER model on a dataset with 100 million rows. They are concerned about training time and cost. Which approach can help optimize training performance while staying within BigQuery ML?

A.Use OPTIONS('MAX_ITERATIONS' = 10) to limit training iterations

B.Use Vertex AI AutoML Tables instead of BigQuery ML

C.Train on a 10% random sample of the data to reduce cost

D.BigQuery ML automatically optimizes training; no additional configuration needed

AnswerD

BigQuery ML handles optimization internally, adjusting training parameters for efficiency.

Why this answer

BigQuery ML manages the training infrastructure and applies default training settings automatically. Options such as MAX_ITERATIONS can shorten training, but setting the value too low may harm accuracy. Training on a filtered subset reduces cost at the expense of discarding data. Using Vertex AI AutoML Tables would mean leaving BigQuery ML, which the question rules out. BigQuery ML DNN models also accept the EARLY_STOP and MIN_REL_PROGRESS options, which end training once improvement stalls.

531

MCQhard

A team runs a Vertex AI pipeline daily. They notice that a component that downloads a file from a public URL always executes even when the URL and parameters haven't changed. They want to avoid unnecessary re-execution and reduce costs. What should they do?

A.Ensure the component is deterministic and that caching is not disabled for that component.

B.Enable pipeline caching by setting 'enable_caching=True' on the pipeline decorator.

C.Use a pre-built Google Cloud Pipeline Component for file download.

D.Use the 'dsl.CachingOptions' to set a custom cache key for the component.

AnswerA

Caching may be disabled per component, or the component may produce non-deterministic outputs (e.g., timestamp). Enabling caching and making the component deterministic will allow reuse.

Why this answer

Vertex AI Pipelines automatically caches component outputs based on a cache key derived from inputs, component code, and image digest. If caching is not working, it might be disabled or the component might produce non-deterministic outputs. Re-enabling caching (it's on by default) or ensuring deterministic behavior will help.
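
Why a non-deterministic input defeats caching can be seen with a toy cache key. This is only a sketch of the general idea; the actual key computation used by Vertex AI Pipelines is not reproduced here, and `cache_key`, the component name, and the image digest are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(component_name: str, image_digest: str, inputs: Dict[str, Any]) -> str:
    """Hash the component identity and its inputs into a stable key."""
    payload = json.dumps(
        {"component": component_name, "image": image_digest, "inputs": inputs},
        sort_keys=True,  # key order must not affect the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


stable_inputs = {"url": "https://example.com/data.csv", "timeout_s": 30}

# Same inputs on two daily runs: the key matches, so the result can be reused.
run_1 = cache_key("download-file", "sha256:abc", stable_inputs)
run_2 = cache_key("download-file", "sha256:abc", stable_inputs)
print(run_1 == run_2)

# Passing a run timestamp as an input changes the key on every run,
# so the component re-executes even though the URL is unchanged.
day_1 = cache_key("download-file", "sha256:abc", {**stable_inputs, "run_ts": "2024-01-01"})
day_2 = cache_key("download-file", "sha256:abc", {**stable_inputs, "run_ts": "2024-01-02"})
print(day_1 == day_2)
```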

532

Multi-Selectmedium

A company uses Vertex AI for AutoML training. Which THREE are best practices for managing model versions?

Select 3 answers

A.Deploy each model version to a separate endpoint

B.Use Vertex AI Model Registry to version models

C.Use evaluation metrics to compare versions

D.Use labels to tag models for tracking

E.Automatically delete old versions after 30 days

AnswersB, C, D

Correct: Centralized model versioning.

Why this answer

Vertex AI Model Registry is the central repository for managing and versioning models, allowing you to track iterations, compare performance, and control deployments. It provides a structured way to organize models, roll back to previous versions if needed, and maintain lineage for compliance and reproducibility.

Exam trap

The trap here is that candidates may think deploying each version to a separate endpoint is necessary for isolation, but Vertex AI's traffic splitting on a single endpoint is the correct and cost-effective approach for managing multiple model versions.

533

MCQhard

A financial services company uses Document AI to process loan applications. They want to ensure that any documents the model cannot process with high confidence are reviewed by a human before finalizing the decision. Which Document AI feature should they enable?

A.AutoML Tables model retraining

B.Cloud DLP for data inspection

C.Increase the number of processors

D.Human-in-the-Loop (HITL)

AnswerD

HITL enables human review for low-confidence documents, ensuring accuracy.

Why this answer

Human-in-the-Loop (HITL) allows documents with low confidence scores to be routed to human reviewers. This is a built-in feature of Document AI. AutoML Tables is not directly related.

Cloud DLP is for data loss prevention.

534

Multi-Selectmedium

A machine learning engineer is monitoring a deployed churn prediction model that has shown a gradual decline in accuracy over the past month. The engineer wants to diagnose the root cause of the performance degradation. Which TWO actions should the engineer take? (Choose two.)

Select 2 answers

A.Increase the model's learning rate and fine-tune it on the latest data.

B.Immediately retrain the model using all available historical data to improve accuracy.

C.Deploy a second model in parallel to compare predictions.

D.Use Vertex AI Model Monitoring to detect data drift by comparing the distribution of recent input features against the training data distribution.

E.Monitor the model's prediction accuracy by comparing recent predictions against newly collected ground truth labels.

AnswersD, E

Detecting data drift helps identify if the input distribution has changed, which often causes prediction drift.

Why this answer

Option D is correct because Vertex AI Model Monitoring is specifically designed to detect data drift by comparing the distribution of recent input features against the training data distribution. This allows the engineer to identify if the gradual decline in accuracy is caused by changes in the input data, which is a common root cause for model performance degradation over time.

Exam trap

The trap here is that candidates often confuse reactive retraining (Option B) with diagnostic monitoring, failing to recognize that the first step in troubleshooting performance degradation is to identify the root cause through drift detection and ground truth comparison, not to immediately modify or retrain the model.
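
Comparing a recent feature distribution against the training distribution can be sketched with a distance measure such as Jensen-Shannon divergence. The code below is an illustrative stand-in written in plain Python, not the Vertex AI Model Monitoring implementation; the histograms and the 0.05 threshold are hypothetical.

```python
import math
from typing import List, Sequence


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (log base 2) between two histograms over the same bins."""

    def normalize(xs: Sequence[float]) -> List[float]:
        total = sum(xs)
        return [x / total for x in xs]

    def kl(a: List[float], b: List[float]) -> float:
        # Terms with a_i == 0 contribute nothing to the sum.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    p_n, q_n = normalize(p), normalize(q)
    mixture = [(pi + qi) / 2 for pi, qi in zip(p_n, q_n)]
    return 0.5 * kl(p_n, mixture) + 0.5 * kl(q_n, mixture)


# Hypothetical bin counts for one feature (e.g. tenure: short / medium / long).
training_hist = [50, 30, 20]
recent_hist = [20, 30, 50]

DRIFT_THRESHOLD = 0.05  # hypothetical alerting threshold
score = js_divergence(training_hist, recent_hist)
print(f"JS divergence: {score:.4f}, drift detected: {score > DRIFT_THRESHOLD}")
```

A drift score only indicates that the inputs changed; confirming an accuracy drop still requires the ground truth comparison described in Option E.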

535

MCQhard

A data science team uses Vertex AI Pipelines to orchestrate ML training. They notice that some pipeline runs are failing because of inconsistent data schemas. They want to enforce schema validation as a gate before the training step executes. Which approach should they implement?

A.Use Cloud Dataflow to validate schema during data ingestion before the pipeline starts.

B.Use BigQuery schema enforcement when importing data.

C.Add a pipeline component that runs schema validation using the TensorFlow Data Validation library.

D.Use TFX ExampleGen with schema_gen to automatically generate and enforce schemas.

AnswerC

A custom component using TFDV can validate schema inside the pipeline and fail early if mismatched.

Why this answer

Option C is correct because the TensorFlow Data Validation (TFDV) library is specifically designed for ML pipeline schema validation. By adding a custom pipeline component that uses TFDV, the team can validate incoming data schemas against a predefined schema directly within the Vertex AI Pipelines orchestration, acting as a gate before the training step executes. This approach integrates seamlessly with the pipeline's component-based architecture and provides detailed anomaly reports.

Exam trap

Google Cloud often tests the distinction between tools that are part of the TFX ecosystem (like ExampleGen) versus standalone libraries (like TFDV) that can be used independently in custom pipeline components, leading candidates to choose D because they associate schema validation with TFX without realizing the integration requirements.

How to eliminate wrong answers

Option A is wrong because Cloud Dataflow is a batch/stream processing service for data transformation, not a schema validation tool; validating schema during ingestion outside the pipeline does not enforce the gate within the pipeline orchestration. Option B is wrong because BigQuery schema enforcement only validates data at the table level during import, but it is not a pipeline component that can be placed as a gate before a training step in Vertex AI Pipelines. Option D is wrong because TFX ExampleGen with schema_gen is part of the TFX framework, which is not directly compatible with Vertex AI Pipelines' custom component model; it would require significant adaptation and does not provide a simple gate component within the pipeline.
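
The gating behavior can be sketched without any ML libraries. The example below deliberately swaps TFDV for a hand-written check in plain Python so that it stays self-contained; `EXPECTED_SCHEMA`, `schema_anomalies`, and `validation_gate` are hypothetical names, and a real component would call TFDV instead.

```python
from typing import Any, Dict, List

# Hypothetical expected schema: column name -> Python type.
EXPECTED_SCHEMA: Dict[str, type] = {
    "customer_id": str,
    "tenure_months": int,
    "monthly_charges": float,
}


def schema_anomalies(rows: List[Dict[str, Any]], schema: Dict[str, type]) -> List[str]:
    """Collect human-readable anomalies: missing, unexpected, or mistyped columns."""
    anomalies: List[str] = []
    for index, row in enumerate(rows):
        for name in sorted(schema.keys() - row.keys()):
            anomalies.append(f"row {index}: missing column '{name}'")
        for name in sorted(row.keys() - schema.keys()):
            anomalies.append(f"row {index}: unexpected column '{name}'")
        for name, expected_type in schema.items():
            if name in row and not isinstance(row[name], expected_type):
                anomalies.append(
                    f"row {index}: column '{name}' is {type(row[name]).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    return anomalies


def validation_gate(rows: List[Dict[str, Any]], schema: Dict[str, type]) -> None:
    """Raise so that the pipeline step fails and downstream training never starts."""
    anomalies = schema_anomalies(rows, schema)
    if anomalies:
        raise ValueError("Schema validation failed: " + "; ".join(anomalies))


good_rows = [{"customer_id": "c1", "tenure_months": 12, "monthly_charges": 29.9}]
bad_rows = [{"customer_id": "c2", "tenure_months": "12"}]

validation_gate(good_rows, EXPECTED_SCHEMA)        # passes silently
print(schema_anomalies(bad_rows, EXPECTED_SCHEMA))  # lists what would block training
```

In a pipeline, the training step would take the gate step's output as an input, so a raised exception stops the run before training.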

536

MCQmedium

A data scientist wants to create a Vertex AI pipeline component that uses a custom container image stored in Artifact Registry. The component should accept a dataset artifact as input and output a model artifact. Which component type should they use?

A.A Lightweight Python component without base_image.

B.A Python function component with a custom base_image.

C.A container component defined with the ContainerSpec class.

D.A pre-built component from GCPC.

AnswerC

Correct: Container components use ContainerSpec to specify the image, command, and arguments.

Why this answer

Option C is correct because a container component defined with the `ContainerSpec` class is the only Vertex AI component type that allows you to specify a custom container image from Artifact Registry. This component type directly wraps a Docker container, enabling you to define inputs (e.g., a dataset artifact) and outputs (e.g., a model artifact) via the `ContainerSpec` interface, which maps to the container's command-line arguments and environment variables. Lightweight Python components and Python function components cannot use a custom container image without a base image, and pre-built components from GCPC are fixed and do not support custom containers.

Exam trap

The trap here is that candidates confuse a Python function component with a custom `base_image` (Option B) as equivalent to running a custom container, but the `base_image` is used to build a new container from a Python function, not to directly execute an existing container image from Artifact Registry.

How to eliminate wrong answers

Option A is wrong because a Lightweight Python component without `base_image` uses the default Vertex AI Python image and cannot reference a custom container image from Artifact Registry; it is designed for inline Python code, not custom containers. Option B is wrong because a Python function component with a custom `base_image` still runs as a Python function within a container built from that base image, but it does not directly wrap an arbitrary custom container image from Artifact Registry; the base image is used to build a new container, not to run an existing one. Option D is wrong because a pre-built component from GCPC (Google Cloud Pipeline Components) is a fixed, reusable component that cannot be customized with a user-provided container image; it is intended for common operations like AI Platform training or data processing, not for running a custom container.

537

Multi-Selectmedium

An ML engineer needs to deploy a model to Vertex AI for online predictions and enable autoscaling to zero when not in use. Which THREE conditions must be met? (Choose 3)

Select 3 answers

A.Enable serverless serving by selecting the appropriate serving mode.

B.Use a GPU-enabled machine type for faster scaling.

C.Deploy the model using a custom container with WebSockets support.

D.Set `min_replica_count=0` in the endpoint deployment config.

E.Set `max_replica_count` to a value > 0.

AnswersA, D, E

Vertex AI offers serverless mode for CPU models that supports scale-to-zero.

Why this answer

Option A is correct because Vertex AI offers a serverless serving mode that automatically scales resources to zero when no requests are being processed. By selecting this mode, the ML engineer enables the endpoint to scale down completely during idle periods, eliminating costs for unused infrastructure. This is distinct from standard serving, which maintains a minimum number of replicas.

Exam trap

Google Cloud often tests the misconception that setting `min_replica_count=0` alone is sufficient, but candidates forget that `max_replica_count` must also be set to a positive value to allow scaling up from zero.

538

MCQhard

You need to perform a large-scale feature computation on streaming data from Pub/Sub, transforming raw events into features, and writing results to Vertex AI Feature Store for online serving. Which Google Cloud architecture is most appropriate?

A.Use Dataproc with Spark Streaming to read from Pub/Sub and write to Feature Store

B.Use Cloud Functions triggered by Pub/Sub to compute features and update Feature Store

C.Use Dataflow streaming pipeline with Apache Beam to read from Pub/Sub, compute features, and write to Feature Store

D.Use Cloud Run to consume Pub/Sub messages and update Feature Store via a service

AnswerC

Dataflow streaming is ideal for scalable, low-latency stream processing with exactly-once semantics.

Why this answer

Dataflow with streaming (Apache Beam) can read from Pub/Sub, transform data, and write to Feature Store via the online serving API. Cloud Functions is not suitable for complex, large-scale transforms. Spark Streaming on Dataproc is possible but requires managing clusters, whereas Dataflow is fully managed and autoscaling. Cloud Run is oriented toward request-response workloads rather than continuous stream processing.

539

MCQmedium

A company needs to serve a high-throughput prediction service with strict latency requirements. They want to minimize cold starts and ensure consistent performance. Which endpoint configuration is most appropriate?

A.Set min_replicas to an estimated baseline and max_replicas to a higher number

B.Set min_replicas and max_replicas equal to a fixed number

C.Set min_replicas to 0 and max_replicas to a high number

D.Do not set min_replicas; let Vertex AI automatically determine

AnswerA

This ensures always-on capacity for baseline traffic and room to scale.

Why this answer

Setting min_replicas to an estimated baseline ensures that a minimum number of instances are always running, eliminating cold starts for baseline traffic. Setting max_replicas to a higher number allows the service to scale up to handle traffic spikes while maintaining consistent performance. This configuration balances cost and latency by avoiding the overhead of scaling from zero while still accommodating bursts.

Exam trap

Google Cloud often tests the misconception that setting min_replicas to 0 is cost-effective, but the trap here is that it ignores the strict latency requirement and the reality of cold start delays in model serving.

How to eliminate wrong answers

Option B is wrong because setting min_replicas and max_replicas equal to a fixed number prevents any autoscaling, leading to either over-provisioning (waste) or under-provisioning (latency spikes) under variable load. Option C is wrong because setting min_replicas to 0 means the service can scale down to zero, causing cold starts on every request when traffic resumes, which violates the strict latency requirement. Option D is wrong because not setting min_replicas and relying on Vertex AI's automatic determination may result in the service scaling to zero or having unpredictable baseline capacity, introducing cold starts and inconsistent performance.
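
How the minimum and maximum bound the replica count can be shown with a simplified model. This is a sketch under stated assumptions, not Vertex AI's autoscaling algorithm: `replicas_needed` is hypothetical, and it assumes each replica handles a fixed number of requests per second.

```python
import math


def replicas_needed(
    requests_per_second: float,
    capacity_per_replica: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Clamp the load-driven replica count between the configured min and max."""
    raw = math.ceil(requests_per_second / capacity_per_replica)
    return max(min_replicas, min(raw, max_replicas))


# Hypothetical capacity of 50 requests/second per replica.
for load in (0, 220, 5000):
    baseline = replicas_needed(load, 50, min_replicas=3, max_replicas=10)
    scale_to_zero = replicas_needed(load, 50, min_replicas=0, max_replicas=10)
    print(f"load={load:>4} rps  min=3 -> {baseline} replicas, min=0 -> {scale_to_zero} replicas")
```

With a baseline of 3, idle periods still leave warm replicas, whereas a minimum of 0 drops to no replicas and the next request waits for a cold start.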

540

MCQhard

You manage a multi-tenant serving system on Vertex AI Prediction where multiple models are deployed in a single endpoint using model versioning. One particular model version (v2) is consuming excessive resources, causing latency spikes for other versions. You need to isolate this model to prevent interference. The models are all in TensorFlow SavedModel format. What is the best approach?

A.Shard the models across multiple replicas using a custom routing logic in the container.

B.Set resource limits on the container using Kubernetes resource requests/limits, but Vertex AI Prediction does not support that.

C.Use Vertex AI Model Registry to deploy v2 to a dedicated endpoint and update the model alias.

D.Create a separate endpoint for v2 and redirect traffic using a load balancer.

AnswerC

Dedicated endpoint ensures resource isolation.

Why this answer

Option C is correct because deploying v2 to a dedicated endpoint provides full resource isolation. Option D is similar but less direct, since it adds an external load balancer that must be configured and maintained. Option B is not possible in Vertex AI Prediction. Option A is complex and error-prone.

541

MCQmedium

You are deploying a new version of a model to a Vertex AI endpoint that already has a champion model serving 100% of traffic. You want to gradually shift traffic to the new version while monitoring for errors. Which approach should you use?

A.Use Cloud Load Balancing with weighted backend services pointing to different endpoints.

B.Deploy the challenger to the same endpoint with initial traffic split, e.g., champion 90%, challenger 10%, and gradually adjust.

C.Delete the champion model and redeploy with the challenger as the new version.

D.Create a new endpoint for the challenger and use a load balancer to split traffic.

AnswerB

This is the correct method for A/B testing with traffic splitting in Vertex AI.

Why this answer

Vertex AI endpoints support traffic splitting between model versions deployed to the same endpoint. By deploying the challenger to the same endpoint and setting an initial split (e.g., champion 90%, challenger 10%), you can gradually shift traffic while monitoring for errors. This approach uses the endpoint's built-in traffic management, avoiding the complexity and latency of external load balancers.

Exam trap

Google Cloud often tests the misconception that external load balancers are required for traffic splitting, when in fact Vertex AI endpoints provide native traffic management that is simpler and more appropriate for model versioning.

How to eliminate wrong answers

Option A is wrong because Cloud Load Balancing operates at the network layer and cannot directly split traffic between model versions within a single Vertex AI endpoint; it would require separate endpoints and adds unnecessary overhead. Option C is wrong because deleting the champion model removes the ability to roll back or compare performance, violating the principle of gradual, safe deployment. Option D is wrong because creating a new endpoint for the challenger and using a load balancer to split traffic bypasses Vertex AI's native traffic splitting, which is simpler, more reliable, and designed for this exact use case.

542

MCQhard

A company needs to maintain an audit trail of model changes for compliance. Multiple teams will be updating models. What is the best approach to track who created, modified, or deployed each model version?

A.Enable Cloud Storage audit logs and require all model files to be stored in a bucket

B.Use Cloud Logging to collect logs from all services and search for model names

C.Use Vertex AI Experiments and Metadata to track model lineage and audit logs

D.Ask team members to maintain a shared spreadsheet of changes

AnswerC

Vertex AI provides built-in audit capabilities with user attribution and metadata.

Why this answer

Option C is correct because Vertex AI automatically logs metadata (including user identity) via Cloud Audit Logs and ML Metadata. Option A is wrong because Cloud Storage logs only show object-level access, not model-specific actions. Option B is wrong because Cloud Logging alone does not correlate events to model versions. Option D is wrong because manual logging is error-prone.

543

MCQmedium

An application serving predictions from a Vertex AI endpoint receives many identical requests within a short time window. The team notices redundant computation and wants to cache responses to reduce latency and cost. What is the recommended solution?

A.Deploy the model on a larger machine type to handle duplicate requests faster.

B.Enable Vertex AI endpoint caching by setting the `enable_cache` flag.

C.Implement a cache layer using Cloud Memorystore for Redis, hashing prediction requests.

D.Use Cloud CDN in front of the endpoint.

AnswerC

Correct: Cloud Memorystore provides low-latency caching for identical requests.

Why this answer

Option C is correct because Vertex AI does not provide built-in request caching; instead, the recommended pattern is to implement an external cache like Cloud Memorystore for Redis. By hashing the prediction request payload and using it as a cache key, identical requests within the short time window can be served from Redis, eliminating redundant model inference and reducing both latency and cost.

Exam trap

The trap here is that candidates assume Vertex AI has a native caching feature (like `enable_cache`) because other Google Cloud services (e.g., Cloud CDN, Cloud Load Balancing) offer caching, but Vertex AI endpoints require an external cache layer like Memorystore for Redis.

How to eliminate wrong answers

Option A is wrong because deploying on a larger machine type increases throughput but does not eliminate redundant computation for identical requests; it still performs the same inference multiple times, wasting resources. Option B is wrong because Vertex AI endpoints do not support an `enable_cache` flag; this is a fictitious feature, and Vertex AI has no built-in request caching mechanism. Option D is wrong because Cloud CDN caches static content at the edge based on HTTP cache headers, but prediction requests are typically POST with dynamic payloads that are not cacheable by CDN, and CDN cannot inspect or hash request bodies for deduplication.
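
The hashing pattern can be sketched with an in-memory dictionary standing in for Redis. This is an illustration only: `PredictionCache` and `fake_model` are hypothetical, and a real deployment would use a Redis client with an expiry time on each key rather than an unbounded dict.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class PredictionCache:
    """Cache predictions keyed by a hash of the canonicalized request payload."""

    def __init__(self, predict_fn: Callable[[Dict[str, Any]], Any]) -> None:
        self._predict_fn = predict_fn
        self._store: Dict[str, Any] = {}  # stand-in for a Redis instance
        self.model_calls = 0

    @staticmethod
    def _key(instance: Dict[str, Any]) -> str:
        # Sorting keys makes logically identical payloads hash identically.
        canonical = json.dumps(instance, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def predict(self, instance: Dict[str, Any]) -> Any:
        key = self._key(instance)
        if key not in self._store:
            self.model_calls += 1  # cache miss: run inference once
            self._store[key] = self._predict_fn(instance)
        return self._store[key]


def fake_model(instance: Dict[str, Any]) -> float:
    """Hypothetical stand-in for a call to the prediction endpoint."""
    return 0.1 * instance["tenure_months"]


cache = PredictionCache(fake_model)
first = cache.predict({"tenure_months": 12, "plan": "basic"})
second = cache.predict({"plan": "basic", "tenure_months": 12})  # same payload, different key order
print(first == second, cache.model_calls)
```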

544

MCQmedium

A company needs to forecast product demand for the next 12 months using historical sales data. They want to use BigQuery ML with minimal coding. Which model type is most suitable?

A.K_MEANS

B.MATRIX_FACTORIZATION

C.ARIMA_PLUS

D.LINEAR_REG

AnswerC

Why this answer

ARIMA_PLUS is designed for time-series forecasting and automatically handles seasonality, trends, and holidays. K_MEANS is for clustering, MATRIX_FACTORIZATION for recommendations, and LINEAR_REG for simple regression.

545

MCQmedium

A company deploys a custom ML model on Vertex AI to predict customer churn. The model retrains weekly, and predictions are served via a Vertex AI endpoint. After a recent retraining, the monitoring dashboard shows a sudden increase in prediction requests but a decrease in predicted churn probabilities. The model's accuracy on the validation set remains stable. What is the most likely cause of the observed behavior?

A.A training-serving skew exists between the training pipeline and the serving endpoint.

B.Concept drift has occurred, changing the relationship between features and churn.

C.The incoming data distribution has changed, e.g., due to a new marketing campaign attracting different customers.

D.Data leakage during training caused the model to overfit to historical patterns.

AnswerC

This is covariate shift; the model sees inputs it wasn't trained on, leading to lower confidence predictions.

Why this answer

Option C is correct because a sudden increase in prediction requests alongside a decrease in predicted churn probabilities, while validation accuracy remains stable, indicates a shift in the incoming data distribution (covariate shift). This is typical when a new marketing campaign attracts a different customer segment that inherently has lower churn risk. The model itself hasn't degraded; it's simply seeing a different population than it was trained on, which changes the base rate of churn in the live traffic.

Exam trap

Google Cloud often tests the distinction between covariate shift (data distribution change) and concept drift (relationship change), trapping candidates who assume any change in predictions must be due to model degradation or data leakage.

How to eliminate wrong answers

Option A is wrong because training-serving skew refers to a mismatch in feature preprocessing or data format between training and serving, which would typically cause a drop in accuracy or anomalous predictions, not a stable validation accuracy with a shift in prediction distribution. Option B is wrong because concept drift would change the relationship between features and the target (churn), leading to a decline in model accuracy on the validation set, which is explicitly stated as stable. Option D is wrong because data leakage during training would cause overfitting to historical patterns, resulting in poor generalization and a drop in validation accuracy, not a stable accuracy with a shift in prediction probabilities.

546

MCQmedium

A team wants to deploy two versions of a model (v1 and v2) on Vertex AI Endpoint to conduct an A/B test. They need to split traffic so that 10% of requests go to v2. Which configuration achieves this?

A.Deploy both versions on the same endpoint and use the `traffic_split` parameter to allocate 90% to v1 and 10% to v2.

B.Configure a global load balancer in front of two endpoints and set the weight.

C.Create two separate endpoints, one for each version, and have the client randomly select the endpoint.

D.Deploy v2 as a canary deployment and set the canary rollout to 10% in Cloud Deployment Manager.

AnswerA

Vertex AI endpoints support traffic splitting between deployed models.

Why this answer

Option A is correct because Vertex AI Endpoints support traffic splitting by allocating percentages to each deployed model. Option B is wrong because routing at a load balancer is not necessary. Option C is wrong because separate endpoints cannot share the endpoint's traffic split, and client-side random selection is harder to control. Option D is wrong because a canary rollout gradually increases traffic rather than holding a fixed split.
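
What a 90/10 split means in practice can be sketched with weighted random routing. This simulation is an assumption-laden illustration in plain Python, not the Vertex AI SDK: `route` is hypothetical, and the keys `v1` and `v2` stand in for deployed model IDs.

```python
import random
from collections import Counter
from typing import Dict


def route(traffic_split: Dict[str, int], rng: random.Random) -> str:
    """Pick a deployed model for one request according to percentage weights."""
    if sum(traffic_split.values()) != 100:
        raise ValueError("traffic split percentages must sum to 100")
    models = list(traffic_split)
    weights = [traffic_split[model] for model in models]
    return rng.choices(models, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so repeated runs give the same counts
split = {"v1": 90, "v2": 10}
counts = Counter(route(split, rng) for _ in range(10_000))
print({model: round(count / 10_000, 3) for model, count in counts.items()})
```

Each request is routed independently, so the observed share for v2 hovers around 10% rather than matching it exactly.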

547

MCQeasy

A model deployed on Vertex AI Endpoints shows increasing prediction latency. What is the most scalable way to reduce latency?

A.Switch to a larger machine type

B.Enable autoscaling with min nodes increased

C.Use batch prediction instead

D.Deploy multiple model versions

AnswerB

Autoscaling adds nodes during high load, reducing latency.

Why this answer

Increasing the minimum number of nodes in autoscaling ensures that a baseline of compute capacity is always ready to handle requests, reducing cold-start latency. This is the most scalable approach because it allows the endpoint to dynamically scale up during traffic spikes while maintaining a floor of pre-warmed instances, directly addressing prediction latency without over-provisioning.

Exam trap

The trap here is that candidates confuse 'scalability' with 'raw performance' and choose a larger machine type (A), not realizing that horizontal scaling with pre-warmed nodes is more cost-effective and elastic for reducing latency under variable load.

How to eliminate wrong answers

Option A is wrong because switching to a larger machine type (e.g., more vCPUs or memory) can reduce per-request latency but is not scalable—it increases cost linearly and does not handle traffic bursts efficiently, as it still relies on a single node's capacity. Option C is wrong because batch prediction is designed for offline, asynchronous processing of large datasets and does not reduce real-time prediction latency; it actually increases end-to-end time for individual requests. Option D is wrong because deploying multiple model versions does not inherently reduce latency; it adds routing overhead and does not address compute capacity or cold starts, and is intended for A/B testing or gradual rollouts, not performance optimization.

548

Multi-Selectmedium

Which THREE practices improve collaboration when using Cloud Composer for ML pipelines?

Select 3 answers

A.Keep all pipeline logic in a single large DAG for simplicity.

B.Use a shared Cloud Storage bucket for intermediate artifacts with appropriate permissions.

C.Store DAGs in a version-controlled repository and use CI/CD to deploy them.

D.Embed service account keys directly in DAG code for authentication.

E.Use Airflow variables and connections to parameterize DAGs.

AnswersB, C, E

Facilitates handoff between pipeline steps and teams.

Why this answer

Option B is correct because Cloud Composer workflows often require sharing intermediate data (e.g., transformed datasets, model checkpoints) across multiple DAGs or team members. A shared Cloud Storage bucket with fine-grained IAM permissions enables secure, centralized artifact exchange without duplicating data or exposing it to unauthorized users. This practice avoids hard-coded paths and ensures that all pipeline stages can reliably access the same artifacts, which is critical for reproducibility and collaboration in ML pipelines.

Exam trap

Google Cloud often tests the misconception that a single monolithic DAG simplifies collaboration, when in fact it creates bottlenecks and merge conflicts; the trap is that candidates confuse 'simplicity' with 'ease of collaboration' without considering modularity and CI/CD practices.

549

MCQhard

A recommendation system model is updated daily via a retraining pipeline. After each update, the online prediction latency increases significantly for about 30 minutes before returning to normal. What is the most likely cause and solution?

A.The Vertex AI endpoint autoscaling policy is too aggressive, causing scale-down during retraining.

B.The retraining pipeline runs on a GKE cluster that shares resources with the serving endpoint.

C.The model is being switched from CPU to GPU at deployment.

D.The new model version causes cold start in the serving infrastructure; pre-warm the model by sending a dummy request after deployment.

AnswerD

Pre-warming ensures the model is loaded into memory before serving real traffic.

Why this answer

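Option D is correct because a newly deployed model version starts cold: the first requests are slow while the model loads into memory and caches warm up, and sending dummy requests right after deployment pre-warms it before real traffic arrives. Option A is wrong because the latency spike follows each deployment and is not explained by an autoscaling scale-down. Option B is wrong because the issue is not resource contention; nothing in the scenario indicates that serving shares a GKE cluster with retraining. Option C is wrong because nothing in the scenario indicates a switch from CPU to GPU at deployment.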
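
The pre-warming idea can be sketched as a small helper that sends dummy requests to a prediction function and records how long each one takes. Everything here is illustrative: `warm_up`, `FakeModel`, and the dummy payloads are hypothetical, and the real call to the endpoint would replace `model.predict`.

```python
import time


def warm_up(predict, dummy_instances, rounds=3):
    """Send dummy requests so the model is loaded before real traffic arrives.

    Returns the observed latency of each warm-up call, in seconds.
    """
    latencies = []
    for _ in range(rounds):
        for instance in dummy_instances:
            start = time.perf_counter()
            predict(instance)
            latencies.append(time.perf_counter() - start)
    return latencies


class FakeModel:
    """Stand-in for a served model; the first call pays the load cost."""

    def __init__(self):
        self.loaded = False
        self.calls = 0

    def predict(self, instance):
        self.calls += 1
        if not self.loaded:
            self.loaded = True  # a real server would load weights here
        return {"score": 0.0}


model = FakeModel()
latencies = warm_up(model.predict, [{"user_id": 0}, {"user_id": 1}], rounds=3)
```

Running this step as the last stage of the daily deployment means the slow first calls are absorbed by dummy traffic rather than by users.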

550

MCQhard

You are deploying a deep learning model on edge devices with limited computational resources. The model must run inference in <10 ms and the model size must be under 50 MB. Currently, your trained model is 200 MB and runs in 50 ms. Which combination of model compression techniques should you apply?

A.Only apply weight pruning

B.Apply quantization-aware training and knowledge distillation

C.Use knowledge distillation to train a smaller student model

D.Apply post-training quantization (INT8) and pruning

AnswerD

Quantization reduces size and latency; pruning reduces complexity; both can be applied post-training.

Why this answer

Post-training quantization to INT8 reduces model size by 4x and often speeds up inference. Pruning removes redundant weights, further reducing size. Knowledge distillation would require retraining a smaller student model.

Quantization-aware training is more accurate but needs retraining. For a simple fix, quantization and pruning are effective.
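
A quick size estimate shows why quantization alone is not quite enough here. The sketch assumes the 200 MB model stores 32-bit float weights (the question does not say so), treats pruning as ideal (pruned weights take no space), and uses a made-up 20% sparsity purely for illustration.

```python
def quantized_size_mb(size_mb: float, src_bits: int = 32, dst_bits: int = 8) -> float:
    """Estimate model size after changing the number of bits per weight."""
    return size_mb * dst_bits / src_bits


def pruned_size_mb(size_mb: float, sparsity: float) -> float:
    """Idealised estimate: assumes pruned weights are stored at zero cost."""
    return size_mb * (1.0 - sparsity)


LIMIT_MB = 50.0
original_mb = 200.0

after_quant = quantized_size_mb(original_mb)          # 200 * 8 / 32
after_both = pruned_size_mb(after_quant, sparsity=0.2)

fits_after_quant = after_quant < LIMIT_MB
fits_after_both = after_both < LIMIT_MB
```

Under these assumptions INT8 quantization lands exactly on 50 MB, which does not satisfy "under 50 MB", so some pruning is needed on top of it.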

551

MCQhard

A team uses Vertex AI Pipelines to automate training and deployment. They need to ensure that only models that pass a set of quality checks (e.g., accuracy > 0.9, latency < 100ms) are deployed to production. How should they implement this?

A.Manually review each model before promotion

B.Use Cloud Functions to deploy only if accuracy is reported in BigQuery

C.Set up Cloud Build triggers to deploy every model version

D.Add a Pipeline component that evaluates metrics and uses a conditional gate to deployment

AnswerD

Pipelines support conditional execution based on component outputs.

Why this answer

Vertex AI Pipelines can include custom components that evaluate metrics and proceed to deployment only if thresholds are met. Option A is manual, Option B relies on a separate service outside the pipeline rather than a built-in gate, and Option C deploys every model version with no conditional logic.
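
The gate itself is simple logic. A minimal sketch in plain Python, using the thresholds from the question, might look like the following; `passes_quality_gate` and `run_pipeline_tail` are hypothetical helpers, and in a real pipeline the boolean would drive a conditional step rather than an `if` statement.

```python
ACCURACY_MIN = 0.9      # from the question: accuracy > 0.9
LATENCY_MAX_MS = 100.0  # from the question: latency < 100 ms


def passes_quality_gate(metrics: dict) -> bool:
    """Return True only if every quality check passes."""
    return (
        metrics["accuracy"] > ACCURACY_MIN
        and metrics["latency_ms"] < LATENCY_MAX_MS
    )


def run_pipeline_tail(metrics: dict, deploy) -> str:
    """Deploy only when the gate passes; otherwise skip deployment."""
    if passes_quality_gate(metrics):
        deploy()
        return "deployed"
    return "skipped"


deployments = []
good = run_pipeline_tail(
    {"accuracy": 0.93, "latency_ms": 80.0}, lambda: deployments.append("v2")
)
bad = run_pipeline_tail(
    {"accuracy": 0.95, "latency_ms": 140.0}, lambda: deployments.append("v3")
)
```

The second model has better accuracy but fails the latency check, so it is not deployed: all checks must pass, not just one.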

552

MCQhard

You are an ML engineer at a logistics company. The company uses a Vertex AI Pipeline with BigQuery ML to train a model that predicts delivery delays based on weather, traffic, and historical order data. The pipeline runs daily and includes steps: (1) data extraction from BigQuery, (2) feature engineering using Dataflow, (3) model training with BigQuery ML (logistic regression), (4) model evaluation, and (5) conditional deployment to a Vertex AI Endpoint if accuracy > 0.85. Recently, the pipeline has been failing at step 5 with the error: "Vertex AI Endpoint creation failed: Quota limit of 1 endpoint per region exceeded." The company has already created one endpoint in the same region for another model. The pipeline is configured to create a new endpoint each time a model is deployed. The engineer needs to fix this with minimal changes to the pipeline code. Which course of action should the engineer take?

A.Submit a quota increase request to Google Cloud for Vertex AI Endpoints in the current region.

B.Change the region in the pipeline configuration to a region with available endpoint quota.

C.Remove the accuracy threshold and deploy every model automatically to a pre-created endpoint.

D.Modify the deployment step to check if an endpoint already exists and, if so, deploy a new model version to the existing endpoint instead of creating a new one.

AnswerD

Reuses the existing endpoint, avoiding quota limits.

Why this answer

Option D is correct because it directly addresses the root cause: the pipeline fails because it tries to create a new endpoint each time, exceeding the regional quota of one endpoint. By modifying the deployment step to check for an existing endpoint and deploying a new model version to it, the engineer avoids quota issues without altering the pipeline's core logic or requiring external approvals. This approach leverages Vertex AI's model versioning capability, which allows multiple model versions under a single endpoint, aligning with minimal code changes.

Exam trap

The trap here is that candidates may focus on quota limits as a resource issue (Option A) or a region issue (Option B), rather than recognizing that the pipeline's deployment logic is architecturally flawed by creating a new endpoint per deployment, which is both inefficient and violates best practices for model serving.

How to eliminate wrong answers

Option A is wrong because submitting a quota increase request is a slow, administrative process that does not constitute a minimal code change and may not be approved quickly, leaving the pipeline broken in the meantime. Option B is wrong because changing the region introduces additional complexity (e.g., data residency, latency, and potential BigQuery dataset location mismatches) and does not address the underlying design issue of creating a new endpoint per deployment. Option C is wrong because removing the accuracy threshold undermines the model quality gate, potentially deploying poor models, and still requires creating a new endpoint each time, which would still hit the quota limit.
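
The "check first, then reuse" logic in Option D can be sketched with injected functions so it runs without any cloud access. The names `get_or_create_endpoint`, `list_endpoints`, `create_endpoint`, and `prod-endpoint` are assumptions for this example; in a real pipeline the two callables would wrap the Vertex AI client's endpoint listing and creation calls.

```python
def get_or_create_endpoint(display_name, list_endpoints, create_endpoint):
    """Return the endpoint with this display name, creating it only if absent."""
    for endpoint in list_endpoints():
        if endpoint["display_name"] == display_name:
            return endpoint
    return create_endpoint(display_name)


# In-memory stand-in for the region: one endpoint already exists.
registry = [{"display_name": "prod-endpoint", "id": "1"}]
created = []


def list_endpoints():
    return list(registry)


def create_endpoint(display_name):
    endpoint = {"display_name": display_name, "id": str(len(registry) + 1)}
    registry.append(endpoint)
    created.append(display_name)
    return endpoint


endpoint = get_or_create_endpoint("prod-endpoint", list_endpoints, create_endpoint)
```

Because the existing endpoint is found, no creation call is made and the quota is never touched; the new model is then deployed to `endpoint`.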

553

MCQmedium

An organization wants to trigger a Vertex AI pipeline whenever new data arrives in a Cloud Storage bucket. Which approach should they use?

A.Configure a Vertex AI pipeline trigger directly on the bucket using the GCP Console.

B.Use Pub/Sub notifications from the bucket and a Dataflow job to start the pipeline.

C.Set up a Cloud Scheduler job that runs every minute and checks for new files in the bucket.

D.Use Cloud Functions triggered by Cloud Storage events to call the Vertex AI pipeline API.

AnswerD

This is the recommended event-driven architecture: Storage event → Cloud Function → pipeline.

Why this answer

Option D is correct because Cloud Functions can be directly triggered by Cloud Storage events (e.g., `google.storage.object.finalize`) and can then call the Vertex AI pipeline API using the Cloud SDK or client libraries. This provides a serverless, event-driven architecture that reacts immediately to new data without polling or additional infrastructure.

Exam trap

The trap here is that candidates may assume Vertex AI pipelines have a built-in Cloud Storage trigger (Option A) or over-engineer the solution with Dataflow (Option B), when the simplest and most native serverless approach is Cloud Functions.

How to eliminate wrong answers

Option A is wrong because Vertex AI pipelines do not support configuring a trigger directly on a Cloud Storage bucket via the GCP Console; there is no native bucket-to-pipeline trigger. Option B is wrong because while Pub/Sub notifications from the bucket are possible, adding a Dataflow job introduces unnecessary complexity and latency—Dataflow is a batch/stream processing engine, not a lightweight event router. Option C is wrong because a Cloud Scheduler job that runs every minute and checks for new files is inefficient (polling), introduces up to 60 seconds of latency, and does not scale well; it also requires custom code to track file states.
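
The event-driven flow in Option D can be sketched as a handler that turns a storage event into pipeline parameters. This is a hedged outline, not deployable code: `on_new_object`, `submit_pipeline`, and the `input_uri` parameter are hypothetical, and the sketch assumes the event payload carries `bucket` and `name` fields for the finalized object.

```python
def on_new_object(event: dict, submit_pipeline) -> str:
    """Handle an object-finalized event by starting one pipeline run.

    `submit_pipeline` is injected so the routing logic can be tested
    without calling any cloud API.
    """
    uri = f"gs://{event['bucket']}/{event['name']}"
    submit_pipeline({"input_uri": uri})
    return uri


submitted = []
uri = on_new_object(
    {"bucket": "incoming-data", "name": "2024/orders.csv"},
    submitted.append,
)
```

Each new file produces exactly one pipeline submission, with no polling and no intermediate processing job.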

554

MCQeasy

A developer wants to add text translation to a mobile app. They need to translate user-generated content into multiple languages, and latency is critical. Which pre-built API should they use?

A.Translation API

B.Vision API

C.Text-to-Speech API

D.Natural Language API

AnswerA

Translation API is designed for text translation across languages.

Why this answer

Translation API provides fast, real-time translation for text. Natural Language API is for analysis. Text-to-Speech is for audio.

Vision API is for images.

555

MCQhard

A team of ML engineers is building a real-time fraud detection system. They use Cloud Pub/Sub to stream transactions, Dataflow for feature engineering, and Vertex AI to get predictions. They want to ensure that the data used for training matches the data used for serving to avoid training-serving skew. Which approach should they take?

A.Use a batch processing system for both training and serving to ensure identical feature calculations.

B.Implement separate feature engineering pipelines for training and serving, but document them carefully.

C.Use Vertex AI Feature Store to store features computed during training and retrieve them in the serving pipeline.

D.Ensure that both training and serving read from the same Cloud Storage location.

AnswerC

Feature Store provides a consistent feature definition and computation.

Why this answer

Vertex AI Feature Store ensures that the same feature engineering logic is applied consistently during both training and serving. By storing precomputed features in the Feature Store, the serving pipeline retrieves the exact same feature values that were used during training, eliminating the risk of training-serving skew. This approach is specifically designed for real-time systems where streaming data (via Pub/Sub and Dataflow) must be served with identical transformations.

Exam trap

The trap here is that candidates confuse data consistency (same raw source) with feature consistency (same computed values), leading them to pick Option D, which only addresses raw data location, not the transformation logic.

How to eliminate wrong answers

Option A is wrong because batch processing introduces latency that is incompatible with real-time fraud detection, and it does not guarantee identical feature calculations if the batch and streaming codebases diverge. Option B is wrong because separate pipelines inevitably lead to implementation differences, documentation drift, and training-serving skew — the opposite of the desired outcome. Option D is wrong because reading from the same Cloud Storage location only ensures raw data consistency, not that the feature engineering transformations (e.g., aggregations, windowing, encoding) are identical between training and serving.

556

Multi-Selectmedium

A company wants to build a model to predict housing prices using BigQuery ML. They have a dataset with features like area, number of bedrooms, and location. Which TWO model types are appropriate for this regression task?

Select 2 answers

A.LOGISTIC_REG

B.K_MEANS

C.MATRIX_FACTORIZATION

D.BOOSTED_TREE_REGRESSOR

E.LINEAR_REG

AnswersD, E

Why this answer

BOOSTED_TREE_REGRESSOR (D) is appropriate because it is a tree-based ensemble method specifically designed for regression tasks, and BigQuery ML supports it via the `CREATE MODEL` statement with `model_type='BOOSTED_TREE_REGRESSOR'`. It handles non-linear relationships and interactions between features like area, bedrooms, and location, making it suitable for predicting continuous housing prices. LINEAR_REG (E) is appropriate because linear regression also predicts a continuous target and serves as a simple, interpretable baseline.

Exam trap

Google Cloud often tests the distinction between regression and classification models, leading candidates to mistakenly choose LOGISTIC_REG for regression tasks because of the word 'regression' in its name, but it is actually a classification algorithm.
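
To make the two valid choices concrete, the sketch below builds a BigQuery ML `CREATE MODEL` statement as a string and rejects non-regression model types. The helper `create_model_sql`, the table and model names, and the `price` label column are all hypothetical; only the `model_type` values come from the question.

```python
REGRESSION_TYPES = {"LINEAR_REG", "BOOSTED_TREE_REGRESSOR"}


def create_model_sql(model_name, model_type, label_col, source_table):
    """Build a CREATE MODEL statement for a regression model type."""
    if model_type not in REGRESSION_TYPES:
        raise ValueError(f"{model_type} is not a regression model type")
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='{model_type}', "
        f"input_label_cols=['{label_col}']) AS\n"
        f"SELECT area, bedrooms, location, {label_col}\n"
        f"FROM `{source_table}`"
    )


sql = create_model_sql(
    "housing.price_model", "LINEAR_REG", "price", "housing.listings"
)
```

Swapping `LINEAR_REG` for `BOOSTED_TREE_REGRESSOR` changes only the `model_type` option; passing `LOGISTIC_REG` raises an error in this sketch because it is a classifier.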

557

MCQmedium

A data scientist needs to retrieve training data from Vertex AI Feature Store that exactly matches the feature values as they were at a specific historical timestamp to avoid label leakage. Which feature view configuration should they use?

A.Enable point-in-time retrieval on the feature view.

B.Use the offline store without point-in-time and rely on data ordering.

C.Use the online store with a timestamp filter.

D.Create a new feature view with only historical data.

AnswerA

Point-in-time retrieval ensures no future data leaks.

Why this answer

Point-in-time retrieval is a feature of Vertex AI Feature Store that returns feature values as of a specified timestamp.
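
The idea behind point-in-time retrieval can be shown with a small lookup over a feature's history: return the latest value recorded at or before the requested timestamp, never a later one. This is a conceptual sketch in plain Python, not the Feature Store API; `value_as_of` and the sample history are made up for illustration.

```python
from bisect import bisect_right


def value_as_of(history, timestamp):
    """Return the latest value recorded at or before `timestamp`.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    Returns None if no value existed yet at that time.
    """
    timestamps = [recorded_at for recorded_at, _ in history]
    index = bisect_right(timestamps, timestamp)
    return history[index - 1][1] if index else None


# Feature values recorded at times 1, 5, and 9.
history = [(1, "bronze"), (5, "silver"), (9, "gold")]

at_six = value_as_of(history, 6)
at_zero = value_as_of(history, 0)
```

A training row labelled at time 6 sees "silver", not the later "gold" value, which is what prevents future information from leaking into training data.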

558

Multi-Selectmedium

A team of data scientists and ML engineers is collaborating on a shared feature store in Vertex AI Feature Store. They need to ensure that feature definitions are versioned and that changes are reviewed before being used in production pipelines. Which TWO practices should they implement?

Select 2 answers

A.Allow data scientists to edit feature definitions directly in the Vertex AI Feature Store console.

B.Require code reviews for all changes to feature definitions before merging to the main branch.

C.Define multiple feature views in Vertex AI Feature Store for different environments and manage access via IAM.

D.Store feature definition code in a version-controlled repository such as Cloud Source Repositories.

E.Use scheduled batch jobs to synchronize feature definitions from a shared spreadsheet to Vertex AI Feature Store.

AnswersB, D

Code reviews ensure quality and approval.

Why this answer

Option B is correct because requiring code reviews for all changes to feature definitions before merging to the main branch enforces a peer-review gate, ensuring that modifications are validated for correctness, consistency, and compliance before they reach production. This aligns with MLOps best practices for governance and reduces the risk of introducing errors or breaking changes into the feature store.

Exam trap

Google Cloud often tests the distinction between environment isolation (IAM and multiple feature views) and the actual versioning/review process, leading candidates to mistakenly select Option C as a versioning practice when it only addresses access control and environment separation.

559

MCQhard

A logistics company uses Vertex AI AutoML Tables to predict delivery delays based on order attributes, weather data, and traffic data. The model is retrained weekly using a Vertex AI Pipeline that runs a BigQuery query to get training data, then triggers AutoML training. Recently, the pipeline fails with the error 'Dataset not found' when the AutoML training step starts. The BigQuery query runs successfully and outputs a table. Which is the most likely cause?

A.The AutoML training step is referencing a different dataset location.

B.The training data has been manually deleted from Cloud Storage.

C.The pipeline's IAM permissions are insufficient to access BigQuery.

D.The BigQuery output table is not being passed as a Vertex AI Dataset resource.

AnswerD

The pipeline must create a Vertex AI Dataset from the BigQuery table for AutoML to use.

Why this answer

The error 'Dataset not found' occurs because AutoML Tables requires a Vertex AI Dataset resource (a metadata wrapper) to reference the training data, not just a BigQuery table. The pipeline's BigQuery query produces a table, but if that table is not explicitly converted into or passed as a Vertex AI Dataset resource (via the `aiplatform.Dataset` creation step), AutoML training cannot locate it. Option D correctly identifies this missing step as the root cause.

Exam trap

Google Cloud often tests the distinction between a raw data source (BigQuery table) and a Vertex AI Dataset resource, trapping candidates who assume AutoML can directly consume a BigQuery table without the required metadata wrapper.

How to eliminate wrong answers

Option A is wrong because the error is 'Dataset not found', not a location mismatch; AutoML Tables uses Dataset resource IDs, not direct paths, so a different dataset location would cause a different error (e.g., 'Permission denied' or 'Table not found'). Option B is wrong because the training data is stored in BigQuery, not Cloud Storage, and the error occurs at the AutoML step, not during data retrieval; manual deletion of a Cloud Storage file would not affect a BigQuery-sourced dataset. Option C is wrong because the BigQuery query runs successfully, proving the pipeline's IAM permissions to access BigQuery are sufficient; insufficient permissions would fail at the query step, not at the AutoML training step.

560

MCQhard

A data science team is deploying a PyTorch model for real-time inference using Vertex AI Endpoints. The model requires a custom container with specific CUDA drivers and Python packages. They have created a Docker image and pushed it to Artifact Registry. The pipeline should automatically retrain the model every week and deploy the new version if it passes validation. However, the deployment step fails intermittently with the error 'The container image is not compatible with the machine type.' What is the most likely cause?

A.The service account does not have permission to pull the container from Artifact Registry.

B.The container image requires GPU support but the machine type specified in the endpoint is a CPU-only machine.

C.The container's health check endpoint is not responding correctly.

D.The model artifact size exceeds the maximum allowed for the machine type.

AnswerB

CUDA drivers require GPU machines; using a CPU machine causes compatibility error.

Why this answer

The error 'The container image is not compatible with the machine type' indicates a mismatch between the container's hardware requirements and the machine type selected for the Vertex AI Endpoint. Since the custom container requires specific CUDA drivers, it is built for GPU acceleration. If the endpoint is configured with a CPU-only machine type (e.g., n1-standard-4), the container will fail to run because the GPU drivers cannot initialize, triggering this incompatibility error.

Exam trap

Google Cloud often tests the distinction between deployment-time compatibility errors and runtime health check failures, tricking candidates into confusing a misconfigured machine type with a failing health probe.

How to eliminate wrong answers

Option A is wrong because a permission issue (e.g., missing artifactregistry.reader role) would produce an 'unauthorized' or 'access denied' error when pulling the image, not a compatibility error. Option C is wrong because a failing health check would cause the deployment to succeed initially but then report the container as unhealthy, not a pre-deployment compatibility error. Option D is wrong because Vertex AI has no per-machine-type artifact size limit; model size constraints are separate and would manifest as a resource-exhausted error, not a compatibility error.

561

Multi-Selectmedium

An organization wants to collect ground truth labels for model quality monitoring and store them in BigQuery. They also want to compute and visualize a confusion matrix over time. Which TWO actions should they take? (Choose 2)

Select 2 answers

A.Configure Vertex AI Model Monitoring for prediction drift

B.Use Vertex AI Model Evaluation to run sliced evaluation

C.Enable request/response logging to Cloud Logging

D.Use Vertex AI Model Evaluation to compare predictions with ground truth and generate confusion matrix

E.Upload ground truth data to a BigQuery table

AnswersD, E

Model Evaluation can compute confusion matrices over time from prediction and ground truth tables.

Why this answer

Ground truth labels must be uploaded to BigQuery. Vertex AI Model Evaluation can then be used to compute metrics like confusion matrices.
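
For reference, a confusion matrix is just a count of (true label, predicted label) pairs. The sketch below computes one in plain Python from two aligned lists; the helper name and the sample labels are illustrative, and a managed evaluation service would produce the same table from the prediction and ground truth data.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(true, pred)] for pred in labels] for true in labels]


labels = ["spam", "ham"]
y_true = ["spam", "spam", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam"]

matrix = confusion_matrix(y_true, y_pred, labels)
# matrix == [[1, 1], [1, 2]]
```

Computing this table separately for each day or week of joined predictions and labels gives the view "over time" that the question asks for.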

562

Multi-Selecthard

You are designing a batch prediction pipeline using Vertex AI. The input data is 50 TB in CSV format on GCS. The model requires feature engineering that involves complex transformations (e.g., datetime parsing, one-hot encoding). Which TWO services or steps should you include in your pipeline?

Select 2 answers

A.Use Cloud Functions to transform each file individually.

B.Use Cloud SQL to store intermediate results.

C.Run Vertex AI batch prediction job with GCS source pointing to the processed TFRecord files.

D.Use Dataflow to read CSV, perform feature engineering, and write to GCS in TFRecord format.

E.Use Dataflow to read CSV, perform feature engineering, and write to BigQuery.

AnswersC, D

Batch prediction can read from GCS and use the trained model.

Why this answer

Option C is correct because Vertex AI batch prediction jobs can read TFRecord files from GCS, a format that works well with TensorFlow-based models. By writing the processed data as TFRecords to GCS, you let the batch prediction service read and score the data efficiently. Option D is correct because Dataflow is a distributed processing service that can apply the feature engineering to 50 TB of CSV data and write the TFRecord output that the prediction job consumes.

Exam trap

Google Cloud often tests the misconception that any cloud service can handle large-scale data processing, but the trap here is that Cloud Functions and Cloud SQL are inappropriate for batch processing of 50 TB, leading candidates to overlook the need for a distributed data processing service like Dataflow.

563

Multi-Selectmedium

A company needs to detect objects in live video streams from security cameras. They require low-latency predictions and want to minimise operational overhead. Which TWO services should they use? (Choose 2)

Select 2 answers

A.Live Stream API

B.Vertex AI Predictions

C.Cloud Run

D.AutoML Video

E.Video Intelligence API

AnswersC, E

Why this answer

Video Intelligence API can analyse live streams with object detection. Cloud Run provides serverless compute to host the streaming pipeline. AutoML Video requires custom training, Vertex AI Predictions is for deployed models, Live Stream API is for ingestion.

564

MCQhard

A global e-commerce company uses BigQuery ML to forecast daily sales for 10,000 products. They use a time-series model with a horizon of 7 days. Recently, forecasts for a specific product category have been consistently too high. They suspect the model is not capturing a new seasonal pattern. Which action should they take first to diagnose the issue?

A.Retrain the model with minimal additional data

B.Run ML.EVALUATE on the recent sales data and compare accuracy metrics

C.Increase the forecast horizon to 14 days

D.Switch to AutoML forecasting via Vertex AI AutoML

AnswerB

Allows quantifying drift and identifying underperforming categories.

Why this answer

Running ML.EVALUATE on recent sales data allows you to compute accuracy metrics (e.g., MAE, MAPE) specifically for the period where the model is failing. This isolates whether the error is due to a new seasonal pattern or another cause, without retraining or changing the model architecture. It is the standard first diagnostic step in BigQuery ML for time-series models.

Exam trap

Google Cloud often tests the principle that diagnosis must precede action—candidates mistakenly jump to retraining or switching tools instead of evaluating the existing model's performance on the problematic data window.

How to eliminate wrong answers

Option A is wrong because retraining with minimal additional data does not diagnose why forecasts are too high; it only incorporates more data without identifying the root cause. Option C is wrong because increasing the forecast horizon to 14 days would worsen the problem by extending predictions further into the uncertain future, not addressing the seasonal pattern miss. Option D is wrong because switching to AutoML forecasting via Vertex AI AutoML is a premature architectural change that bypasses the diagnostic step; you should first evaluate the current model to understand the error before migrating.
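
The metrics named above are easy to compute by hand on a recent window, which helps when interpreting the evaluation output. The sketch below is plain Python with made-up sales figures; the signed mean error is an addition of this example (not something the question mentions) that shows whether forecasts run high or low.

```python
def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actuals must be non-zero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mean_error(actual, forecast):
    """Signed error: a positive value means forecasts are too high on average."""
    return sum(f - a for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical recent daily sales for one category and the model's forecasts.
actual = [100, 200, 400]
forecast = [110, 220, 440]

recent_mae = mae(actual, forecast)
recent_mape = mape(actual, forecast)
recent_bias = mean_error(actual, forecast)
```

Here every forecast is 10% above the actual value, so MAPE is about 10% and the signed error is positive, matching the "consistently too high" symptom in the question.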

565

MCQmedium

An MLOps team wants to set up alerts for GPU memory utilization on Vertex AI Training jobs. Which approach is most efficient?

A.Enable Cloud Audit Logs for the training job and parse the logs for GPU memory events.

B.Create a log-based metric from the training job's GPU logs.

C.Add a container sidecar that emits a custom metric for GPU memory usage via OpenCensus.

D.Use the 'compute.googleapis.com/accelerator/memory_utilization' metric with a metric threshold condition.

AnswerD

Automatically collected GPU metric.

Why this answer

Option D is correct because Vertex AI training jobs automatically export the 'compute.googleapis.com/accelerator/memory_utilization' metric to Cloud Monitoring. This metric is natively collected by the Google Cloud agent on the training VM, so you can directly create a metric threshold alert without any custom instrumentation or log parsing. It is the most efficient approach as it requires zero additional code or configuration.

Exam trap

Google Cloud often tests the misconception that custom instrumentation (sidecars or log parsing) is always required for GPU monitoring, when in fact Vertex AI provides a native metric that eliminates that need.

How to eliminate wrong answers

Option A is wrong because Cloud Audit Logs record administrative actions (e.g., who created a job), not runtime GPU memory utilization; they lack the granularity needed for real-time resource monitoring. Option B is wrong because log-based metrics require you to first generate GPU memory logs (which Vertex AI does not emit by default) and then parse them, adding latency and complexity compared to using a pre-existing metric. Option C is wrong because adding a sidecar container to emit a custom metric via OpenCensus is unnecessary overhead; Vertex AI already exposes the exact GPU memory metric natively, making a sidecar redundant and less efficient.

566

MCQmedium

A team is using Delta Lake on Dataproc for their data lake with ACID transactions. They want to version data for ML experiments and roll back to a previous version if needed. Which Delta Lake feature should they use?

A.Delta Lake streaming

B.Delta Lake schema enforcement

C.Delta Lake time travel

D.Delta Lake optimization (Z-order)

AnswerC

Time travel allows querying historical versions.

Why this answer

Delta Lake provides time travel via VERSION AS OF or TIMESTAMP AS OF.

567

MCQeasy

A marketing agency uses Vertex AI AutoML Vision to classify social media images into brand logos and generic content. They have 5,000 images per class. The model achieves 95% accuracy on validation set, but in production it misclassifies many images that contain logos in unusual angles or lighting. They have limited ML expertise and want to improve robustness. Which action should they take?

A.Switch to a custom CNN model trained with data augmentation.

B.Augment the training set with images that have varied angles and lighting.

C.Deploy the model with a lower confidence threshold.

D.Use Vertex AI Matching Engine for similarity search instead.

AnswerB

Simply adding more diverse training images improves model robustness.

Why this answer

Option B is correct because the core issue is a domain shift between the training data (likely clean, canonical logo images) and production data (logos at unusual angles and lighting). Augmenting the training set with those specific variations directly addresses the lack of robustness by exposing the model to the missing edge cases during training, which is the most effective and simplest fix for a team with limited ML expertise using AutoML Vision.

Exam trap

The trap here is that candidates often assume a more complex model (custom CNN) is needed for robustness, when in fact the problem is a data distribution mismatch that can be fixed with simple data augmentation, which is the most practical solution for a team with limited ML expertise using a managed service like AutoML.

How to eliminate wrong answers

Option A is wrong because switching to a custom CNN model requires significant ML expertise to design, train, and tune, which contradicts the team's limited ML expertise; AutoML Vision already uses a CNN-based architecture under the hood, so the issue is data quality, not model architecture. Option C is wrong because lowering the confidence threshold would increase the number of false positives (misclassifying generic content as logos), which does not fix the model's inability to correctly recognize logos at unusual angles—it only changes the decision boundary, not the model's feature representation. Option D is wrong because Vertex AI Matching Engine is designed for similarity search (e.g., finding nearest neighbors in an embedding space), not for classification; it would require generating embeddings for all images and does not directly solve the classification robustness problem, nor does it leverage the existing labeled training data.

568

MCQmedium

A team is training a large TensorFlow model that requires more memory than a single GPU provides. They have access to multiple GPUs on a single machine. Which distributed training strategy should they use to split the model layers across GPUs?

A.tf.distribute.experimental.MultiWorkerMirroredStrategy

B.tf.distribute.experimental.ParameterServerStrategy

C.Manual device placement using tf.device to assign layers to specific GPUs

D.tf.distribute.MirroredStrategy

AnswerC

Model parallelism in TensorFlow is typically done by manually placing operations on different GPUs using tf.device.

Why this answer

Model parallelism splits the model itself across devices, whereas data parallelism replicates the whole model on every device and splits the input batch. MirroredStrategy and MultiWorkerMirroredStrategy are data-parallel, so each GPU must still hold the full model, and ParameterServerStrategy targets multi-machine training with variables hosted on parameter servers. To place different layers on different GPUs within one machine, the layers are assigned to devices explicitly with tf.device.
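The placement decision behind manual model parallelism can be sketched without TensorFlow. The function below is a hypothetical planner, not a TensorFlow API: it assumes the per-layer memory footprints are known and assigns consecutive layers to GPUs greedily, producing the device strings that would then be passed to tf.device when building each layer.

```python
def assign_layers(layer_mem_gb, gpu_mem_gb, num_gpus):
    """Greedily place consecutive layers on GPUs without exceeding memory."""
    placement = []
    device, used = 0, 0.0
    for mem in layer_mem_gb:
        if mem > gpu_mem_gb:
            raise ValueError("a single layer exceeds one GPU's memory")
        if used + mem > gpu_mem_gb:
            # Current GPU is full: move on to the next one.
            device += 1
            used = 0.0
        if device >= num_gpus:
            raise ValueError("model does not fit on the available GPUs")
        placement.append(f"/GPU:{device}")
        used += mem
    return placement


# Illustrative numbers: 22 GB of layers cannot fit on one 16 GB GPU.
layers = [6.0, 6.0, 5.0, 5.0]
placement = assign_layers(layers, gpu_mem_gb=16.0, num_gpus=2)
```

With a data-parallel strategy the same model would need all 22 GB on every GPU, which is why mirroring does not help in this scenario.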

569

Multi-Selectmedium

An ML pipeline must run a set of preprocessing tasks for each data shard in parallel. Which KFP SDK features should they use to implement this? (Choose two.)

Select 2 answers

A.dsl.ParallelFor

B.dsl.PipelineParam

C.dsl.Collected

D.dsl.Condition

E.dsl.ExitHandler

AnswersA, C

This creates a parallel loop over a list parameter.

Why this answer

Option A (dsl.ParallelFor) is correct because it allows you to iterate over a list of items (e.g., data shard identifiers) and execute a set of tasks for each item in parallel within a KFP pipeline. Option C (dsl.Collected) is correct because it collects the outputs from each parallel iteration of a dsl.ParallelFor loop into a single list, enabling downstream tasks to consume all shard results as a single artifact or parameter.

Exam trap

Google Cloud often tests the distinction between parallel iteration (dsl.ParallelFor + dsl.Collected) and sequential iteration or conditional logic, so candidates mistakenly pick dsl.Condition or dsl.PipelineParam when they see 'for each shard' and think of parameters or branching.
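The fan-out and fan-in pattern behind dsl.ParallelFor and dsl.Collected can be illustrated with a plain-Python analogy. This is not KFP code: it uses concurrent.futures from the standard library, and preprocess_shard and summarize are hypothetical stand-ins for pipeline components.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess_shard(shard_id):
    """Stand-in for a per-shard preprocessing component."""
    return {"shard": shard_id, "rows": len(shard_id) * 100}


def run_parallel(shards):
    """Fan out over shards (like ParallelFor), then gather results (like Collected)."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() runs the calls concurrently and returns results in input order.
        return list(pool.map(preprocess_shard, shards))


def summarize(results):
    """Stand-in for a downstream component that consumes all shard outputs."""
    return sum(r["rows"] for r in results)


shards = ["shard-0", "shard-1", "shard-2"]
collected = run_parallel(shards)
total_rows = summarize(collected)
```

The downstream step receives one list containing every shard's output, which is the role dsl.Collected plays after a parallel loop.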

570

MCQeasy

What is the primary benefit of using a centralised model registry in MLOps?

A.Governance and version control of models

B.Better hyperparameter tuning

C.Faster model training

D.Automatic model deployment

AnswerA

Registry manages model versions, metadata, and aliases.

Why this answer

A centralised model registry provides governance, versioning, and lineage tracking, enabling collaboration and auditability.

571

MCQmedium

Your ML pipeline uses Vertex AI Feature Store to serve features for online predictions. You need to monitor the freshness of features in the online store. Which approach is most effective?

A.Set up a Cloud Monitoring alert for feature store entity count.

B.Schedule a nightly BigQuery batch job to compare feature values.

C.Create a custom metric in Cloud Monitoring that tracks the time since last feature update, and set an alert threshold.

D.Enable detailed audit logs in Feature Store and export to BigQuery.

AnswerC

Directly measures staleness.

Why this answer

Option C is correct because Cloud Monitoring custom metrics allow you to track the timestamp of the last feature update in Vertex AI Feature Store and set an alert threshold for staleness. This directly measures feature freshness, which is critical for online predictions where stale features can degrade model accuracy. Other options either measure unrelated metrics (entity count), are too slow (nightly batch), or focus on auditing rather than real-time monitoring.

Exam trap

The trap here is that candidates confuse monitoring entity count (a capacity metric) with freshness, or assume that batch comparison or audit logs provide real-time monitoring, when only a custom staleness metric with alerting directly addresses the requirement.

How to eliminate wrong answers

Option A is wrong because monitoring the entity count in the feature store tracks the number of stored feature values, not the time since they were last updated, so it cannot detect staleness. Option B is wrong because a nightly BigQuery batch job introduces latency of up to 24 hours, making it unsuitable for real-time freshness monitoring required for online predictions. Option D is wrong because enabling detailed audit logs and exporting to BigQuery provides an after-the-fact record of changes but does not offer real-time alerting on feature staleness.
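A staleness metric of the kind described in option C reduces to a timestamp difference. The sketch below assumes the last feature update time is available to the caller; the function names and the 30-minute limit are illustrative, and publishing the value to Cloud Monitoring is left out.

```python
from datetime import datetime, timedelta, timezone


def staleness_seconds(last_update, now):
    """Seconds elapsed since the feature was last written."""
    return (now - last_update).total_seconds()


def is_stale(last_update, now, max_age):
    """True when the feature is older than the allowed age."""
    return now - last_update > max_age


# Illustrative timestamps; a real check would use the current time.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_update = datetime(2024, 1, 1, 11, 15, tzinfo=timezone.utc)

age = staleness_seconds(last_update, now)
alert = is_stale(last_update, now, max_age=timedelta(minutes=30))
```

Reporting the age as a custom metric lets an alerting policy fire as soon as it crosses the threshold, rather than waiting for a nightly comparison.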

572

MCQmedium

A data science team wants to version control their datasets along with code using Git. They need a tool that integrates with Git and tracks changes to large data files. Which tool should they use?

A.BigQuery table snapshots

B.Delta Lake

C.Git LFS

D.DVC

AnswerD

DVC integrates with Git and manages data versioning, pipelines, and experiments.

Why this answer

DVC (Data Version Control) is designed to version large datasets and models alongside Git, storing data in remote storage and metadata in Git.

573

MCQeasy

An ML team is using Vertex AI Pipelines to run automated retraining workflows. They want to monitor pipeline execution and receive alerts when a pipeline run fails. Which Google Cloud service should they use to set up such alerts?

A.Vertex AI Metadata

B.Cloud Monitoring

C.Cloud Logging

D.Cloud Scheduler

AnswerB

Cloud Monitoring can be configured with alerts on metrics like pipeline run failure count or success rate.

Why this answer

Cloud Monitoring (formerly Stackdriver Monitoring) is the correct service because it provides alerting policies that can be triggered based on pipeline run status metrics, such as failure counts or run state changes. Vertex AI Pipelines automatically exports execution metrics to Cloud Monitoring, allowing you to define conditions (e.g., metric 'pipeline/run_count' with filter 'status=FAILED') and configure notifications via channels like email, Pub/Sub, or PagerDuty.

Exam trap

The trap here is that candidates confuse Cloud Logging (which stores logs) with Cloud Monitoring (which provides alerting), or assume Vertex AI Metadata can trigger alerts because it tracks pipeline metadata, but it lacks any notification or policy engine.

How to eliminate wrong answers

Option A is wrong because Vertex AI Metadata is a managed metadata store for tracking artifacts, lineage, and executions; it does not provide alerting capabilities. Option C is wrong because Cloud Logging is for storing and querying logs, not for setting up proactive alerts on pipeline failures (though logs can be used to create log-based metrics, the question specifically asks for alerts on pipeline execution, which is natively handled by Cloud Monitoring metrics). Option D is wrong because Cloud Scheduler is a cron job service for triggering workflows on a schedule; it cannot monitor pipeline runs or generate failure alerts.

574

Multi-Selectmedium

Which THREE actions should be taken to automate a machine learning pipeline using Cloud Build and Vertex AI?

Select 3 answers

A.Write a cloudbuild.yaml that builds a training container and submits a Vertex AI PipelineJob

B.Use Cloud Functions to retrain the model each time a build completes

C.Set up a Cloud Scheduler job to poll for new build artifacts

D.Define the training and deployment steps in a Vertex AI Pipeline and submit it from Cloud Build

E.Configure a Cloud Build trigger to run on commits to the source repository

AnswersA, D, E

Cloud Build uses build config to define steps, including submitting pipeline jobs.

Why this answer

Option A is correct because Cloud Build's cloudbuild.yaml can define a step that builds a custom training container and submits it as a Vertex AI PipelineJob. This directly automates the ML pipeline by using Cloud Build to trigger a Vertex AI pipeline, which is the recommended pattern for CI/CD of ML workflows.

Exam trap

Google Cloud often tests the distinction between event-driven triggers (Cloud Build triggers, Pub/Sub) and polling mechanisms (Cloud Scheduler, Cloud Functions) — the trap here is that candidates may think polling or separate functions are needed for automation, when in fact Cloud Build's native triggers and pipeline submission are the correct, integrated approach.

575

MCQmedium

You are using Vertex AI hyperparameter tuning with a custom container. The training job reports the objective metric but Vizier is not converging. Which configuration change could improve convergence?

A.Use a larger machine type for each trial

B.Increase the number of parallel trials

C.Reduce the number of parallel trials

D.Switch from Bayesian to grid search

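AnswerC

Fewer parallel trials let each new trial use the results of more completed trials, which helps Bayesian optimisation converge.

Why this answer

Vertex AI hyperparameter tuning uses the results of completed trials to choose the hyperparameter values for later trials. Running many trials in parallel shortens wall-clock time, but each batch is chosen with less information, which can reduce the effectiveness of the search. Reducing the number of parallel trials gives the Bayesian optimisation algorithm more feedback per trial. A larger machine type speeds up each trial without changing what the search learns, and grid search ignores earlier results entirely.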
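The interaction between parallelism and feedback can be made concrete with simple arithmetic. The sketch below assumes, for illustration only, that all trials in a batch finish together, so the search algorithm gets one round of new information per batch; the trial counts are hypothetical.

```python
import math


def feedback_rounds(max_trials, parallel_trials):
    """Number of sequential batches in which earlier results can inform new trials."""
    if parallel_trials < 1:
        raise ValueError("parallel_trials must be at least 1")
    return math.ceil(max_trials / parallel_trials)


# Same trial budget, different degrees of parallelism.
rounds_high_parallelism = feedback_rounds(max_trials=50, parallel_trials=10)
rounds_low_parallelism = feedback_rounds(max_trials=50, parallel_trials=2)
rounds_all_at_once = feedback_rounds(max_trials=50, parallel_trials=50)
```

When every trial runs at once there is a single round, so a model-based search has no earlier results to learn from.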

576

MCQmedium

An ML team wants to automatically retrain a model when prediction drift is detected on the deployed endpoint. They have Vertex AI Model Monitoring configured to send alerts to Cloud Monitoring. Which minimal set of additional services should they use to trigger a retraining pipeline?

A.Cloud Monitoring + Cloud Functions + Vertex AI Pipeline

B.Cloud Monitoring + Cloud Functions + Vertex AI Endpoint

C.Cloud Logging + Cloud Scheduler + Vertex AI Training

D.Cloud Monitoring + Cloud Pub/Sub + Cloud Functions + Vertex AI Pipeline

AnswerA

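Correct chain: Cloud Monitoring alert -> notification channel (webhook or Pub/Sub) -> Cloud Functions -> Vertex AI Pipeline.

Why this answer

A Cloud Monitoring alerting policy can send its notification through a webhook or Pub/Sub notification channel, which invokes a Cloud Function. The function then starts a Vertex AI Pipeline run for retraining. Option D lists Pub/Sub as an additional service, which an HTTP-triggered function called through a webhook channel does not require. Option B targets the endpoint rather than a training pipeline, and Option C relies on a schedule instead of the drift alert.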
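The function in the middle of this chain mostly has to decide whether an incoming alert should start retraining. The sketch below assumes the notification body is JSON with an "incident" object carrying "state" and "policy_name" fields; treat those field names, the policy name, and the submit_pipeline callable as assumptions to check against the actual payload and the Vertex AI client in use.

```python
import json


def should_retrain(alert_payload, policy_name):
    """True for a newly opened incident raised by the drift alerting policy."""
    incident = json.loads(alert_payload).get("incident", {})
    return (
        incident.get("state") == "open"
        and incident.get("policy_name") == policy_name
    )


def handle_alert(alert_payload, policy_name, submit_pipeline):
    """Start retraining only for matching alerts; otherwise do nothing."""
    if should_retrain(alert_payload, policy_name):
        # submit_pipeline is a hypothetical callable wrapping the pipeline submission.
        return submit_pipeline()
    return None


# Hypothetical payloads for illustration.
opened = json.dumps({"incident": {"state": "open", "policy_name": "prediction-drift"}})
closed = json.dumps({"incident": {"state": "closed", "policy_name": "prediction-drift"}})

started = handle_alert(opened, "prediction-drift", lambda: "pipeline-submitted")
ignored = handle_alert(closed, "prediction-drift", lambda: "pipeline-submitted")
```

Filtering on the incident state avoids launching a second retraining run when the same incident later closes.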

577

MCQmedium

A company uses Vertex AI Vector Search (Matching Engine) for a product recommendation system. The product embeddings are updated hourly. Which index update method should they use to ensure low latency for new items?

A.Batch rebuild the index every hour

B.Use streaming updates to add new embeddings incrementally

C.Create a new index each hour and swap endpoints

D.Use brute-force index to simplify updates

AnswerB

Correct: Streaming updates allow near-real-time ingestion of new vectors.

Why this answer

Option B is correct because Vertex AI Vector Search supports streaming updates, allowing new embeddings to be added incrementally without rebuilding the entire index. This ensures low latency for new items by making them searchable almost immediately after update, which is critical for hourly refresh cycles where batch rebuilds would introduce significant delay.

Exam trap

The trap here is that candidates often assume batch rebuilds are the only reliable method for consistency, overlooking that streaming updates in Vertex AI Vector Search are designed specifically for low-latency incremental ingestion without sacrificing search quality.

How to eliminate wrong answers

Option A is wrong because batch rebuilding the index every hour incurs high latency and computational cost, as the entire index must be reconstructed from scratch, delaying availability of new items. Option C is wrong because creating a new index each hour and swapping endpoints is inefficient and introduces downtime during the swap, plus it requires managing multiple index versions unnecessarily. Option D is wrong because brute-force indices do not simplify updates; they perform exhaustive linear scans, which are slow and unscalable for large embedding sets, and they lack the optimized approximate nearest neighbor (ANN) search that Vector Search provides.

578

Multi-Selecteasy

An ML team is deploying a model to Vertex AI for the first time. Which THREE are best practices for scaling from prototype to production?

Select 3 answers

A.Manually scale instances based on historical traffic patterns.

B.Store all features in a Feature Store for consistency.

C.Use a single large instance to simplify management.

D.Monitor model performance for drift and accuracy degradation.

E.Automate model retraining and deployment using Vertex AI Pipelines.

AnswersB, D, E

Feature Store ensures consistent feature computation across training and serving.

Why this answer

Storing all features in a Feature Store (Option B) ensures consistency between training and serving, preventing training-serving skew. Vertex AI Feature Store provides a centralized repository for feature values, enabling reuse, point-in-time lookups, and online serving with low latency, which is critical for production reliability.

Exam trap

Google Cloud often tests the misconception that manual scaling or single-instance architectures are simpler and more reliable, but the PMLE exam emphasizes automated, resilient, and consistent practices like autoscaling and feature stores for production ML workloads.

579

MCQeasy

A team has developed a prototype of a recommendation model using a small dataset on a single VM. They need to scale to a larger dataset for production training. They plan to use Vertex AI training with a custom container. What is the best practice for handling the increased data volume?

A.Increase the batch size to maximum.

B.Use TFRecord format and streaming reads.

C.Store all data in memory before training.

D.Use a single powerful VM with high memory.

AnswerB

Efficiently loads data in batches, leveraging Cloud Storage streaming.

Why this answer

Option B is correct because using TFRecord format with streaming reads allows efficient, scalable data loading from Cloud Storage, reducing memory pressure and improving I/O performance. Option A is wrong because increasing the batch size to the maximum can cause memory issues and may not improve throughput. Option C is wrong because storing all data in memory is not scalable.

Option D is wrong because a single powerful VM still has limits and is not cost-effective for large datasets.
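The difference between streaming reads and loading everything into memory can be shown with a generator. This is a plain-Python sketch rather than the TFRecord or tf.data API: read_records is a hypothetical source that yields one record at a time, and only one batch is held in memory at once.

```python
from itertools import islice


def read_records(n):
    """Stand-in for a streaming reader that yields records lazily."""
    for i in range(n):
        yield i


def stream_batches(records, batch_size):
    """Yield fixed-size batches without materialising the whole dataset."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


batches = list(stream_batches(read_records(10), batch_size=4))
```

The training loop consumes one batch at a time, so memory use depends on the batch size rather than on the dataset size.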

580

MCQmedium

An ML engineer is designing a pipeline that should run only when new training data arrives in a Cloud Storage bucket. Which event-driven approach should they use to trigger the Vertex AI Pipeline?

A.Use Cloud Storage Pub/Sub notifications to send events to a Cloud Function that triggers the pipeline.

B.Use Cloud Tasks to queue a pipeline run whenever a new file is uploaded.

C.Configure the pipeline to run on a schedule and check for new data inside the pipeline.

D.Set up a Cloud Scheduler job that runs every minute to check for new files.

AnswerA

This is the recommended event-driven pattern: Cloud Storage event → Pub/Sub → Cloud Function → Vertex AI API.

Why this answer

The best approach is to use Cloud Storage notifications via Pub/Sub, then a Cloud Function that receives the event and calls the Vertex AI API to create a pipeline job. This is a common event-driven pattern. Cloud Scheduler is for scheduled triggers, not event-driven.

Cloud Tasks only queues work that some other component must enqueue; it does not react to Cloud Storage uploads by itself.
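The Cloud Function in this pattern first has to turn the Pub/Sub message into a Cloud Storage URI. The sketch below assumes the notification uses the JSON payload format, with an "eventType" attribute and a base64-encoded object resource containing "bucket" and "name"; those field names and the example bucket are assumptions to verify against the real messages, and the pipeline submission itself is omitted.

```python
import base64
import json


def parse_gcs_event(message):
    """Return the gs:// URI of a newly finalised object, or None for other events."""
    attributes = message.get("attributes", {})
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    obj = json.loads(base64.b64decode(message["data"]))
    return f"gs://{obj['bucket']}/{obj['name']}"


def make_message(event_type, bucket, name):
    """Build a message shaped like the assumed notification, for illustration."""
    payload = json.dumps({"bucket": bucket, "name": name}).encode("utf-8")
    return {
        "attributes": {"eventType": event_type},
        "data": base64.b64encode(payload).decode("ascii"),
    }


new_file = parse_gcs_event(make_message("OBJECT_FINALIZE", "training-data", "2024/shard-1.csv"))
deleted = parse_gcs_event(make_message("OBJECT_DELETE", "training-data", "2024/shard-1.csv"))
```

The returned URI would then be passed as a parameter when the function creates the pipeline job.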

581

MCQeasy

Your company deploys batch prediction jobs using Vertex AI Batch Prediction. You need to monitor the jobs for failures and performance. What is the recommended approach?

A.Use Cloud Logging to export batch prediction logs and create log-based metrics.

B.Set up email alerts in the Vertex AI console for failed jobs.

C.Use Cloud Monitoring to create custom dashboards and alerts based on Vertex AI batch prediction metrics.

D.Enable the Recommender to get optimization suggestions for batch jobs.

AnswerC

Cloud Monitoring natively supports Vertex AI metrics for batch predictions.

Why this answer

Option C is correct because Cloud Monitoring (formerly Stackdriver) is the native Google Cloud service for collecting, visualizing, and alerting on metrics from Vertex AI, including batch prediction job success rates, latency, and resource utilization. It provides pre-built dashboards and the ability to create custom alerts, making it the recommended approach for monitoring failures and performance in a centralized, scalable way.

Exam trap

Google Cloud often tests the misconception that Cloud Logging is the primary monitoring tool for metrics, when in fact Cloud Monitoring is the dedicated service for metrics and alerting, while Cloud Logging is for logs and log-based metrics only.

How to eliminate wrong answers

Option A is wrong because Cloud Logging is designed for log data, not structured metrics; while you could create log-based metrics from batch prediction logs, this is an indirect, less efficient method that lacks the pre-built performance metrics and alerting capabilities of Cloud Monitoring. Option B is wrong because email alerts in the Vertex AI console are not a native feature; Vertex AI does not provide a built-in email alerting mechanism for job failures—alerts must be configured through Cloud Monitoring or Cloud Logging. Option D is wrong because the Recommender provides optimization suggestions (e.g., machine type, resource allocation) but does not monitor job failures or performance in real time; it is a post-hoc analysis tool, not a monitoring solution.

582

MCQeasy

A retail company has deployed a machine learning model using Vertex AI Endpoints to predict inventory demand. The model was trained on data from the past two years and has been in production for six months. The team has enabled Vertex AI Model Monitoring to track prediction drift with an alert threshold of 0.2. Last week, they received an alert that the prediction drift score reached 0.35, exceeding the threshold. The engineer checks the monitoring dashboard and sees that the distribution of predictions has shifted noticeably compared to the training data. The engineer also notices that the model's accuracy metrics, computed from weekly ground truth data, have remained within acceptable range. What should the engineer do first?

A.Investigate the input feature distributions for the recent serving requests to identify if data drift is the underlying cause of the prediction drift.

B.Increase the prediction drift alert threshold to 0.4 to reduce the number of false alerts.

C.Retrain the model using the latest three months of data to incorporate recent trends.

D.Roll back to an earlier model version that had lower prediction drift.

AnswerA

By checking input feature distributions, the engineer can confirm whether data drift is present, which commonly causes prediction drift even if accuracy remains temporarily stable.

Why this answer

The prediction drift alert indicates a shift in prediction distribution, but accuracy is stable. This suggests data drift (change in input features) rather than concept drift. The engineer should first investigate input feature distributions to confirm if data drift is the cause.

Retraining (C) is premature without root cause analysis. Increasing the threshold (B) ignores the underlying issue. Rolling back (D) may not help if the previous version also suffers from the same data drift.
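Investigating input feature distributions usually means computing a distance between the training and serving histograms for each feature. The sketch below uses Jensen-Shannon divergence as one common choice; the feature names, bin counts, and the 0.05 threshold are hypothetical.

```python
import math


def normalize(counts):
    """Convert histogram counts to probabilities."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits, skipping empty bins of p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence between two histograms over the same bins."""
    p, q = normalize(p_counts), normalize(q_counts)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def drifted_features(training, serving, threshold):
    """Names of features whose serving histogram moved past the threshold."""
    return [
        name for name in training
        if js_divergence(training[name], serving[name]) > threshold
    ]


training = {"price": [50, 30, 20], "region": [40, 60]}
serving = {"price": [20, 30, 50], "region": [40, 60]}
drifted = drifted_features(training, serving, threshold=0.05)
```

A feature that stands out in this comparison is a candidate explanation for the shifted prediction distribution and a starting point for deciding whether retraining is needed.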

583

MCQmedium

A company is using Vertex AI Vizier for hyperparameter tuning of a model with 5 integer hyperparameters, each with a range of 10-100. They have a budget of 50 trials and want to maximize the chance of finding the best configuration. Which Vizier algorithm should they use?

A.Grid search

B.Simulated annealing

C.Bayesian optimization (GP bandit)

D.Random search

AnswerC

Bayesian optimization uses a probabilistic model to select promising configurations, ideal for small budgets.

Why this answer

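Bayesian optimization (GP bandit) is best for small trial budgets because it uses past results to guide the search. Grid search is infeasible here: 5 parameters with 91 integer values each give far more combinations than 50 trials can cover. Random search avoids that problem but ignores earlier results, so it is still less efficient than Bayesian optimization.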

Vizier does not support simulated annealing.
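The size of the search space in this question follows directly from the stated ranges. The short calculation below assumes the range 10-100 is inclusive at both ends.

```python
# Integer values available per hyperparameter: 10, 11, ..., 100.
values_per_param = 100 - 10 + 1
num_params = 5
trial_budget = 50

# A full grid enumerates every combination of values.
grid_size = values_per_param ** num_params

# Fraction of the grid that the trial budget could visit.
coverage = trial_budget / grid_size
```

With roughly 6.2 billion grid points and 50 trials, exhaustive search is not an option, so the choice comes down to how well each algorithm uses a very small sample.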

584

MCQmedium

An ML engineer wants to containerize a custom training script and use it as a component in a Vertex AI Pipeline. The component should accept a dataset URI and a learning rate parameter, and output a trained model artifact. Which approach should the engineer use to define the component?

A.Use a pre-built Google Cloud Pipeline Component for Vertex AI Training with custom container configuration.

B.Use ContainerComponent from kfp.v2.components to define the container, its inputs, and outputs.

C.Define a Python function component with @dsl.component and include the container code inline.

D.Use the importer component to import the script and then run it as a task.

AnswerB

ContainerComponent allows defining a custom container component with explicit inputs and outputs.

Why this answer

Option B is correct because ContainerComponent from kfp.v2.components allows you to define a custom container component by specifying the container image, command, inputs, and outputs directly. This is the appropriate approach when you have a custom training script that you want to containerize and use as a component in a Vertex AI Pipeline, as it gives you full control over the container configuration and artifact handling.

Exam trap

Google Cloud often tests the distinction between Python function components and container components, and the trap here is that candidates may confuse @dsl.component (for Python functions) with containerized components, leading them to choose Option C even though it cannot properly define a container image and artifact outputs.

How to eliminate wrong answers

Option A is wrong because pre-built Google Cloud Pipeline Components for Vertex AI Training are designed for standard training jobs with built-in algorithms or custom containers, but they do not allow you to define custom inputs and outputs as artifacts in the same declarative way as ContainerComponent; they are more rigid and less suited for a fully custom component with a dataset URI and learning rate parameter. Option C is wrong because @dsl.component is used for Python function components that run Python code directly, not for containerized components; including container code inline would mix the container definition with Python function logic, which is not the intended use and would not properly handle container image specification and artifact outputs. Option D is wrong because the importer component is used to import existing artifacts (like models or datasets) into the pipeline, not to run a training script; it cannot execute a custom training script or produce a trained model artifact from scratch.

585

MCQmedium

A team wants to deploy a BigQuery ML model for online prediction. Which approach should they take?

A.Export the model to Cloud Storage and deploy to AI Platform

B.Export the model to Vertex AI and create an endpoint

C.None of these; BigQuery ML models cannot be used for online prediction

D.Use BigQuery ML's ML.PREDICT for online predictions

AnswerB

Vertex AI supports deploying BigQuery ML models for online serving.

Why this answer

BigQuery ML models can be exported directly to Vertex AI for online prediction. Vertex AI provides a managed endpoint that supports real-time serving with low latency, which is required for online prediction. Exporting to Cloud Storage and then deploying to AI Platform is outdated because AI Platform is now part of Vertex AI, and the recommended path is to export the model directly to Vertex AI and create an endpoint.

Exam trap

Google Cloud often tests the distinction between batch prediction (ML.PREDICT) and online prediction (Vertex AI endpoint), and the trap here is that candidates assume BigQuery ML's ML.PREDICT can serve real-time requests, but it is designed for batch processing only.

How to eliminate wrong answers

Option A is wrong because exporting the model to Cloud Storage and deploying to AI Platform is a legacy approach; AI Platform has been integrated into Vertex AI, and the current best practice is to export directly to Vertex AI. Option C is wrong because BigQuery ML models can indeed be used for online prediction by exporting them to Vertex AI and creating an endpoint. Option D is wrong because ML.PREDICT in BigQuery ML is designed for batch predictions, not for real-time online predictions with low-latency requirements.

586

Multi-Selecteasy

A team is deploying a new model version. They want to ensure that they can quickly roll back if the new version performs poorly in production. Which TWO actions should they take? (Choose 2.)

Select 2 answers

A.Keep the old model version deployed alongside the new one

B.Configure Vertex AI Model Monitoring to compare predictions

C.Use traffic splitting to gradually shift traffic

D.Set up Cloud Monitoring alerts on model performance

E.Store multiple model versions in the same endpoint

AnswersC, E

Traffic splitting allows you to direct a small percentage of traffic to the new version and easily shift all traffic back if issues arise.

Why this answer

Option C is correct because traffic splitting allows you to gradually shift a percentage of inference requests from the old model version to the new one. If the new version performs poorly, you can immediately revert the split to 0% for the new version, providing a fast and controlled rollback without redeploying or disrupting service.

Exam trap

The trap here is that candidates confuse monitoring and alerting (options B and D) with the actual deployment and rollback mechanism, assuming that detecting poor performance is equivalent to being able to quickly roll back, when in fact you need a traffic management feature like traffic splitting to execute the rollback.
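The rollback mechanism amounts to rewriting a mapping from deployed model to traffic percentage. The sketch below models that mapping as a plain dictionary; the model identifiers and helper names are hypothetical, and applying the split to a real endpoint is left out.

```python
def validate(split):
    """Traffic percentages across deployed models must sum to 100."""
    if sum(split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    return split


def canary(stable_id, new_id, new_percent):
    """Send a small share of traffic to the new version."""
    return validate({stable_id: 100 - new_percent, new_id: new_percent})


def rollback(split, stable_id):
    """Route all traffic back to the stable version without undeploying anything."""
    if stable_id not in split:
        raise KeyError(stable_id)
    return validate({mid: 100 if mid == stable_id else 0 for mid in split})


split = canary("model-v1", "model-v2", new_percent=10)
rolled_back = rollback(split, "model-v1")
```

Because both versions stay deployed on the same endpoint, the rollback is only a configuration change and does not require a redeployment.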

587

MCQhard

A company has multiple business units using the same Vertex AI environment. They need to enforce that models deployed to production have passed a validation pipeline, and only the ML Engineering team can deploy to production. Which IAM configuration should they use?

A.Use Vertex AI Workbench with user-managed notebooks.

B.Use custom roles with permissions to deploy models, and use Cloud Audit Logs to monitor deployments.

C.Use Binary Authorization to ensure models are signed.

D.Use organization policies to restrict deployment to specific locations.

E.Use Vertex AI Model Registry with automated deployment via Cloud Build, and restrict those permissions to the ML Engineering team using IAM conditions.

AnswerE

This ensures only approved pipelines trigger deployment and only authorized team can initiate.

Why this answer

Option E is correct because it combines Vertex AI Model Registry (which enforces that only validated models are promoted to production) with Cloud Build for automated deployment, and uses IAM conditions to restrict deployment permissions exclusively to the ML Engineering team. This ensures that models must pass the validation pipeline before deployment, and only authorized personnel can trigger the deployment process.

Exam trap

The trap here is that candidates may confuse monitoring (Audit Logs) or location restrictions (Organization Policies) with enforcing a validation pipeline and team-specific deployment permissions, missing the need for a model registry and automated deployment with IAM conditions.

How to eliminate wrong answers

Option A is wrong because Vertex AI Workbench with user-managed notebooks is a development environment for building and training models, not a mechanism for enforcing deployment validation or restricting deployment permissions. Option B is wrong because custom roles with deployment permissions and Cloud Audit Logs only provide monitoring and access control, but do not enforce that models have passed a validation pipeline before deployment. Option C is wrong because Binary Authorization is designed for container image signing and attestation, not for validating ML model pipelines or restricting deployment to specific teams.

Option D is wrong because organization policies can restrict deployment to specific locations (e.g., regions), but they do not enforce model validation or restrict deployment permissions to a specific team.

588

MCQmedium

A team deploys a model using Vertex AI and wants to monitor for concept drift. What should they track?

A.Number of prediction requests

B.Model prediction latency

C.Changes in input data distribution

D.Changes in the relationship between inputs and outputs

AnswerD

Concept drift is a change in the underlying function mapping inputs to outputs.

Why this answer

Concept drift refers to a change in the underlying relationship between the input features and the target variable over time, which degrades model performance. In Vertex AI, monitoring this requires tracking the statistical relationship between inputs and outputs (e.g., via prediction residuals or model performance metrics), not just the input distribution alone. Option D correctly identifies this need, as concept drift is fundamentally about the input-output mapping shifting, even if the input distribution remains stable.

Exam trap

Google Cloud often tests the distinction between data drift (input distribution changes) and concept drift (input-output relationship changes), and the trap here is that candidates confuse the two, picking Option C because they think monitoring input data is sufficient for detecting all model degradation.

How to eliminate wrong answers

Option A is wrong because the number of prediction requests measures traffic volume, not data or concept drift; it is a scaling or operational metric, not a model quality metric. Option B is wrong because prediction latency measures inference speed, which is a performance indicator unrelated to the statistical properties of data or model relationships. Option C is wrong because changes in input data distribution represent data drift (covariate shift), not concept drift; while data drift can cause concept drift, monitoring only input distribution misses shifts in the input-output relationship that occur without distributional changes.
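Tracking the input-output relationship in practice means comparing predictions with ground truth once labels arrive. The sketch below is a minimal version of that check; the baseline accuracy, tolerance, and sample labels are hypothetical.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the ground truth."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def concept_drift_suspected(baseline_accuracy, predictions, labels, tolerance=0.05):
    """True when recent accuracy falls below the baseline by more than the tolerance."""
    return baseline_accuracy - accuracy(predictions, labels) > tolerance


# A recent window of predictions and the labels that arrived later.
recent_predictions = [1, 0, 1, 1]
recent_labels = [1, 1, 0, 1]

recent_accuracy = accuracy(recent_predictions, recent_labels)
drift = concept_drift_suspected(0.9, recent_predictions, recent_labels)
```

A drop like this can occur even when the input distribution looks unchanged, which is why input monitoring alone does not detect concept drift.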

589

Multi-Selecteasy

Which TWO are best practices for building ML pipelines on Vertex AI Pipelines?

Select 2 answers

A.Store all trained models in Cloud Storage without versioning

B.Use Cloud Build as the pipeline orchestrator

C.Use a container-based approach for each component

D.Define pipelines using the Kubeflow Pipelines SDK

E.Use Cloud Composer as the primary pipeline tool

AnswersC, D

Containerized components are reusable and scalable.

Why this answer

Option C is correct because Vertex AI Pipelines is designed to run container-based components, where each step in the pipeline is a Docker container that encapsulates its dependencies and execution logic. This approach ensures reproducibility, isolation, and scalability, aligning with best practices for ML pipelines on Vertex AI.

Exam trap

Google Cloud often tests the distinction between general-purpose orchestration tools (Cloud Composer, Cloud Build) and ML-specific pipeline services (Vertex AI Pipelines), expecting candidates to recognize that container-based components and the Kubeflow Pipelines SDK are the correct building blocks for ML pipelines on Vertex AI.

590

Multi-Selectmedium

Which TWO are best practices when deploying AutoML models to production?

Select 2 answers

A.Monitor for data drift

B.Train the model on a disk to reduce latency

C.Enable Vertex AI Explainability

D.Deploy on sole-tenant nodes

E.Use TPUs for model serving

AnswersA, C

Data drift can degrade performance; monitoring is essential.

Why this answer

Monitoring for data drift (Option A) is a best practice because production models can degrade over time as the statistical properties of input data change. Vertex AI provides a Model Monitoring service that automatically detects skew and drift by comparing serving data distribution against training data distribution, triggering alerts when anomaly thresholds are breached. This ensures model reliability and performance in production.

Exam trap

Google Cloud often tests the misconception that TPUs are suitable for model serving, but TPUs are optimized for training and not supported for Vertex AI AutoML serving, which uses CPUs or GPUs for inference.

Full explanation →

591

Multi-Select · medium

A data scientist wants to use BigQuery ML for time-series forecasting. They need to evaluate model accuracy and compare different models. Which THREE BigQuery ML functions should they use?

Select 3 answers

A. ML.EXPLAIN_PREDICT

B. ML.PREDICT

C. ML.FEATURE_IMPORTANCE

D. ML.EVALUATE

E. ML.TRAIN

Answers: B, D, E

Why this answer

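ML.EVALUATE returns evaluation metrics for a trained model, which is how accuracy is measured and how candidate models are compared. ML.PREDICT generates predictions from a trained model. The answer key lists ML.TRAIN for retraining; note, however, that BigQuery ML has no ML.TRAIN function: models are trained and retrained with the CREATE MODEL (or CREATE OR REPLACE MODEL) statement, and ARIMA_PLUS time-series models produce forecasts through ML.FORECAST rather than ML.PREDICT.

ML.FEATURE_IMPORTANCE applies to tree-based models (boosted trees and random forests). ML.EXPLAIN_PREDICT provides local, per-prediction explanations.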
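For an ARIMA_PLUS time-series model, the train, evaluate, and forecast steps might look like the statements below. They are held as Python strings so they could be passed to whichever BigQuery client is in use. The dataset, table, and column names are hypothetical, and the 30-step horizon and 0.9 confidence level are arbitrary example values.

```python
# Hypothetical dataset, table, and column names throughout.
MODEL = "`mydataset.sales_forecast`"

# Training (and retraining) is a CREATE MODEL statement.
TRAIN_SQL = f"""
CREATE OR REPLACE MODEL {MODEL}
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'order_date',
  time_series_data_col = 'total_sales'
) AS
SELECT order_date, total_sales
FROM `mydataset.daily_sales`
"""

# Evaluation metrics for the trained model.
EVALUATE_SQL = f"SELECT * FROM ML.EVALUATE(MODEL {MODEL})"

# Forecast the next 30 time points with a 90% prediction interval.
FORECAST_SQL = f"""
SELECT *
FROM ML.FORECAST(MODEL {MODEL}, STRUCT(30 AS horizon, 0.9 AS confidence_level))
"""

for name, sql in [("train", TRAIN_SQL), ("evaluate", EVALUATE_SQL), ("forecast", FORECAST_SQL)]:
    print(f"-- {name}\n{sql.strip()}\n")
```

Comparing models then comes down to running the evaluation query against each candidate and reading the metrics side by side.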

Full explanation →

592

MCQ · easy

A team is using Vertex AI Pipelines to automate their ML workflow. They want to ensure that pipeline runs are reproducible and that artifacts are tracked. Which feature should they use?

A. Vertex AI Feature Store

B. Vertex AI Experiments

C. Vertex AI Model Registry

D. Vertex AI Endpoints

Answer: B

Experiments track parameters, metrics, and artifacts for each run.

Why this answer

Vertex AI Experiments is the correct feature because it records parameters, metrics, and artifacts for each pipeline run associated with an experiment, enabling reproducibility and lineage tracking. This directly supports the team's need to ensure runs are reproducible and artifacts are tracked; the underlying records are stored in Vertex ML Metadata.

Exam trap

The trap here is that candidates confuse artifact tracking with model management or deployment features, leading them to select Model Registry or Endpoints instead of recognizing that Experiments provides the run-level metadata and lineage required for reproducibility.

How to eliminate wrong answers

Option A is wrong because Vertex AI Feature Store is designed for managing and serving feature data for ML models, not for tracking pipeline runs or artifacts. Option C is wrong because Vertex AI Model Registry focuses on managing model versions and deployment, not on capturing run-level metadata or artifact lineage. Option D is wrong because Vertex AI Endpoints are for deploying models to serve predictions, not for tracking reproducibility or artifacts in pipeline runs.

Full explanation →

593

MCQ · easy

An ML engineer needs to track the costs incurred by Vertex AI prediction endpoints. Which tool should they use to set budget alerts and monitor spending?

A. Vertex AI Model Monitoring

B. Cloud Logging

C. Cloud Monitoring with custom metrics

D. Google Cloud Budgets & Alerts

Answer: D

Correct: Budgets & Alerts allow setting spending thresholds and notifications.

Why this answer

Google Cloud Budgets & Alerts allow setting budget thresholds and sending notifications. Billing reports provide cost breakdowns by service.
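To make the threshold idea concrete, the sketch below works out which alert thresholds a given month-to-date spend has crossed. The budget amount, the 50/90/100 percent thresholds, and the function name are hypothetical; a real budget is configured in Cloud Billing rather than in application code.

```python
def crossed_thresholds(spend, budget, thresholds=(0.5, 0.9, 1.0)):
    """Return the threshold fractions that the current spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [t for t in sorted(thresholds) if ratio >= t]


# Hypothetical month-to-date spend on prediction endpoints against a 500 budget.
alerts = crossed_thresholds(spend=460.0, budget=500.0)
print(alerts)  # thresholds reached so far, as fractions of the budget
```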

Full explanation →

594

MCQ · easy

You need to set up monitoring for a Vertex AI model that serves predictions in real-time. The model is expected to have a latency SLA of under 100ms. Which metric should you configure an alert on to ensure the SLA is met?

A. p50 latency of prediction requests

B. Prediction drift score

C. p99 latency of prediction requests

D. Number of prediction requests per second

Answer: C

p99 captures tail latency critical for SLA.

Why this answer

Option C is correct because p99 latency is the value at or below which 99% of requests complete, which makes it a standard metric for enforcing a strict SLA such as under 100 ms. Alerting on p99 catches slow tail requests that a median or average would hide: when p99 stays under the threshold, at most the slowest 1% of requests can exceed it.
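The gap between the median and the tail is easy to see with a small calculation. The sketch below uses the nearest-rank percentile definition on made-up latency samples; the sample values and the helper name are hypothetical, and monitoring systems may use a different interpolation method.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: 95 fast requests and 5 slow ones.
latencies_ms = [40] * 95 + [250] * 5
SLA_MS = 100

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50} ms, p99={p99} ms, SLA={SLA_MS} ms")
```

With these samples the median sits comfortably under the SLA while p99 does not, so an alert on p50 would stay silent even though 5% of requests are slow.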

Exam trap

Google Cloud often tests the misconception that median (p50) latency is sufficient for SLAs. SLAs require tail-latency guarantees (p99 or p99.9) to catch the performance outliers that violate the threshold.

How to eliminate wrong answers

Option A is wrong because p50 latency (median) ignores the tail latency, meaning half of the requests could exceed 100ms without triggering an alert, failing the SLA. Option B is wrong because prediction drift score measures changes in model input/output distributions over time, not latency, and is irrelevant for SLA compliance. Option D is wrong because the number of prediction requests per second (throughput) does not measure individual request latency; high throughput can occur even if latency spikes above 100ms.

Full explanation →

595

MCQ · easy

A machine learning engineer wants to use a pre-built Google Cloud Pipeline Components (GCPC) component to train a model using Vertex AI. Which component should they use?

A. AutoMLTabularTrainingJobRunOp

B. VertexTrainJobOp

C. VertexEndpointCreateOp

D. VertexBatchPredictOp

Answer: B

This is the pre-built component for running a custom training job on Vertex AI.

Why this answer

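The Google Cloud Pipeline Components library includes pre-built components for various Vertex AI services. Among the options, VertexTrainJobOp is the component that runs a custom training job. VertexBatchPredictOp is for batch prediction, VertexEndpointCreateOp is for creating endpoints, and AutoMLTabularTrainingJobRunOp is specifically for AutoML tabular jobs. In the published google-cloud-pipeline-components package, the corresponding components are named CustomTrainingJobOp, ModelBatchPredictOp, and EndpointCreateOp.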

Full explanation →

596

Multi-Select · medium

A company wants to build a low-code ML pipeline using Vertex AI Pipelines and BigQuery ML. They need to train, evaluate, and deploy a model. Which TWO statements are correct about the integration between Vertex AI Pipelines and BigQuery ML? (Choose TWO.)

Select 2 answers

A. BigQuery ML models are automatically stored in Vertex AI Model Registry after training.

B. BigQuery ML supports hyperparameter tuning using the CREATE MODEL statement.

C. Vertex AI Pipelines supports automatic retry of failed steps due to transient errors.

D. Vertex AI Pipeline steps can include BigQuery ML training via the BigQueryQueryJob operator.

E. The trained BigQuery ML model can be registered in Vertex AI Model Registry and deployed to an endpoint.

Answers: D, E

BigQuery ML training can be invoked as a SQL query step.

Why this answer

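Option D is correct because Vertex AI Pipelines can integrate with BigQuery ML by using the BigQueryQueryJob operator to execute SQL-based training queries, such as `CREATE MODEL`, as a pipeline step. This lets you orchestrate BigQuery ML model training within a Vertex AI Pipeline, enabling a low-code ML workflow. Option E is correct because a trained BigQuery ML model can be registered in Vertex AI Model Registry and then deployed to a Vertex AI endpoint for online prediction.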

Exam trap

Google Cloud often tests the misconception that BigQuery ML models are automatically registered in Vertex AI Model Registry after training. In reality, registration is opt-in: you must request it when creating the model (for example, with the `model_registry` option of `CREATE MODEL`) or register or export the model as a separate step.

Full explanation →

597

MCQ · medium

You are an ML engineer at a logistics company. You have deployed a deep learning model on Vertex AI Endpoints using a custom container with GPU acceleration. The model predicts delivery times based on route features. After one week, you notice that the endpoint's GPU utilization is consistently at 10%, but the prediction latency has increased by 50%. The number of prediction requests per second has remained stable. You check the container logs and see no errors. The model is served using TensorFlow Serving with batching enabled (batch size: 32, batch timeout: 100ms). The custom container uses a single NVIDIA T4 GPU. You have also set the Vertex AI endpoint to use autoscaling with minReplicaCount: 1 and maxReplicaCount: 5, and the CPU utilization target is 60%. Which action should you take to reduce latency?

A. Increase the minReplicaCount to 3 to handle requests in parallel.

B. Reduce the CPU utilization target to 40% to trigger more aggressive autoscaling.

C. Quantize the model to FP16 to reduce compute time per inference.

D. Increase the batch size to 64 and batch timeout to 200ms to improve GPU utilization.

Answer: D

Larger batch sizes allow the GPU to process more data per inference, increasing throughput and reducing per-request latency once the batch fills up.

Why this answer

The core issue is low GPU utilization (10%) despite increased latency, indicating that the GPU is underutilized and the bottleneck is likely in batching or data pipeline overhead. Increasing the batch size to 64 and batch timeout to 200ms allows TensorFlow Serving to accumulate more requests per batch, improving GPU throughput and reducing per-request latency by better leveraging GPU parallelism. This directly addresses the mismatch between low GPU utilization and high latency.

Exam trap

The trap here is that candidates focus on scaling or model optimization (A, B, C) without recognizing that low GPU utilization with high latency is a classic sign of inefficient batching, not insufficient compute or replicas.

How to eliminate wrong answers

Option A is wrong because raising minReplicaCount adds more replicas, each with its own underutilized GPU, which increases cost without fixing the inefficient batching that is driving latency. Option B is wrong because lowering the CPU utilization target makes autoscaling more aggressive on CPU, but the symptom is low GPU utilization, not CPU pressure; this would add replicas without improving GPU efficiency. Option C is wrong because quantizing to FP16 shortens compute time per inference, but the workload is not compute-bound; quantization is unlikely to help while the GPU sits mostly idle, and it could introduce accuracy loss.

Full explanation →

598

MCQ · easy

A company has deployed a TensorFlow model on Vertex AI Prediction for real-time inference. They notice that during peak hours, the prediction latency increases significantly, and some requests time out. The model requires GPU acceleration. Which action should they take to reduce latency and avoid timeouts?

A. Enable autoscaling with min replicas set to the base load and max replicas set to handle peak load, and ensure GPU quota is sufficient.

B. Switch to a larger machine type with more vCPUs.

C. Increase the number of replicas in the Vertex AI Prediction endpoint statically to handle peak load.

D. Use Cloud Functions to invoke the model asynchronously.

Answer: A

Autoscaling adjusts replicas dynamically, and sufficient GPU quota prevents resource bottlenecks.

Why this answer

Option A is correct because enabling autoscaling with appropriate min and max replicas dynamically adjusts capacity to handle peak load, and ensuring sufficient GPU quota prevents resource constraints. Option B is wrong because adding vCPUs does not address GPU-bound inference. Option C is wrong because statically provisioning replicas for peak load wastes resources during low traffic and cannot adapt if traffic exceeds the fixed capacity.

Option D is wrong because Cloud Functions is not designed for GPU-accelerated inference and introduces additional latency.
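The role of the min and max bounds can be sketched with a simple capacity calculation. This is a simplified model with hypothetical numbers and a hypothetical function name; Vertex AI autoscaling actually reacts to utilization targets rather than to a formula like this one.

```python
import math


def replicas_needed(requests_per_second, per_replica_capacity, min_replicas, max_replicas):
    """Replicas required for the load, clamped to the configured min/max bounds."""
    if per_replica_capacity <= 0:
        raise ValueError("per_replica_capacity must be positive")
    wanted = math.ceil(requests_per_second / per_replica_capacity)
    return max(min_replicas, min(wanted, max_replicas))


# Hypothetical endpoint: each GPU replica handles about 50 requests per second.
off_peak = replicas_needed(20, 50, min_replicas=2, max_replicas=10)
peak = replicas_needed(400, 50, min_replicas=2, max_replicas=10)
overload = replicas_needed(900, 50, min_replicas=2, max_replicas=10)
print(off_peak, peak, overload)
```

The overload case shows why the max bound and the GPU quota both matter: once demand exceeds what the maximum replica count can serve, requests queue up and time out no matter how quickly the autoscaler reacts.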

Full explanation →

599

MCQ · medium

A data scientist has a TensorFlow 2.x model trained on a single GPU. They want to scale training to multiple GPUs on a single Vertex AI machine with minimal code changes. Which strategy should they use?

A. MultiWorkerMirroredStrategy

B. TPUStrategy

C. CentralStorageStrategy

D. MirroredStrategy

Answer: D

Designed for single-machine multi-GPU training; no code changes needed beyond wrapping in strategy scope.

Why this answer

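MirroredStrategy distributes training across the GPUs on a single machine with minimal code changes: create and compile the model inside the strategy scope. MultiWorkerMirroredStrategy is for multi-machine training, not single-machine multi-GPU. TPUStrategy targets TPUs, and CentralStorageStrategy keeps variables on the CPU instead of mirroring them on each GPU.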

Full explanation →

600

MCQ · easy

A startup wants to add sentiment analysis to their customer feedback app without any labeled data or custom model training. Which Google Cloud service should they use?

A. Cloud Natural Language API

B. AutoML Natural Language with manual labeling

C. Use BigQuery ML to train a text classification model

D. Train a custom sentiment model on Vertex AI

Answer: A

Pre-trained model available via API call.

Why this answer

The Cloud Natural Language API provides pre-trained models for sentiment analysis that require no labeled data or custom training. It offers a ready-to-use sentiment analysis feature via a simple API call, making it ideal for a startup that wants to add sentiment analysis without any machine learning expertise or data preparation.

Exam trap

Google Cloud often tests the distinction between pre-trained APIs and custom training services, where candidates mistakenly choose AutoML or Vertex AI because they think any ML task requires custom training, overlooking the existence of fully managed, pre-trained APIs like Cloud Natural Language API.

How to eliminate wrong answers

Option B is wrong because AutoML Natural Language with manual labeling requires labeled data and custom model training, which contradicts the requirement of no labeled data or custom training. Option C is wrong because BigQuery ML is designed for training models using SQL queries on structured data, not for pre-trained sentiment analysis, and it still requires labeled data and training. Option D is wrong because training a custom sentiment model on Vertex AI involves building, training, and deploying a custom model, which requires labeled data and significant ML effort, not a pre-built solution.

Full explanation →
