
Google Professional Machine Learning Engineer (PMLE) — Questions 1–75

506 questions total · 7 pages · All types, answers revealed


Page 1 of 7

1

MCQ · easy

Your team has deployed a scikit-learn model using a custom container on Vertex AI Prediction. The model receives about 100 requests per second, and the endpoint is configured with a single n1-standard-4 machine. You notice that response times are around 200 ms on average, but occasionally spike to over 10 seconds during traffic bursts. You have set the min replicas to 1 and max replicas to 10. Despite this, spikes still occur. What is the most likely cause and the best course of action?

A. The autoscaling is too slow to react; you should increase the max replicas to 20 and reduce the cooldown period.

B. The model is not optimized for parallel inference; you should enable batching in the custom container.

C. The machine type is insufficient for the model size; you should switch to an n1-highmem-8.

D. The container has a memory leak; you should restart the container periodically.

Answer: A

Reducing cooldown and increasing max replicas helps autoscaling respond faster to bursts.

Why this answer

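Option A is correct because the spikes occur only during traffic bursts while the average latency stays acceptable, which suggests that autoscaling is adding replicas too slowly; raising the replica ceiling and reducing the cooldown lets the endpoint add capacity sooner. Option B could help, because batching reduces the number of inference calls, but it does not make autoscaling react any faster. Option C is not necessarily needed because the average latency is acceptable.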

Option D is unlikely to cause intermittent spikes.
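The trade-off behind option A can be sketched with the proportional rule that the Kubernetes Horizontal Pod Autoscaler uses: desired replicas = ceil(current replicas × current utilization ÷ target utilization), clamped to the configured range. Treating this as a stand-in for Vertex AI Prediction's autoscaler is an assumption, and the utilization figures are hypothetical; the sketch only shows how `max_replicas` can cap the response to a burst.

```python
import math


def desired_replicas(current_replicas: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Proportional scaling rule (HPA-style; an assumption, not Vertex AI's documented algorithm)."""
    if current_replicas < 1 or target_util <= 0:
        raise ValueError("need at least one replica and a positive utilization target")
    # Scale the replica count by how far utilization is above (or below) the target.
    raw = math.ceil(current_replicas * current_util / target_util)
    # The configured bounds always win, so a low max_replicas caps burst capacity.
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical burst: 8 replicas running at 95% CPU against a 60% target.
print(desired_replicas(8, 0.95, 0.60, min_replicas=1, max_replicas=10))  # 10 (capped)
print(desired_replicas(8, 0.95, 0.60, min_replicas=1, max_replicas=20))  # 13
```

Raising `max_replicas` only helps when the rule asks for more replicas than the old cap; how quickly new replicas come online is a separate factor that this sketch does not model.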

2

Drag & Drop · medium

Drag and drop the steps to implement a CI/CD pipeline for ML models using Cloud Build and Vertex AI in the correct order.

Why this order

First configure the trigger, define the pipeline, then commit to trigger training and deployment.

3

Multi-Select · medium

A company has a TensorFlow model that requires a GPU for inference. They are deploying on Vertex AI. Which TWO configurations are necessary to ensure the GPU is used?

Select 2 answers

A. Set the environment variable TF_GPU_ALLOCATOR=cuda_malloc_async.

B. Use the pre-built TensorFlow serving container, which automatically uses GPU if available.

C. Build a custom container with GPU drivers.

D. Select a machine type that includes a GPU (e.g., NVIDIA Tesla T4).

E. Set the accelerator type and count in the model deployment configuration.

Answers: D, E

Necessary to have GPU hardware available.

Why this answer

Option D is correct because Vertex AI requires you to explicitly select a machine type that can provide a GPU (e.g., n1-standard-4 with an attached NVIDIA Tesla T4) to supply the physical hardware for GPU acceleration. Option E is correct because the accelerator type and count in the deployment configuration are what actually attach the GPU to the serving nodes. Without both, inference runs on CPU only, regardless of any other configuration.

Exam trap

Google Cloud often tests the misconception that simply using a pre-built container or setting environment variables is sufficient to enable GPU acceleration, when in fact you must both select a GPU-capable machine type and explicitly configure the accelerator in the deployment settings.
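A minimal pre-deployment check, assuming the deployment settings are held in a plain dict whose keys mirror the `machine_type`, `accelerator_type`, and `accelerator_count` arguments of the Vertex AI SDK's `Model.deploy()`. The helper itself is hypothetical; it only encodes the point above that a GPU is attached when both the machine type and the accelerator settings are supplied.

```python
def gpu_deployment_problems(config: dict) -> list:
    """Return reasons a deployment config would leave inference on CPU (empty list = OK)."""
    problems = []
    if not config.get("machine_type"):
        problems.append("machine_type is missing")
    if not config.get("accelerator_type"):
        problems.append("accelerator_type is not set, so no GPU is attached")
    # Treat a missing or None count as zero.
    if (config.get("accelerator_count") or 0) < 1:
        problems.append("accelerator_count must be at least 1")
    return problems


gpu_config = {
    "machine_type": "n1-standard-4",
    "accelerator_type": "NVIDIA_TESLA_T4",
    "accelerator_count": 1,
}
cpu_only_config = {"machine_type": "n1-standard-4"}

print(gpu_deployment_problems(gpu_config))       # []
print(gpu_deployment_problems(cpu_only_config))  # two problems reported
```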

4

MCQ · medium

You are responsible for monitoring a batch prediction pipeline that runs daily. Recently, the pipeline started failing intermittently with out-of-memory errors. The input data volume has not changed. What is the most likely cause?

A. A recent code change that loads the entire dataset into memory before processing

B. Increase in model size due to retraining

C. Decrease in the number of worker machines

D. Increase in input data size

Answer: A

This could cause OOM for large datasets.

Why this answer

Option A is correct because a code change that loads the entire dataset into memory before processing would directly cause out-of-memory (OOM) errors, even if the input data volume remains unchanged. In batch prediction pipelines, data is typically streamed or processed in chunks to manage memory efficiently. A change that bypasses this pattern and loads all data at once can exceed the available heap or container memory, leading to intermittent failures depending on data characteristics or concurrent loads.
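The difference between option A's anti-pattern and chunked processing can be sketched in plain Python. `predict_batch` is a hypothetical stand-in for the model call, and the batch size is an assumption to tune against the container's memory.

```python
import json
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def iter_batches(records: Iterable, batch_size: int) -> Iterator[List]:
    """Yield lists of at most `batch_size` records; only one batch is held at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(records)
    while batch := list(islice(it, batch_size)):
        yield batch


def predict_stream(lines: Iterable[str], predict_batch: Callable[[List[dict]], List],
                   batch_size: int = 1000) -> Iterator:
    """Parse JSONL lines lazily and run predictions batch by batch."""
    # The anti-pattern would be: records = [json.loads(line) for line in lines]
    parsed = (json.loads(line) for line in lines)
    for batch in iter_batches(parsed, batch_size):
        yield from predict_batch(batch)


sample_lines = ['{"x": 1}', '{"x": 2}', '{"x": 3}']
doubled = list(predict_stream(sample_lines, lambda batch: [r["x"] * 2 for r in batch], batch_size=2))
print(doubled)  # [2, 4, 6]
```

Because a file object is itself an iterable of lines, `predict_stream(open(path), ...)` reads a large JSONL file one batch at a time instead of materializing it.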

Exam trap

The trap here is that candidates may assume OOM errors are always caused by increased data volume or resource scaling issues, but the question explicitly states data volume is unchanged, forcing you to consider code-level changes that alter memory access patterns.

How to eliminate wrong answers

Option B is wrong because a larger retrained model would raise memory usage on every run, so it would tend to cause consistent OOM failures rather than intermittent ones. Option C is wrong for the same reason: fewer worker machines would reduce the available memory on every run, producing consistent rather than intermittent failures. Option D is wrong because the question explicitly states that input data volume has not changed, so an increase in data size cannot be the cause.

5

MCQ · medium

A company trains models using Vertex AI Training and wants to share the resulting model artifacts with a different team in another Google Cloud project. What is the most secure way to grant access?

A. Use BigQuery to copy the model artifacts and share the BigQuery dataset.

B. Share the Vertex AI model resource directly by adding the other project's members to the IAM policy on the model.

C. Set the Cloud Storage bucket containing the artifacts to 'public' access.

D. Create a new service account in the other project, then grant it the 'roles/storage.objectViewer' role on the bucket.

Answer: D

Least privilege, secure cross-project access.

Why this answer

Option D is correct because it follows the principle of least privilege and cross-project access best practices. By creating a dedicated service account in the target project and granting it the 'roles/storage.objectViewer' role on the specific Cloud Storage bucket, you avoid exposing the bucket publicly and grant read access only to the bucket that actually holds the model artifacts. Only that service account can read the artifacts, and the other team uses it to access the bucket securely.

Exam trap

The trap here is that candidates often confuse sharing the Vertex AI model resource (which controls access to the model metadata and endpoint) with sharing the underlying artifacts in Cloud Storage, leading them to choose option B, which does not grant the necessary read access to the actual model files.

How to eliminate wrong answers

Option A is wrong because BigQuery is a data warehouse service, not a mechanism for copying or sharing model artifacts; model artifacts are stored in Cloud Storage, and BigQuery cannot be used to copy or grant access to those files. Option B is wrong because sharing the Vertex AI model resource directly via IAM grants access to the model metadata and endpoints, but does not grant access to the underlying model artifacts stored in Cloud Storage; the other team would still need separate permissions on the bucket. Option C is wrong because setting the Cloud Storage bucket to 'public' access would allow anyone on the internet to read the artifacts, violating security best practices and potentially exposing proprietary or sensitive model data.
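The grant in option D amounts to one new binding in the bucket's IAM policy. The sketch below works on the standard IAM policy JSON shape (a `bindings` list of `role`/`members` entries) as a plain dict; the helper, the bucket owner, and the service-account email are hypothetical, and applying the updated policy to a real bucket is not shown.

```python
import copy


def with_member(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy with `member` granted `role` (no duplicates)."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Only reuse an unconditional binding for the same role.
        if binding.get("role") == role and "condition" not in binding:
            if member not in binding.setdefault("members", []):
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


bucket_policy = {
    "bindings": [
        {"role": "roles/storage.admin", "members": ["user:owner@example.com"]},
    ]
}
reader = "serviceAccount:model-reader@other-team-project.iam.gserviceaccount.com"
updated = with_member(bucket_policy, "roles/storage.objectViewer", reader)
print(updated["bindings"][-1])
```

The reader gets only `roles/storage.objectViewer` on this one bucket; nothing is granted on the Vertex AI model resource or on the rest of the project.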

6

MCQ · medium

Refer to the exhibit. A team runs this Vertex AI Pipeline definition but the deploy component never executes, even though the evaluate step outputs a metric of 0.9. What is the most likely cause?

A. The deploy component depends on the gate component, but the gate is not producing an output.

B. The deploy container image does not exist.

C. The evaluate component must be run before train, but the pipeline order is incorrect.

D. The condition should reference the evaluate component's output directly instead of using an input variable.

E. The pipeline should use a custom component for the condition instead of the built-in type.

Answer: D

The `condition` expression must directly use the output reference, not a local input.

Why this answer

Option D is correct because in Vertex AI Pipelines the `condition` block can only be as current as the value it references. A pipeline input parameter is fixed when the run is submitted, so if the condition references an input variable rather than the actual output of the `evaluate` component, it compares against the default or placeholder value and the deploy step is skipped even though the runtime metric is 0.9. The condition must directly reference the `evaluate` component's output (e.g., `evaluate.outputs['metric']`) so that it is evaluated against the value produced at runtime.

Exam trap

Google Cloud often tests the distinction between values fixed when a pipeline run is submitted and values produced by components at runtime, trapping candidates who assume that pipeline input parameters are resolved at the same point as component outputs.

How to eliminate wrong answers

Option A is wrong because the gate component is not required for the deploy step; the condition is evaluated based on the evaluate component's output, not a gate output. Option B is wrong because if the deploy container image did not exist, the pipeline would fail with an image pull error, not silently skip execution. Option C is wrong because the pipeline order (train → evaluate → deploy) is correct; the evaluate step must run after train, and the condition is on evaluate's output, not on train.

Option E is wrong because the built-in condition construct in Vertex AI Pipelines is fully capable of evaluating boolean expressions; a custom component is not needed and would not fix the issue of referencing an input variable instead of a component output.

7

MCQ · easy

A data science team has trained a TensorFlow model and wants to serve it online with minimal latency. Which Vertex AI deployment option should they use to ensure the model can handle traffic spikes without manual scaling?

A. Use Vertex AI Model Garden.

B. Deploy the model to a Vertex AI Endpoint with automatic scaling.

C. Use Vertex AI Batch Prediction for offline inference.

D. Deploy the model to a Compute Engine VM with a load balancer.

Answer: B

Autoscaling handles traffic spikes with low latency.

Why this answer

Vertex AI Endpoints with automatic scaling (option B) are designed for online serving with minimal latency and can automatically adjust the number of replicas based on traffic load, handling spikes without manual intervention. This is the correct choice for a TensorFlow model requiring real-time inference and elastic scaling.

Exam trap

Google Cloud often tests the misconception that any cloud deployment with a load balancer (like Compute Engine) provides automatic scaling, but the trap here is that Vertex AI Endpoints offer managed autoscaling natively, whereas Compute Engine VMs require additional infrastructure setup and do not automatically scale without configuring managed instance groups.

How to eliminate wrong answers

Option A is wrong because Vertex AI Model Garden is a repository of pre-built models and foundation models, not a deployment option for serving custom trained models with automatic scaling. Option C is wrong because Vertex AI Batch Prediction is for offline, asynchronous inference on large datasets, not for real-time online serving with low latency. Option D is wrong because deploying to a Compute Engine VM with a load balancer requires manual scaling configuration (e.g., managed instance groups) and lacks the integrated autoscaling, monitoring, and model versioning capabilities of Vertex AI Endpoints.

8

Multi-Select · medium

Which TWO actions can help reduce prediction latency for a model deployed on Vertex AI Endpoint without changing the model architecture?

Select 2 answers

A. Increase the batch size of prediction requests.

B. Attach a GPU accelerator to the endpoint's machine type.

C. Quantize the model from FP32 to INT8.

D. Deploy the model in multiple regions and use global load balancing.

E. Use a smaller machine type to reduce complexity.

Answers: B, C

GPU reduces computation time for neural networks.

Why this answer

Options B and C are correct. Option B (GPU accelerator) can significantly speed up inference for deep learning models. Option C (quantization from FP32 to INT8) reduces model size and inference time.

Option A (increasing batch size) increases latency per request. Option D (multi-region deployment) reduces network latency but not prediction latency. Option E (smaller machine type) may increase latency.
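Option C's effect on model size can be seen in a minimal symmetric, per-tensor INT8 scheme: each FP32 weight is mapped to an integer in [-127, 127] using a single scale factor. This is a generic illustration, not the calibration procedure of any particular toolchain such as TensorFlow Lite, and the weights are made up.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: q = round(w / scale), scale = max|w| / 127."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp to the symmetric INT8 range after rounding.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 values back to approximate FP32 weights."""
    return [v * scale for v in q]


weights = [0.6, -1.0, 0.2, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

fp32_bytes = array("f", weights).itemsize * len(weights)  # 4 bytes per value
int8_bytes = array("b", q).itemsize * len(q)              # 1 byte per value
print(q, fp32_bytes, int8_bytes)  # [76, -127, 25, 0] 16 4
```

Each weight shrinks from 4 bytes to 1, at the cost of a rounding error of at most half a scale step; whether INT8 also runs faster depends on the serving hardware and runtime.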

9

MCQ · hard

A company uses Vertex AI Pipelines with prebuilt components for data processing, training, and deployment. They need to integrate a custom validation step written in Python. What is the correct way to include this as a component?

A. Package the code in a Docker container and reference it as a custom job

B. Define the step in the YAML pipeline definition using arbitrary Python commands

C. Create a custom component using the Vertex AI Pipelines SDK @component decorator

D. Use a Cloud Function as a pipeline step

E. Write a standalone Python script and call it using a Cloud Shell step

Answer: C

Standard method for custom components.

Why this answer

Option C is correct because the Kubeflow Pipelines (KFP) SDK, which Vertex AI Pipelines uses, provides a `@component` decorator that turns a Python function into a pipeline component. The decorator generates the component specification, runs the function's code inside a base container image, and wires its inputs and outputs into the pipeline orchestration engine. It is the idiomatic and recommended way to add custom validation logic without manually managing Docker or infrastructure.

Exam trap

The trap here is that candidates often confuse the `@component` decorator with a simple function wrapper and assume they can just write inline Python code in the pipeline YAML (Option B), not realizing that Vertex AI Pipelines requires each step to be a containerized component with explicit input/output definitions.

How to eliminate wrong answers

Option A is wrong because packaging code in a Docker container and referencing it as a custom job would create an independent job outside the pipeline DAG, losing the ability to pass inputs/outputs between pipeline steps and breaking the orchestration flow. Option B is wrong because Vertex AI Pipelines YAML definitions do not support arbitrary Python commands; they require prebuilt or custom component definitions with proper container specifications. Option D is wrong because Cloud Functions are event-driven serverless functions not designed for pipeline step integration; they lack native support for pipeline I/O, artifact tracking, and retry logic within Vertex AI Pipelines.

Option E is wrong because Cloud Shell is an interactive environment for ad-hoc commands, not a pipeline execution step; it cannot be used as a component in a Vertex AI Pipeline and would not support parameter passing or artifact management.

10

MCQ · hard

A team uses Cloud Composer to orchestrate a complex ML pipeline with many tasks. They notice that the DAG parsing time is very high, causing delays in task scheduling. Which action would most effectively reduce DAG parsing time?

A. Remove all DAG files that are not currently needed from the bucket

B. Increase the parallelism of the Airflow scheduler

C. Optimize DAG files to avoid heavy top-level imports and database queries

D. Combine all DAGs into a single file

Answer: C

Top-level imports/queries are executed on every parse, so reducing them speeds up parsing.

Why this answer

Option C is correct because heavy top-level imports and database queries in DAG files are executed every time the scheduler parses the DAG, which happens frequently (default every 30 seconds). By moving imports inside Python callables or using lazy loading, the parsing time is drastically reduced, allowing the scheduler to process DAGs faster and trigger tasks without delay.
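A stdlib simulation (not Airflow code) of why option C helps: it executes two hypothetical DAG file sources repeatedly, the way the scheduler re-runs each file's module-level code on every parse, and counts how often an expensive call fires. `expensive_lookup` is a hypothetical stand-in for a heavy import or a metadata-database query.

```python
SLOW_DAG_SOURCE = """
settings = expensive_lookup()  # top-level: runs on every parse

def train_callable():
    return settings
"""

FAST_DAG_SOURCE = """
def train_callable():
    settings = expensive_lookup()  # deferred: runs only when the task executes
    return settings
"""


def count_lookups_during_parsing(dag_source: str, parses: int) -> int:
    """Simulate re-executing a DAG file's module-level code `parses` times."""
    calls = 0

    def expensive_lookup():
        nonlocal calls
        calls += 1
        return {"table": "features"}

    for _ in range(parses):
        namespace = {"expensive_lookup": expensive_lookup}
        exec(dag_source, namespace)  # "parsing" here means running top-level code
    return calls


print(count_lookups_during_parsing(SLOW_DAG_SOURCE, 30))  # 30
print(count_lookups_during_parsing(FAST_DAG_SOURCE, 30))  # 0
```

The fast version still does the same work when the task actually runs; it just stops paying for it on every parse.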

Exam trap

Google Cloud often tests the misconception that reducing the number of DAG files or increasing scheduler resources will fix parsing delays, when the real bottleneck is the top-level code execution inside each DAG file.

How to eliminate wrong answers

Option A is wrong because removing unused DAG files reduces clutter but does not address the root cause of high parsing time; the scheduler still parses every remaining DAG file, and if they contain heavy top-level code, parsing remains slow. Option B is wrong because scheduler parallelism settings (e.g., `[core] parallelism`) govern how many task instances run concurrently, not how expensive each DAG file is to parse; even raising `[scheduler] parsing_processes` only spreads the same heavy top-level work across more processes. Option D is wrong because combining all DAGs into a single file actually increases parsing time, as the scheduler must parse one very large file with all dependencies loaded at once, and it also breaks Airflow's ability to detect changes per DAG.

11

MCQ · hard

An organization uses Cloud Dataflow to preprocess training data. Dataflow jobs are often failing because of insufficient quota for certain resources. The team has requested a quota increase, but the jobs still fail with 'quota exceeded' errors for a different resource. They want to proactively monitor and manage quotas to avoid failures. What is the best approach?

A. Set up Cloud Monitoring alerts for quota usage and automate quota increase requests.

B. Configure Dataflow to use a different pipeline type that avoids the quota.

C. Use Dataflow's autoscaling feature to reduce resource usage.

D. Increase the maximum number of workers in the Dataflow job.

Answer: A

Proactive monitoring and automation allow scaling quotas as needed.

Why this answer

Option A is correct because setting up Cloud Monitoring alerts for quota usage and automating quota increase requests catches quota pressure on any resource before jobs fail. Option B is not feasible: switching to a different pipeline type does not exempt a job from project quotas. Option C might reduce resource consumption but does not address the root cause of quota limits.

Option D could worsen the problem by requiring more resources.
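The check behind a quota alert boils down to usage ÷ limit compared with a threshold, applied to every quota rather than only the one that failed last time. The sketch applies it to a snapshot of usage and limits; the metric names are illustrative, the numbers are made up, and in practice a Cloud Monitoring alerting policy on quota metrics would run this check continuously.

```python
def quotas_near_limit(usage: dict, limits: dict, threshold: float = 0.8) -> list:
    """Return (metric, utilization) pairs at or above `threshold`, highest first."""
    flagged = []
    for metric, limit in limits.items():
        if limit <= 0:
            continue  # skip unset or unlimited quotas
        utilization = usage.get(metric, 0) / limit
        if utilization >= threshold:
            flagged.append((metric, round(utilization, 3)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


limits = {"CPUS": 500, "IN_USE_ADDRESSES": 100, "DISKS_TOTAL_GB": 40960}
usage = {"CPUS": 480, "IN_USE_ADDRESSES": 85, "DISKS_TOTAL_GB": 10000}
print(quotas_near_limit(usage, limits))  # [('CPUS', 0.96), ('IN_USE_ADDRESSES', 0.85)]
```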

12

MCQ · easy

A company stores training data in Cloud Storage and uses Vertex AI Training for model training. They want to implement a data validation pipeline to detect data drift before retraining. Which service should they use?

A. Vertex AI Model Monitoring

B. BigQuery ML

C. Cloud Data Loss Prevention

D. Dataflow

Answer: A

Vertex AI Model Monitoring can detect data drift by comparing distributions.

Why this answer

Vertex AI Model Monitoring is designed specifically to detect feature skew and data drift in production ML models: skew detection compares serving data against a baseline training dataset, and drift detection compares serving data across time windows. It provides automated alerts when statistical distributions shift beyond a defined threshold, making it the correct choice for a data validation pipeline before retraining.

Exam trap

Google Cloud often tests the distinction between a general-purpose data processing tool (Dataflow) and a specialized managed service (Vertex AI Model Monitoring), leading candidates to choose Dataflow because they think they need to build a custom pipeline, while the question asks for the service that should be used, implying the most appropriate managed solution.

How to eliminate wrong answers

Option B is wrong because BigQuery ML is used for creating and executing ML models directly in BigQuery using SQL, not for monitoring data drift in existing models. Option C is wrong because Cloud Data Loss Prevention (DLP) is focused on inspecting and classifying sensitive data (e.g., PII) for security and compliance, not for statistical drift detection. Option D is wrong because Dataflow is a stream and batch data processing service (based on Apache Beam) that could be used to build a custom drift detection pipeline, but it is not a managed service purpose-built for model monitoring like Vertex AI Model Monitoring.
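For a categorical feature, the baseline comparison can be illustrated with a simple distance such as L-infinity: the largest absolute difference between category frequencies in the baseline and in a recent sample. The feature values are hypothetical, any alert threshold you apply to the result is an assumption, and this is not Model Monitoring's implementation.

```python
from collections import Counter


def frequencies(values) -> dict:
    """Relative frequency of each category in a non-empty sample."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample must be non-empty")
    return {category: n / total for category, n in counts.items()}


def l_infinity_distance(baseline, recent) -> float:
    """Largest absolute difference in category frequency between two samples."""
    p, q = frequencies(baseline), frequencies(recent)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in set(p) | set(q))


baseline = ["card"] * 80 + ["wire"] * 20
recent = ["card"] * 50 + ["wire"] * 30 + ["crypto"] * 20
print(l_infinity_distance(baseline, recent))  # about 0.3
```

A category that never appeared in the baseline (here `crypto`) still counts, because a missing category is treated as frequency 0.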

13

Multi-Select · hard

A financial services company has deployed a classification model on Vertex AI to detect fraudulent transactions. The model is monitored using Vertex AI Model Monitoring for skew and drift detection, and also logs predictions to BigQuery for analysis. After a month, the monitoring alerts show a significant drift in one feature (transaction_amount). Which TWO actions should the team take to diagnose and address this issue?

Select 2 answers

A. Compare the feature distribution in the training data with the recent serving data using statistical tests.

B. Retrain the model on the most recent data to incorporate the new distribution.

C. Increase the frequency of model monitoring checks to every hour.

D. Increase the sampling rate for prediction logging to ensure full data capture.

E. Reduce the alert threshold to minimize false positives.

Answers: A, B

This diagnostic step helps understand the nature and extent of the drift.

Why this answer

Option A is correct because comparing the feature distribution of the training data with recent serving data using statistical tests or distance measures (e.g., Kolmogorov-Smirnov or Jensen-Shannon divergence) is the standard first step to quantify the drift and confirm that it is significant. This diagnostic action helps the team understand the nature and magnitude of the drift before deciding on remediation. Vertex AI Model Monitoring already performs such comparisons, but the team can independently verify the results against the prediction logs in BigQuery. Option B is the matching remediation: once the drift in transaction_amount is confirmed as a genuine shift, retraining on recent data lets the model learn the new distribution.
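The comparison in option A can be sketched for a numerical feature such as transaction_amount with the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical cumulative distribution functions. The sample values are hypothetical, and turning the statistic into a p-value (for example with `scipy.stats.ks_2samp`) is not shown.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b) -> float:
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over the observed values."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        # Empirical CDFs: the fraction of each sample that is <= x.
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


training_amounts = [12.0, 25.5, 40.0, 55.0, 80.0]
serving_amounts = [30.0, 60.0, 95.0, 120.0, 150.0]
print(ks_statistic(training_amounts, serving_amounts))
```

Checking only the observed values is enough, because both empirical CDFs are step functions that change only at those points.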

Exam trap

The trap here is that candidates often confuse 'detecting drift' with 'fixing drift' and pick only remediation steps, such as retraining (Option B), without the diagnostic comparison in Option A; the PMLE exam emphasizes systematic troubleshooting, so diagnosis and remediation belong together.

14

MCQ · hard

A company is using AutoML Tables to build a fraud detection model. The dataset has 10 million rows with 100 features, heavily imbalanced (fraud cases 0.1%). They used AutoML Tables with default settings and achieved high precision but very low recall. They need to deploy the model for real-time scoring on a Vertex AI Endpoint. The model will be used by a transaction processing system that requires low latency (<100 ms per prediction) and high throughput. The team is concerned about cost as the endpoint will receive up to 5,000 predictions per second. After deploying the model, they notice that the endpoint's latency occasionally spikes to over 1 second during peak hours. The team wants to optimize both model performance (recall) and serving performance. Which course of action should they take?

A. Retrain the model with adjusted class weights in AutoML Tables to increase recall, then deploy using Vertex AI Prediction with autoscaling enabled.

B. Use BigQuery ML to create a logistic regression model with class weights, then deploy it on Cloud Run with maximum concurrency.

C. Export the AutoML Tables model as a TensorFlow SavedModel and deploy it on Vertex AI Prediction with a larger machine type and increased min replicas.

D. Use Vertex AI Workbench to manually tune a deep neural network with class imbalance techniques, then deploy as a custom container on App Engine.

Answer: A

AutoML Tables supports class weights to handle imbalance, improving recall. Vertex AI Prediction with autoscaling dynamically adjusts resources to maintain latency during spikes and control costs.

Why this answer

Option A is correct because AutoML Tables allows adjusting class weights to handle imbalanced datasets, which directly addresses the low recall issue by penalizing misclassifications of the minority class more heavily. Deploying on Vertex AI Prediction with autoscaling ensures the endpoint can handle up to 5,000 predictions per second while maintaining low latency, as autoscaling dynamically adjusts resources based on traffic, preventing spikes during peak hours.
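Class weights of the kind option A relies on are often derived with the "balanced" heuristic, weight_c = n_samples / (n_classes × count_c), which is the formula scikit-learn documents for `class_weight="balanced"`. At a 0.1% fraud rate it gives each fraud example roughly 1,000 times the weight of a legitimate one. The labels below are hypothetical, and how the weights are supplied to AutoML Tables is not shown.

```python
from collections import Counter


def balanced_class_weights(labels) -> dict:
    """weight_c = n_samples / (n_classes * count_c) for each class c."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical sample with a 0.1% positive (fraud) rate.
labels = [0] * 999 + [1]
weights = balanced_class_weights(labels)
print(weights)  # {0: ~0.5005, 1: 500.0}
```

With these weights each class contributes the same total weight (999 × ~0.5005 ≈ 1 × 500), so a missed fraud case costs far more in the training loss than it does unweighted.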

Exam trap

Google Cloud often tests the misconception that exporting a managed model to a custom format (like TensorFlow SavedModel) and deploying on a larger machine type is the best way to optimize serving performance, when in fact autoscaling and class weight adjustments within the managed service are the correct low-code approach.

How to eliminate wrong answers

Option B is wrong because BigQuery ML's logistic regression is a simpler model that may not capture complex patterns in 100 features, and Cloud Run's maximum concurrency can lead to increased latency under high throughput (5,000 QPS) without dedicated GPU/TPU support for real-time scoring. Option C is wrong because exporting an AutoML Tables model as a TensorFlow SavedModel loses the optimized serving infrastructure of AutoML, and simply using a larger machine type with increased min replicas does not guarantee sub-100ms latency during traffic spikes without autoscaling. Option D is wrong because using Vertex AI Workbench to manually tune a deep neural network is not a low-code solution, and deploying on App Engine introduces cold start issues and lacks the low-latency, high-throughput capabilities of Vertex AI Prediction for real-time scoring.

15

MCQ · medium

Refer to the exhibit. This IAM policy is applied at the project level. What is the effect of the condition?

A. The service account can only access AI Platform resources that start with 'projects/ml-'

B. The service account can only be used in projects whose ID starts with 'ml-'

C. The role is granted only if the project's name contains 'ml-'

D. The condition is ignored because conditions are not supported for service accounts

Answer: A

Condition on resource name limits access to resources with that prefix.

Why this answer

Option A is correct because the condition uses the expression `resource.name.startsWith("projects/ml-")` to restrict access to AI Platform resources whose names begin with `projects/ml-`. This means the service account can only interact with AI Platform resources (such as models, jobs, or endpoints) whose resource name starts with that prefix, effectively scoping the permission to a specific set of projects or resources.

Exam trap

Google Cloud often tests the distinction between resource-level conditions (like `resource.name`) and identity-level conditions (like `principal` or `request.auth`), and candidates mistakenly apply the condition to the service account's project ID instead of the target resource's name.

How to eliminate wrong answers

Option B is wrong because the condition checks the resource name (the AI Platform resource path), not the project ID of the service account itself; the service account can be from any project, but the resources it can access must have names starting with `projects/ml-`. Option C is wrong because the condition uses `resource.name.startsWith`, which operates on the resource name, not the project's display name or label; the project name is irrelevant. Option D is wrong because IAM conditions are fully supported for service accounts; the condition is evaluated at access time and can restrict permissions based on resource attributes.
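Because the exhibit is not reproduced here, the binding below is a hypothetical reconstruction: the role, member, and title are assumptions, and only the `resource.name.startsWith("projects/ml-")` expression comes from the explanation. The evaluator is a toy that understands just this one CEL function form; real IAM evaluates the full Common Expression Language server-side.

```python
import re

# Hypothetical reconstruction of the exhibit's binding; only the expression is from the text.
binding = {
    "role": "roles/aiplatform.user",
    "members": ["serviceAccount:trainer@example-project.iam.gserviceaccount.com"],
    "condition": {
        "title": "ml-prefixed-resources-only",
        "expression": 'resource.name.startsWith("projects/ml-")',
    },
}

_STARTS_WITH = re.compile(r'resource\.name\.startsWith\("([^"]*)"\)')


def condition_allows(binding: dict, resource_name: str) -> bool:
    """Toy evaluator that handles only resource.name.startsWith("...") expressions."""
    match = _STARTS_WITH.fullmatch(binding["condition"]["expression"].strip())
    if match is None:
        raise ValueError("unsupported expression for this toy evaluator")
    return resource_name.startswith(match.group(1))


print(condition_allows(binding, "projects/ml-prod/locations/us-central1/models/123"))   # True
print(condition_allows(binding, "projects/analytics/locations/us-central1/models/123"))  # False
```

The test is applied to the name of the resource being accessed, not to the service account or the project it lives in, which is why options B and C do not describe the effect.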

16

Multi-Select · medium

Which THREE factors should be considered when choosing between using Vertex AI Endpoints and Cloud Run for model serving? (Choose three.)

Select 3 answers

A. Built-in model monitoring

B. Complexity of model containerization

C. Cost per request

D. GPU support

E. Automatic scaling to zero

Answers: A, D, E

Vertex AI Endpoints integrates with Model Monitoring; Cloud Run requires custom implementation.

Why this answer

Options A, D, and E are key differentiators. Vertex AI Endpoints supports GPUs natively, while Cloud Run has more limited GPU support. Cloud Run inherently scales to zero, whereas Vertex AI Endpoints do not always scale to zero easily.

Vertex AI Endpoints has built-in model monitoring; Cloud Run does not. Options B and C are less differentiating: both services have similar container requirements and cost structures.

17

Multi-Select · hard

Which TWO strategies can help reduce the cost of running ML pipelines on Vertex AI?

Select 2 answers

A. Run hyperparameter tuning jobs with a large search space

B. Use Vertex AI managed datasets to reduce storage costs

C. Manually scale up resources during peak times and scale down during off-peak

D. Use preemptible VMs for training steps where possible

E. Use a larger machine type for training to complete faster

Answers: B, D

Managed datasets avoid duplication and reduce storage costs.

Why this answer

Options B and D are correct. Option B is correct because Vertex AI managed datasets avoid duplicate copies of data. Option D is correct because preemptible VMs are cheaper than standard VMs.

Option A is wrong because hyperparameter tuning with a large search space increases cost through many trials. Option C is wrong because manual scaling is error-prone and tends to leave resources idle or undersized between adjustments. Option E is wrong because larger machine types increase cost.

18

MCQ · medium

The exhibit shows a Cloud Build configuration. An ML engineer wants to automate the deployment of a model to Vertex AI after training. What is missing in this config to successfully deploy the model?

A. A step to upload the training image to Artifact Registry

B. A step to build the serving container image

C. A step to run unit tests

D. A step to create the Vertex AI Endpoint

Answer: B

The config only builds the training image; it needs a separate step to build and push the serving image.

Why this answer

The Cloud Build configuration shown is for training a model, but to deploy it to Vertex AI, a serving container image must be built and pushed to Artifact Registry. Vertex AI requires a custom serving container (or a prebuilt one) to host the model for predictions. Without a step to build the serving container image (e.g., using a Dockerfile that includes the model and serving dependencies), the deployment will fail because there is no runnable image to deploy to the endpoint.
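A minimal sketch of the missing build-and-push step, written as Python that emits a Cloud Build config in JSON (Cloud Build accepts JSON as well as YAML config files). The Artifact Registry path, repository name, and `serving/` directory are hypothetical; `$PROJECT_ID` and `$SHORT_SHA` are Cloud Build substitutions that are left for Cloud Build to fill in.

```python
import json


def serving_image_steps(image_uri: str, context_dir: str) -> list:
    """Cloud Build steps that build and push a serving container image."""
    docker = "gcr.io/cloud-builders/docker"
    return [
        {"name": docker,
         "args": ["build", "-t", image_uri, "-f", f"{context_dir}/Dockerfile", context_dir]},
        # Push explicitly so later steps (e.g. model upload and deploy) can pull the image.
        {"name": docker, "args": ["push", image_uri]},
    ]


SERVING_IMAGE = "us-central1-docker.pkg.dev/$PROJECT_ID/ml-models/serving:$SHORT_SHA"
build_config = {"steps": serving_image_steps(SERVING_IMAGE, "serving")}
print(json.dumps(build_config, indent=2))
```

The model upload and endpoint deployment steps that would follow, referencing the same `SERVING_IMAGE`, are not shown.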

Exam trap

Google Cloud often tests the distinction between training and serving containers, leading candidates to mistakenly think that the training image (or any image) is sufficient for deployment, when in fact a separate serving container is required.

How to eliminate wrong answers

Option A is wrong because uploading the training image to Artifact Registry is already implied or handled by the training step; the missing piece is the serving container image, not the training image. Option C is wrong because running unit tests, while good practice, is not a prerequisite for deploying a model to Vertex AI; the deployment process specifically requires a serving container image. Option D is wrong because creating the Vertex AI Endpoint can be done as part of the deployment step (e.g., via `gcloud ai endpoints create` or the Vertex AI SDK) and is not the missing piece; the fundamental gap is the absence of a serving container image build step.

19

MCQ · easy

A team needs to quickly create a visual interface for data exploration and model building without writing code. They want to run AutoML jobs and visualize results. Which Google Cloud tool should they use?

A. Vertex AI Workbench

B. Cloud Datalab

C. Cloud Composer

D. Google Colab

Answer: A

Provides a managed notebook environment with visual data exploration and one-click AutoML integration.

Why this answer

Vertex AI Workbench provides a managed JupyterLab environment with a low-code interface for data exploration, AutoML model training, and result visualization without writing code. It integrates directly with Vertex AI's AutoML and custom training services, allowing users to run AutoML jobs and view evaluation metrics, feature importance, and predictions through its UI.

Exam trap

Google Cloud often tests the distinction between code-based notebook tools (Colab, Datalab) and managed low-code platforms (Vertex AI Workbench), expecting candidates to recognize that AutoML job execution and visual result exploration require the latter's integrated UI and API access.

How to eliminate wrong answers

Option B (Cloud Datalab) is wrong because it is a deprecated tool that required code-based notebooks and does not support AutoML job execution or low-code visual interfaces. Option C (Cloud Composer) is wrong because it is a workflow orchestration service based on Apache Airflow, designed for scheduling and monitoring pipelines, not for interactive data exploration or AutoML. Option D (Google Colab) is wrong because it is a free, code-centric notebook environment that lacks native integration with Vertex AI AutoML and does not provide a low-code visual interface for model building.

20

MCQ · hard

A data scientist runs a batch prediction job on Vertex AI using a custom container. The job processes a large JSONL file (10 GB) and fails with an out-of-memory error. The machine type is n1-standard-4 (15 GB memory). Which action should be taken to resolve the error while minimizing cost?

A. Reduce the batch size in the prediction request.

B. Split the input data into smaller files and run multiple batch jobs.

C. Add a GPU accelerator to offload computation.

D. Use a machine type with more memory, such as n1-highmem-8 (52 GB).

Answer: D

Increasing memory directly solves out-of-memory errors.

Why this answer

Option D is correct because an out-of-memory error indicates that the machine's memory is insufficient for the model or data being processed; switching to a high-memory machine type directly adds memory. Option B is wrong because splitting the input data does not reduce per-instance memory pressure if the model itself is large. Option A is wrong because the batch size may need adjustment, but the primary issue is insufficient memory.

Option C is wrong because adding a GPU does not increase the VM's system memory.
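The sizing decision can be sketched as picking the smallest machine whose memory covers the job's peak usage, assuming you have an estimate of that peak (for example, from the failed run's memory metrics). The memory figures are the standard N1 sizes; the `pick_machine` helper is hypothetical, and ranking by vCPU count is only a rough cost proxy, not actual pricing.

```python
# (machine type, vCPUs, memory in GB) for a few standard N1 shapes.
N1_MACHINES = [
    ("n1-standard-4", 4, 15),
    ("n1-highmem-4", 4, 26),
    ("n1-standard-8", 8, 30),
    ("n1-highmem-8", 8, 52),
    ("n1-standard-16", 16, 60),
    ("n1-highmem-16", 16, 104),
]


def pick_machine(required_gb: float) -> str:
    """Return the smallest listed machine whose memory covers required_gb.

    "Smallest" means fewest vCPUs, then least memory: an assumed cost proxy.
    """
    candidates = [m for m in N1_MACHINES if m[2] >= required_gb]
    if not candidates:
        raise ValueError(f"no listed machine has at least {required_gb} GB")
    name, _, _ = min(candidates, key=lambda m: (m[1], m[2]))
    return name


if __name__ == "__main__":
    for peak in (10, 20, 40):
        print(peak, "GB ->", pick_machine(peak))
```

If the estimated peak were around 40 GB, this picks `n1-highmem-8`; a smaller peak could fit on a cheaper high-memory shape, so it is worth measuring before resizing.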

21

MCQ · medium

Your team is developing a machine learning model for real-time fraud detection. The training pipeline runs on Vertex AI and uses BigQuery for feature engineering. Recently, the pipeline has been taking significantly longer to execute. Upon investigation, you find that the BigQuery query for feature extraction is being rerun every time the pipeline runs, even though the underlying data hasn't changed. The pipeline is scheduled to run every hour. You want to reduce cost and execution time without losing the ability to detect data drifts. Which approach should you take?

A. Implement a caching mechanism in the pipeline that stores the results of the BigQuery query and reuses them if the data hasn't changed.

B. Move the feature extraction to a separate scheduled query in BigQuery and load the results into a table that the pipeline reads from.

C. Reduce the pipeline frequency to once a day to minimize the number of runs.

D. Use a conditional pipeline that checks if the data has changed before running the feature extraction step.

Answer: B

This separates concerns and avoids redundant execution, while still allowing data drift detection via the pipeline.

Why this answer

Option B is correct because it decouples the feature extraction from the training pipeline by using a separate scheduled BigQuery query that writes results to a table. This eliminates redundant query execution on every pipeline run, reducing cost and execution time, while the scheduled query can be set to run at a frequency that still detects data drifts (e.g., hourly). The pipeline then reads from the precomputed table, avoiding repeated full scans of the source data.

Exam trap

Google Cloud often tests the misconception that caching or conditional checks are sufficient to reduce cost, when in fact the most efficient solution is to offload the repetitive computation to a separate scheduled job that writes to a table, avoiding any pipeline-level overhead.

How to eliminate wrong answers

Option A is wrong because implementing a caching mechanism that checks if data hasn't changed still requires an initial query or metadata check each run, and caching in the pipeline itself does not leverage BigQuery's native scheduled query capabilities, potentially missing data drift detection if the cache is stale. Option C is wrong because reducing pipeline frequency to once a day would significantly delay fraud detection, violating the real-time requirement and increasing the risk of missing drifts between runs. Option D is wrong because a conditional pipeline that checks for data changes before running the feature extraction step still incurs the overhead of a check query every hour, and if the check is lightweight, it may not accurately detect all data drifts (e.g., schema changes or new partitions), while still adding complexity without the cost savings of a scheduled query.
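A minimal sketch of option B's scheduled query is shown below. The dict mirrors the fields of a BigQuery Data Transfer Service `TransferConfig` for scheduled queries (`data_source_id="scheduled_query"`, `schedule`, and `params` holding the query and destination table). The dataset, table, columns, and query are hypothetical, and the code only builds the config; it does not call the API.

```python
# Hypothetical feature query over a hypothetical raw transactions table.
FEATURE_QUERY = """
SELECT user_id, COUNT(*) AS txn_count_1h, SUM(amount) AS txn_sum_1h
FROM `my-project.raw.transactions`
WHERE event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY user_id
"""


def scheduled_feature_query(dataset: str, table: str, every_hours: int = 1) -> dict:
    """Build a scheduled-query config that materializes features into dataset.table."""
    return {
        "display_name": f"materialize-{table}",
        "data_source_id": "scheduled_query",
        "destination_dataset_id": dataset,
        "schedule": f"every {every_hours} hours",
        "params": {
            "query": FEATURE_QUERY,
            "destination_table_name_template": table,
            # Overwrite on each run so readers always see the latest features.
            "write_disposition": "WRITE_TRUNCATE",
        },
    }


if __name__ == "__main__":
    print(scheduled_feature_query("features", "fraud_features"))
```

The training pipeline then reads the destination table (here the hypothetical `features.fraud_features`) instead of running the feature query itself, so feature computation and training can be scheduled independently.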

22

MCQ · medium

A company needs to serve a model for low-frequency inference requests (a few hundred per month) from multiple regions. The priority is simplicity and minimal cost without maintaining infrastructure. Which serving option should they choose?

A. Deploy a real-time Vertex AI Endpoint with min replicas set to 1.

B. Set up a Dataflow streaming pipeline to process requests.

C. Use Vertex AI Batch Prediction triggered as needed.

D. Use Cloud Run with a serving container and scale to zero.

Answer: C

Batch prediction is fully managed, runs only when triggered, and is billed for the compute each job uses, which suits infrequent predictions.

Why this answer

Option C is correct because Vertex AI Batch Prediction runs on demand and is cost-effective for infrequent batches of predictions. Option A is wrong because a real-time endpoint with a minimum of one replica incurs per-hour cost even when idle. Option D is wrong because Cloud Run is better suited to online serving than to this offline use case.

Option B is wrong because a Dataflow streaming pipeline is more complex and designed for continuous streams, not occasional requests.

23

MCQ · hard

A company needs to serve a large Transformer model (5 GB) with strict latency requirements (< 50 ms) and throughput of 1000 requests per second. The model is in SavedModel format. They are considering deployment options on Google Cloud. Which approach best meets these requirements?

A. Deploy on Vertex AI Prediction using a single high-memory VM with a GPU (e.g., an a2-highgpu-1g with an A100).

B. Deploy on Cloud Run with a GPU-enabled instance and increase concurrency.

C. Deploy on Vertex AI Prediction using model parallelism across multiple GPUs on a single VM.

D. Deploy on Vertex AI Prediction using distributed serving with TensorFlow Serving and model sharding across multiple VMs.

Answer: A

A single A100 can hold a 5 GB model and serve it with low latency and high throughput.

Why this answer

Option A is correct because a single high-memory VM with a powerful GPU (e.g., an A100) can hold the 5 GB model and meet the throughput target with low latency, avoiding cross-machine network overhead. Option C is wrong because model parallelism adds complexity and is unlikely to be needed for a 5 GB model that fits on a single high-end GPU. Option D is wrong because distributed serving across multiple VMs introduces network latency between shards.

Option B is wrong because Cloud Run's GPU support is more limited than that of Vertex AI Prediction.
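The throughput requirement can be sanity-checked with Little's law: at 1000 requests per second and a 50 ms budget, about 50 requests are in flight at any moment. Whether one GPU replica can sustain that depends on the model and its batching configuration, so the per-replica concurrency used below is a hypothetical figure you would obtain from load testing.

```python
import math


def in_flight(requests_per_s: float, latency_s: float) -> float:
    """Little's law: average concurrent requests = arrival rate x time in system."""
    return requests_per_s * latency_s


def replicas_needed(requests_per_s: float, latency_s: float, per_replica_concurrency: int) -> int:
    """Replicas required if each can keep per_replica_concurrency requests in flight."""
    return math.ceil(in_flight(requests_per_s, latency_s) / per_replica_concurrency)


if __name__ == "__main__":
    print(in_flight(1000, 0.05))            # about 50 concurrent requests
    print(replicas_needed(1000, 0.05, 64))  # one replica if it sustains 64 in flight
```

If a single replica sustained only 32 concurrent requests within the budget, the same arithmetic would call for two replicas, which is where scaling out becomes necessary.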

24

MCQ · medium

A data science team is using AI Platform for training. They want to track hyperparameters and metrics across multiple experiments. What should they use?

A. Cloud Logging with custom metrics

B. Vertex AI Experiments

C. Store metrics in Cloud Storage and compare manually

D. Cloud Monitoring dashboards

Answer: B

Provides experiment tracking, comparison, and analysis.

Why this answer

Vertex AI Experiments is the correct choice because it is the native service within Vertex AI designed specifically for tracking, comparing, and analyzing hyperparameters and metrics across multiple training runs. It provides a centralized UI and SDK to log parameters, metrics, and artifacts, enabling systematic experiment management without manual effort or external tools.

Exam trap

Google Cloud often tests the distinction between logging/monitoring services (Cloud Logging, Cloud Monitoring) and ML-specific experiment tracking (Vertex AI Experiments), leading candidates to pick a generic monitoring tool instead of the purpose-built ML service.

How to eliminate wrong answers

Option A is wrong because Cloud Logging is intended for collecting and querying log data (e.g., application logs, error messages), not for structured tracking of hyperparameters and metrics across experiments; it lacks built-in experiment comparison features. Option C is wrong because storing metrics in Cloud Storage and comparing manually is inefficient, error-prone, and does not provide automated tracking, visualization, or versioning of experiments, which is the core requirement. Option D is wrong because Cloud Monitoring dashboards are designed for monitoring infrastructure and application performance metrics (e.g., CPU usage, latency), not for tracking ML experiment hyperparameters and metrics across multiple runs.

25

MCQ · easy

A startup wants to build a product recommendation engine without writing custom training code. They have user-item interaction data stored in BigQuery. Which Google Cloud service should they use?

A. Cloud Dataflow with ML APIs

B. BigQuery ML matrix factorization

C. Vertex AI AutoML Tables

D. Vertex AI Matching Engine

Answer: B

Train a recommendation model using SQL alone, without custom training code.

Why this answer

BigQuery ML matrix factorization is the correct choice because it allows building a recommendation engine directly in BigQuery using SQL, without writing custom training code. It supports implicit and explicit user-item interaction data and provides built-in evaluation metrics, making it ideal for low-code ML solutions on existing BigQuery data.

Exam trap

Google Cloud often tests the distinction between services that require custom code (Dataflow) versus those that offer SQL-based low-code ML (BigQuery ML), and the trap here is assuming any ML service like AutoML or Matching Engine is suitable for recommendation without recognizing the specific need for matrix factorization on interaction data.

How to eliminate wrong answers

Option A is wrong because Cloud Dataflow is a data processing pipeline service, not a low-code ML training service; using ML APIs would require custom code to orchestrate and train models. Option C is wrong because Vertex AI AutoML Tables is designed for general tabular prediction on structured features, not specifically for learning from user-item interaction matrices. Option D is wrong because Vertex AI Matching Engine is for vector similarity search and nearest neighbor retrieval, not for training matrix factorization models from interaction data.
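For reference, the SQL involved is short. The helper below only renders a BigQuery ML `CREATE MODEL` statement with `model_type = 'matrix_factorization'` (it does not call BigQuery); the model, table, and column names are hypothetical. With implicit feedback, the rating column would hold an interaction strength such as a view or purchase count.

```python
def matrix_factorization_sql(model: str, table: str, user_col: str, item_col: str,
                             rating_col: str, implicit: bool = True) -> str:
    """Render (but do not run) a BigQuery ML matrix factorization CREATE MODEL statement."""
    feedback = "implicit" if implicit else "explicit"
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'matrix_factorization',\n"
        f"  user_col = '{user_col}',\n"
        f"  item_col = '{item_col}',\n"
        f"  rating_col = '{rating_col}',\n"
        f"  feedback_type = '{feedback}'\n"
        ") AS\n"
        f"SELECT {user_col}, {item_col}, {rating_col}\n"
        f"FROM `{table}`"
    )


if __name__ == "__main__":
    print(matrix_factorization_sql(
        "my-project.recs.mf_model", "my-project.recs.interactions",
        "user_id", "item_id", "interaction_count"))
```

Once the model is trained, recommendations can be read back with `ML.RECOMMEND` in a plain `SELECT`, so the whole workflow stays in SQL.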

26

MCQ · medium

An ML engineer is using Cloud Build to trigger a Vertex AI Pipeline on every commit to a repository. The pipeline takes 2 hours. The engineer wants to only run the pipeline when changes are made to specific directories. How can this be achieved?

A. Use Cloud Composer to poll the repository periodically

B. Configure Cloud Build trigger with included file globs

C. Use a Cloud Function to evaluate changes and invoke the pipeline

D. Modify the pipeline to ignore unrelated changes

E. Add a conditional step in the pipeline to abort if no relevant changes

Answer: B

Native feature of Cloud Build triggers.

Why this answer

Cloud Build triggers support 'included file globs' and 'ignored file globs' to filter which file changes should invoke the trigger. By specifying glob patterns for the directories of interest, the trigger will only fire when commits modify files matching those patterns, avoiding unnecessary pipeline runs for unrelated changes.

Exam trap

The trap here is that candidates may think a pipeline-level conditional check (Option E) is sufficient, but they overlook that Cloud Build triggers can filter at the trigger level, avoiding any pipeline startup cost for irrelevant changes.

How to eliminate wrong answers

Option A is wrong because Cloud Composer is an orchestration service for workflows, not a polling mechanism for repository changes; it would add unnecessary complexity and latency. Option C is wrong because using a Cloud Function to evaluate changes and invoke the pipeline is an overengineered solution; Cloud Build triggers natively support file glob filtering without needing an intermediary. Option D is wrong because modifying the pipeline to ignore unrelated changes would still consume resources to start the pipeline and then abort, wasting time and cost.

Option E is wrong because adding a conditional step in the pipeline to abort if no relevant changes still requires the pipeline to start and run until the conditional check, incurring unnecessary execution time and cost.
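A minimal sketch of such a trigger is shown below as a Python dict shaped like the Cloud Build trigger resource (`github`, `filename`, `includedFiles`). The owner, repository, branch, and directory names are hypothetical. The equivalent `gcloud builds triggers create github` command takes the globs through its `--included-files` flag.

```python
import json


def pipeline_trigger(owner: str, repo: str, dirs: list[str]) -> dict:
    """Build a trigger body that fires only for pushes touching files under dirs."""
    return {
        "name": "run-training-pipeline",
        "github": {"owner": owner, "name": repo, "push": {"branch": "^main$"}},
        # Build config that submits the Vertex AI Pipeline run.
        "filename": "cloudbuild.yaml",
        # Commits that change no file matching these globs do not start a build.
        "includedFiles": [f"{d.rstrip('/')}/**" for d in dirs],
    }


if __name__ == "__main__":
    print(json.dumps(pipeline_trigger("my-org", "ml-repo", ["pipelines", "training/"]), indent=2))
```

Because the filtering happens before any build starts, unrelated commits cost nothing, which is the advantage over options D and E.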

27

Multi-Select · easy

A data scientist wants to use Vertex AI Pipelines to automate a low-code ML workflow. Which two statements are correct regarding best practices? (Choose TWO.)

Select 2 answers

A. Use pre-built components from Google's curated component library to avoid custom code.

B. Store all intermediate artifacts in Cloud Storage to enable reproducibility and reuse.

C. Avoid using pre-built components because they are not customizable.

D. Use Vertex AI Experiments to track and compare pipeline runs.

E. Use the Kubeflow Pipelines SDK to define the pipeline, which requires extensive coding.

Answers: A, B

Pre-built components enable low-code pipeline construction.

Why this answer

Option A is correct because Vertex AI Pipelines offers a curated library of pre-built components that encapsulate common ML tasks (e.g., data preprocessing, training, evaluation). Using these components reduces the need for custom code, aligning with the low-code ML workflow requirement. This approach accelerates development while maintaining reliability through Google-tested implementations. Option B is correct because keeping intermediate artifacts in Cloud Storage (the pipeline root) lets a run be reproduced later and lets subsequent runs reuse outputs instead of recomputing them.

Exam trap

The trap here is that candidates confuse Vertex AI Experiments (a tracking tool) with a pipeline design best practice, or they assume pre-built components are rigid and cannot be customized, leading them to incorrectly select D or C.

28

MCQ · easy

A company is deploying a machine learning model for real-time fraud detection. The model must respond to requests within 100ms. The model is a TensorFlow model and will be deployed on Google Kubernetes Engine (GKE). Which Google Cloud service should be used to serve the model to minimize latency?

A. Deploy the model on Cloud Run with minimum instances set to 1.

B. Deploy the model as a Cloud Function triggered by HTTP requests.

C. Deploy the model on Vertex AI Prediction with a custom container.

D. Deploy TensorFlow Serving on GKE with a LoadBalancer service.

Answer: D

TensorFlow Serving is optimized for low-latency serving and can be configured on GKE with a LoadBalancer for direct access, minimizing network hops.

Why this answer

Option D is correct because deploying TensorFlow Serving directly on GKE with a LoadBalancer service provides the lowest-latency path for real-time inference. TensorFlow Serving is optimized for high-performance model serving with batching and gRPC support, and GKE allows fine-grained control over node placement, autoscaling, and networking to meet the 100ms SLA. In contrast, serverless options like Cloud Run or Cloud Functions add cold-start latency and lack the low-level optimization for TensorFlow models.

Exam trap

The trap here is that candidates often assume Vertex AI Prediction is always the best choice for serving models, but for ultra-low-latency requirements (<100ms), a direct deployment on GKE with TensorFlow Serving avoids the overhead of a managed prediction platform.

How to eliminate wrong answers

Option A is wrong because Cloud Run, even with minimum instances set to 1, adds latency through its request-routing layer and gives less control over node placement and networking than GKE, making it harder to consistently meet 100ms. Option B is wrong because Cloud Functions suffer from cold-start delays (often 500ms to 2s) and offer no GPU/TPU support, making them unsuitable for sub-100ms real-time inference. Option C is wrong because Vertex AI Prediction with a custom container adds overhead from Vertex AI's managed infrastructure (e.g., request routing, health checks, and autoscaling logic) that can introduce 10-50ms extra latency compared to a direct TensorFlow Serving deployment on GKE.
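A minimal sketch of option D's Kubernetes objects is shown below as Python dicts: a Deployment running the official `tensorflow/serving` image and a `LoadBalancer` Service exposing TensorFlow Serving's default gRPC (8500) and REST (8501) ports. The app name, model name, and `gs://` path are hypothetical, and it assumes the pods have credentials to read the bucket (e.g., via Workload Identity). Resource requests, probes, and GPU settings are omitted for brevity.

```python
import json

# Hypothetical names and model location.
APP = "fraud-serving"
MODEL_NAME = "fraud"
MODEL_BASE_PATH = "gs://my-bucket/models"  # versions live under {base}/{name}/<version>/

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": APP},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": APP}},
        "template": {
            "metadata": {"labels": {"app": APP}},
            "spec": {
                "containers": [{
                    "name": "tf-serving",
                    "image": "tensorflow/serving",
                    # The official image reads these variables to locate the SavedModel.
                    "env": [
                        {"name": "MODEL_NAME", "value": MODEL_NAME},
                        {"name": "MODEL_BASE_PATH", "value": MODEL_BASE_PATH},
                    ],
                    "ports": [
                        {"containerPort": 8500, "name": "grpc"},
                        {"containerPort": 8501, "name": "rest"},
                    ],
                }],
            },
        },
    },
}

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": APP},
    "spec": {
        # LoadBalancer provisions an external load balancer in front of the pods.
        "type": "LoadBalancer",
        "selector": {"app": APP},
        "ports": [
            {"name": "grpc", "port": 8500, "targetPort": 8500},
            {"name": "rest", "port": 8501, "targetPort": 8501},
        ],
    },
}

if __name__ == "__main__":
    # kubectl apply -f accepts a JSON "List" of objects.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [deployment, service]}, indent=2))
```

Clients that need the lowest latency would call the gRPC port; the REST port is convenient for debugging.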

29

MCQ · hard

Your company uses a custom container for model serving on Vertex AI. After a recent update, the model returns predictions but they are clearly wrong (e.g., negative probabilities for a classification model). The logs show no errors. What is the most likely cause?

A. The preprocessing code in the container was updated but the model was not retrained on the new preprocessing

B. The model file is corrupted

C. The model file was accidentally replaced with a different model

D. The container is using an incompatible version of the serving framework

Answer: A

Feature transformation mismatch leads to incorrect predictions.

Why this answer

Option A is correct because the most likely cause of a model returning predictions without errors, but with clearly wrong outputs like negative probabilities, is a mismatch between the preprocessing logic used during training and inference. If the preprocessing code in the container was updated (e.g., scaling, normalization, or feature engineering steps changed) but the model was not retrained on data processed with that new logic, the model receives inputs that are out of distribution, leading to nonsensical outputs. Vertex AI containers run inference with the deployed code, so any change in preprocessing directly affects the input tensor values without raising runtime errors.

Exam trap

Google Cloud often tests the concept that silent prediction errors (no logs, no crashes) are almost always due to data or preprocessing mismatches, not infrastructure or model file issues, which would generate explicit errors.

How to eliminate wrong answers

Option B is wrong because a corrupted model file would typically cause loading failures, runtime errors, or crashes, not silent generation of plausible but wrong predictions like negative probabilities. Option C is wrong because replacing the model file with a different model would likely produce predictions that are consistently wrong in a different pattern (e.g., all zeros, constant values) or cause shape mismatches, not specifically negative probabilities from a classification model. Option D is wrong because an incompatible serving framework version would usually manifest as import errors, missing symbols, or version mismatch warnings in logs, not silent incorrect predictions with no errors.
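Because this failure mode raises no errors, a cheap guard is to validate classifier outputs before returning or logging them. The sketch below (function name and tolerance are hypothetical) flags exactly the symptom in the question; it detects the problem but does not fix the underlying training/serving preprocessing mismatch.

```python
import math


def check_probabilities(probs: list[float], tol: float = 1e-3) -> list[str]:
    """Return problems found in one classifier output; an empty list means it looks valid."""
    problems = []
    for i, p in enumerate(probs):
        # NaN fails every comparison, so check it explicitly.
        if math.isnan(p) or p < 0.0 or p > 1.0:
            problems.append(f"class {i}: {p} outside [0, 1]")
    total = sum(probs)
    if not math.isclose(total, 1.0, abs_tol=tol):
        problems.append(f"probabilities sum to {total:.4f}, not 1")
    return problems


if __name__ == "__main__":
    print(check_probabilities([0.2, 0.8]))
    print(check_probabilities([-0.3, 1.3]))
```

Counting how often such checks fail, and alerting on it, turns a silent regression into a visible one.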

30

MCQ · medium

A data scientist trained a custom TensorFlow model using Vertex AI Training and wants to deploy it for online predictions with low latency (<100ms). Which deployment option on Google Cloud is best?

A. Deploy on Cloud Run with a custom container

B. Deploy on Cloud Functions

C. Deploy on AI Platform Prediction (legacy)

D. Deploy on Vertex AI Endpoints

Answer: D

Vertex AI Endpoints provide managed, scalable, low-latency online prediction.

Why this answer

Vertex AI Endpoints is the correct choice because it is purpose-built for serving models such as this TensorFlow model, with managed infrastructure that includes automatic scaling, GPU/TPU support, and built-in monitoring for latency-sensitive online predictions. Its prebuilt TensorFlow serving containers and hardware accelerators make sub-100ms latency achievable without managing the serving infrastructure yourself.

Exam trap

Google Cloud often tests the misconception that any serverless option (like Cloud Run or Cloud Functions) is sufficient for low-latency ML inference, ignoring the need for GPU acceleration and optimized serving infrastructure that Vertex AI Endpoints provides as a managed service.

How to eliminate wrong answers

Option A is wrong because Cloud Run, while supporting custom containers, offers more limited accelerator options and can incur cold-start latency above 100ms, making it a weaker fit for low-latency online predictions. Option B is wrong because Cloud Functions has no GPU support and its cold-start latency often exceeds 100ms, making it impractical for real-time inference. Option C is wrong because AI Platform Prediction (legacy) is being deprecated and does not offer the same level of integration with Vertex AI's model registry, monitoring, and autoscaling features, and it may not achieve the same low-latency guarantees as Vertex AI Endpoints.

31

MCQ · hard

A company deploys a model to Vertex AI Prediction with autoscaling enabled. During a flash sale, traffic spikes 10x, but the endpoint fails to scale fast enough, causing high latency. What is the most likely cause and solution?

A. The min_nodes setting is too low; increase min_nodes to handle baseline traffic

B. Switch to preemptible VMs to reduce cost and allow more instances

C. The model container is too large; rebuild with a smaller image

D. Use Cloud Functions to pre-warm instances before the sale

Answer: A

A higher `min_nodes` value keeps more nodes running in advance, so extra capacity is available before the autoscaler reacts.

Why this answer

The correct answer is A. With Vertex AI Prediction autoscaling, the `min_nodes` setting defines the baseline number of instances that are always kept running. When traffic spikes 10x during a flash sale, new instances take time to provision, so if `min_nodes` is too low, the few running instances are overloaded until the autoscaler catches up, resulting in high latency.

Increasing `min_nodes` ensures a sufficient baseline capacity to absorb the initial spike while the autoscaler scales up additional nodes.

Exam trap

Google Cloud often tests the misconception that autoscaling is instantaneous or that external services like Cloud Functions can directly pre-warm ML instances, when in reality the root cause is an insufficient baseline capacity (`min_nodes`) to handle the initial burst before the autoscaler catches up.

How to eliminate wrong answers

Option B is wrong because preemptible VMs are designed for cost savings on fault-tolerant workloads, but they can be terminated at any time by Google Cloud, which would exacerbate scaling instability and latency during a traffic spike, not solve it. Option C is wrong because the model container size primarily affects cold start time and deployment speed, not the autoscaler's ability to add instances during a traffic spike; a smaller image would not address the scaling latency issue. Option D is wrong because Cloud Functions cannot pre-warm Vertex AI Prediction instances; keeping capacity warm is done by configuring a higher `min_nodes`, not by an external serverless function.
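A rough way to choose the minimum is to size the always-on fleet for the spike itself, on the assumption that the spike arrives faster than new replicas can start. The sketch below is hypothetical: `per_replica_rps` should come from load testing, and the 0.6 headroom factor is an illustrative target utilization. In the Vertex AI SDK the result would be passed as `min_replica_count` (with a larger `max_replica_count`) to `Model.deploy`; `min_nodes` is the older AI Platform name for the same idea.

```python
import math


def min_replicas_for_burst(baseline_rps: float, spike_factor: float,
                           per_replica_rps: float, headroom: float = 0.6) -> int:
    """Replicas that must already be running to absorb a sudden spike.

    Assumes new replicas cannot start in time, so the running fleet carries
    the whole spike at the given utilization headroom.
    """
    peak_rps = baseline_rps * spike_factor
    return max(1, math.ceil(peak_rps / (per_replica_rps * headroom)))


if __name__ == "__main__":
    # 100 rps baseline, 10x spike, replicas that each handle a hypothetical 200 rps.
    print(min_replicas_for_burst(100, 10, 200))
```

With these numbers the peak is 1000 rps against 120 rps of usable capacity per replica, so 9 replicas would need to be running before the sale starts. Sizing for the full spike is the conservative end; a smaller minimum trades cost against latency during the first minutes of the burst.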

32

Multi-Select · medium

Which THREE factors should you consider when deciding between online prediction and batch prediction on Vertex AI?

Select 3 answers

A. The type of machine learning model architecture (e.g., CNN vs RNN)

B. Cost per prediction: batch is often cheaper per request

C. Latency requirements (real-time vs. asynchronous)

D. Traffic pattern: sporadic vs. sustained load

E. Availability of GPU instances in the region

Answers: B, C, D

Batch prediction is typically more cost-effective for large volumes.

Why this answer

Latency requirements, cost per prediction, and traffic patterns are the key factors. GPU availability in a region affects online and batch prediction alike, so it does not distinguish between them, and the model architecture does not dictate which prediction mode to use.

33

MCQ · easy

A retail company uses Vertex AI AutoML to train a product recommendation model. They have a dataset of past purchases stored in BigQuery. The data science team wants to iteratively train and improve the model. They need to track which dataset version was used for each model and preserve the exact data for reproducibility. They currently export data to CSV files and store them in Cloud Storage. However, the dataset is updated daily, and they want to ensure that models are trained on a consistent snapshot. What should they do?

A. Use Vertex AI Dataset service to create a dataset and export it to BigQuery.

B. Use BigQuery snapshots to capture a versioned dataset and reference the snapshot in the training pipeline.

C. Train the model directly on the BigQuery table and let AutoML handle versioning.

D. Export the data to a timestamped CSV file and store it in Cloud Storage before each training run.

Answer: B

Snapshots provide point-in-time consistency and are easy to manage.

Why this answer

Option B is correct because BigQuery snapshots provide a consistent, versioned view of the dataset at a specific point in time, ensuring reproducibility without duplicating data. By referencing the snapshot in the Vertex AI training pipeline, the team can train models on the exact same data snapshot, even as the source table is updated daily. This approach avoids the overhead of exporting to CSV and Cloud Storage while maintaining data integrity and lineage.

Exam trap

Google Cloud often tests the misconception that exporting to CSV or using Vertex AI Dataset is sufficient for versioning, when in fact BigQuery snapshots provide the native, scalable, and auditable mechanism for point-in-time data consistency without data duplication.

How to eliminate wrong answers

Option A is wrong because the Vertex AI Dataset service is designed for managing training data within Vertex AI, but exporting to BigQuery does not inherently create a versioned snapshot; it simply moves data back to BigQuery without preserving a consistent point-in-time copy. Option C is wrong because training directly on a live BigQuery table does not guarantee a consistent snapshot; AutoML does not handle versioning, and the table may change between training runs, breaking reproducibility. Option D is wrong because exporting to a timestamped CSV file in Cloud Storage is a manual workaround that introduces storage overhead, potential data drift from export timing, and lacks the built-in versioning and query capabilities of BigQuery snapshots.
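A minimal sketch of option B: render `CREATE SNAPSHOT TABLE ... CLONE ... FOR SYSTEM_TIME AS OF ...` DDL for the state of the source table at a chosen time. The helper only builds the statement (it does not call BigQuery), and the table names and 90-day expiration are hypothetical.

```python
from datetime import datetime, timezone


def snapshot_sql(source: str, snapshot: str, as_of: datetime, expire_days: int = 90) -> str:
    """Render DDL that freezes `source`, as it was at `as_of`, into table `snapshot`."""
    ts = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S+00")
    return (
        f"CREATE SNAPSHOT TABLE `{snapshot}`\n"
        f"CLONE `{source}`\n"
        f"FOR SYSTEM_TIME AS OF TIMESTAMP '{ts}'\n"
        "OPTIONS (expiration_timestamp = "
        f"TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {expire_days} DAY))"
    )


if __name__ == "__main__":
    print(snapshot_sql(
        "my-project.retail.purchases",
        "my-project.retail.purchases_snap_20240601",
        datetime(2024, 6, 1, tzinfo=timezone.utc),
    ))
```

The training run would then read the snapshot table (for example as `bq://my-project.retail.purchases_snap_20240601`), and recording that name with the model gives the lineage the team needs. Time travel only reaches back up to seven days, so `as_of` must fall inside that window.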

34

MCQ · medium

A data-processing pipeline using Dataflow needs to incorporate a custom ML prediction step. The team wants to maintain fast processing and minimize latency. What is the optimal approach?

A. Write the data to Cloud Storage, trigger a Cloud Function to call the model, and write results back

B. Use a custom ParDo transform in Dataflow that calls Vertex AI Prediction API directly

C. Send data to a Pub/Sub topic and have a separate subscriber that runs predictions

D. Stream data through Cloud Functions that serve predictions and write to BigQuery

Answer: B

Inline calls within Dataflow are efficient and keep the pipeline linear.

Why this answer

Option B is correct because using a custom ParDo transform in Dataflow allows the pipeline to call the Vertex AI Prediction API synchronously within each worker, avoiding the overhead of external triggers, intermediate storage, or asynchronous messaging. This keeps the data in-memory and minimizes latency by processing predictions inline with the Dataflow streaming or batch pipeline.

Exam trap

Google Cloud often tests the misconception that adding external services like Cloud Functions or Pub/Sub improves modularity without considering the latency penalty, leading candidates to choose options that introduce unnecessary hops instead of keeping prediction inline within the Dataflow pipeline.

How to eliminate wrong answers

Option A is wrong because writing data to Cloud Storage and triggering a Cloud Function introduces significant I/O latency and additional orchestration overhead, breaking the low-latency requirement. Option C is wrong because sending data to Pub/Sub and having a separate subscriber decouples the prediction step, adding network round-trips and potential backpressure issues that increase end-to-end latency. Option D is wrong because streaming data through Cloud Functions for predictions and then writing to BigQuery creates a multi-hop architecture with cold-start risks and no native Dataflow optimization for parallelism or state management.

35

MCQ · hard

A company has a large dataset of 1 million unlabeled images for object detection. They want to use AutoML Vision but need to minimize labeling effort. Which strategy should they use?

A. Use Vertex AI Active Learning to choose a subset for labeling

B. Apply data augmentation techniques to increase dataset size

C. Manually label all 1 million images

D. Train a custom object detection model on unlabeled data with unsupervised learning

Answer: A

Active learning selects the most valuable images, reducing labeling effort significantly.

Why this answer

Vertex AI Active Learning is the correct strategy because it intelligently selects the most informative unlabeled images for human labeling, maximizing model accuracy while minimizing labeling effort. This approach uses the model's uncertainty to prioritize data points that will most improve performance, making it ideal for large datasets where manual labeling of all images is impractical.

Exam trap

Google Cloud often tests the misconception that data augmentation can replace the need for initial labeling, when in reality it only expands existing labeled data and does not address the core challenge of obtaining labels for unlabeled images.

How to eliminate wrong answers

Option B is wrong because data augmentation techniques increase dataset size by creating modified copies of existing labeled images, but they do not reduce the initial labeling effort required for the original dataset. Option C is wrong because manually labeling all 1 million images is prohibitively time-consuming and expensive, directly contradicting the goal of minimizing labeling effort. Option D is wrong because unsupervised learning cannot train a custom object detection model without labeled data; object detection requires bounding box annotations or similar labels to learn object locations and classes.
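The core idea behind active learning can be sketched with uncertainty sampling: score each unlabeled image by the entropy of the current model's predicted class probabilities and send the most uncertain ones to labelers. This is a generic illustration with hypothetical names, not Vertex AI's implementation; for object detection, a real system would aggregate uncertainty over the detected boxes rather than use one distribution per image.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy of one prediction; higher means the model is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions: dict[str, list[float]], k: int) -> list[str]:
    """Return the k image IDs whose predicted class distribution is most uncertain."""
    ranked = sorted(predictions, key=lambda img: entropy(predictions[img]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    preds = {
        "a.jpg": [0.98, 0.01, 0.01],  # confident: low labeling value
        "b.jpg": [0.34, 0.33, 0.33],  # near-uniform: most informative
        "c.jpg": [0.60, 0.30, 0.10],
    }
    print(select_for_labeling(preds, 2))
```

Each labeling round retrains the model on the newly labeled images and rescores the remaining pool, so effort concentrates where the model is weakest.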

36

MCQ · easy

A machine learning model deployed on Vertex AI is returning erroneous predictions. The team needs to investigate the root cause by examining the prediction request and response details. Which Google Cloud tool is best suited for this?

A. Cloud Monitoring

B. Cloud Debugger

C. Cloud Logging

D. Cloud Trace

Answer: C

Cloud Logging can capture structured logs from Vertex AI predictions, including request and response data for analysis.

Why this answer

Cloud Logging is the correct tool because it captures detailed logs of prediction requests and responses, including input features, model outputs, and any errors. By examining these logs, the team can trace the exact data flow and identify discrepancies causing erroneous predictions, such as data preprocessing issues or model version mismatches.

Exam trap

The trap here is that candidates confuse Cloud Monitoring (which shows aggregate health metrics) with Cloud Logging (which provides granular request/response data), leading them to choose a tool that cannot reveal the specific prediction details needed for root cause analysis.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring focuses on metrics and alerting (e.g., latency, error rates) but does not capture the content of individual prediction requests or responses. Option B is wrong because Cloud Debugger was designed for inspecting live application code state (e.g., variable values) in production, not for logging request/response payloads of ML predictions, and it has since been deprecated. Option D is wrong because Cloud Trace provides latency analysis and distributed tracing of requests across services, but it does not log the actual prediction data or response details needed to debug prediction errors.


37

MCQeasy

An ML engineer is designing a CI/CD pipeline for ML models using Cloud Build and Cloud Deploy. They want to automatically test model performance on a validation set before promoting to production. Which step should be included in the CI/CD pipeline?

A.Run unit tests on the training code

B.Use Cloud Composer to schedule evaluation

C.Deploy to production immediately after training

D.Train the model in the CI/CD pipeline

E.Run a Vertex AI Pipeline for model evaluation and register the model only if metrics exceed thresholds

AnswerE

Implements a quality gate.

Why this answer

Option E is correct because it directly integrates model evaluation into the CI/CD pipeline using Vertex AI Pipelines, which allows automated validation of model performance against predefined thresholds before promotion. This ensures that only models meeting quality criteria are deployed, aligning with MLOps best practices for gated promotions.
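The gate in option E reduces to a threshold comparison on the evaluation step's metrics. Below is a minimal, SDK-free sketch of that check, assuming hypothetical metric names (`auc`, `recall`) and threshold values; in Vertex AI Pipelines this logic would typically live in a component that runs before the model-registration step.

```python
def passes_quality_gate(metrics, thresholds):
    """Return True only if every thresholded metric is present and meets its minimum."""
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        # A missing metric fails the gate rather than being silently skipped.
        if value is None or value < minimum:
            return False
    return True


def promote_if_ready(metrics, thresholds):
    """Decide the next pipeline action from the evaluation metrics."""
    if passes_quality_gate(metrics, thresholds):
        return "register"  # in a real pipeline: upload to the model registry
    return "reject"        # stop here; production keeps serving the current model


# Hypothetical thresholds for illustration only.
thresholds = {"auc": 0.85, "recall": 0.70}
print(promote_if_ready({"auc": 0.91, "recall": 0.74}, thresholds))  # register
print(promote_if_ready({"auc": 0.91, "recall": 0.65}, thresholds))  # reject
```

Treating a missing metric as a failure keeps the gate closed if the evaluation step silently stops reporting a value.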

Exam trap

Google Cloud often tests the distinction between code testing (unit tests) and model validation (performance metrics), leading candidates to choose A because they conflate software testing with ML evaluation.

How to eliminate wrong answers

Option A is wrong because unit tests on training code verify code correctness but do not assess model performance on a validation set, which is the requirement. Option B is wrong because Cloud Composer is an orchestration tool for workflows, not a CI/CD step for automatic model evaluation before promotion; it would introduce scheduling latency rather than inline gating. Option C is wrong because deploying immediately after training bypasses validation, risking production degradation from underperforming models.

Option D is wrong because training the model in the CI/CD pipeline is possible but does not include the evaluation step needed to gate promotion; it focuses on the training process itself, not validation.


38

MCQeasy

A non-technical user wants to build a binary classification model using Vertex AI. Which UI should they use?

A.Vertex AI AutoML

B.Vertex AI Workbench

C.Vertex AI Pipelines

D.Vertex AI Prediction

AnswerA

Correct: No-code UI for training.

Why this answer

Vertex AI AutoML is the correct choice because it provides a no-code graphical user interface specifically designed for non-technical users to build, train, and deploy machine learning models, including binary classification models, without writing any code. It automates the entire ML pipeline—feature engineering, model selection, hyperparameter tuning—allowing users to simply upload labeled data and get a production-ready model.

Exam trap

Google Cloud often tests the distinction between 'building/training' tools (AutoML) and 'deploying/serving' tools (Prediction), leading candidates to mistakenly choose Vertex AI Prediction because they confuse the deployment phase with the model creation phase.

How to eliminate wrong answers

Option B is wrong because Vertex AI Workbench is a Jupyter notebook-based development environment intended for data scientists and ML engineers who write custom code, not for non-technical users seeking a low-code solution. Option C is wrong because Vertex AI Pipelines is a tool for orchestrating and automating ML workflows using code-defined pipelines (e.g., Kubeflow Pipelines SDK), requiring programming skills to define steps and dependencies. Option D is wrong because Vertex AI Prediction is a serving endpoint for deploying and running inference on already-trained models, not a UI for building or training models from scratch.


39

MCQhard

A model deployed on Vertex AI Endpoints returns predictions, but the performance metrics (e.g., AUC) degrade over time. The input data distribution is shifting. The team wants to detect and alert on this drift automatically. Which set of actions should they take?

A.Schedule a batch prediction job daily and compare with ground truth

B.Enable Vertex AI Model Monitoring for feature drift and set up alerts via Cloud Monitoring

C.Use Vertex AI Explainable AI to understand predictions

D.Implement custom logging in the serving container and use BigQuery for analysis

AnswerB

Model Monitoring automatically calculates drift metrics and can trigger alerts when drift exceeds thresholds.

Why this answer

Option B is correct because Vertex AI Model Monitoring can monitor for feature distribution drift and skew, and can be configured to send alerts via Cloud Monitoring. Option A requires ground truth labels, which may not be available immediately. Option C (Explainable AI) supports model interpretation, not drift detection.

Option D is manual and not automated.
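Drift monitoring compares the distribution of a feature in serving traffic against a baseline and alerts when a distance score exceeds a threshold. For categorical features, the Vertex AI Model Monitoring documentation describes using L-infinity distance (Jensen-Shannon divergence is used for numerical features). The sketch below computes that distance by hand for one categorical feature; the feature name `payment_type`, the sample data, and the 0.3 threshold are hypothetical, and this is not the service's own code.

```python
from collections import Counter


def category_distribution(values):
    """Fraction of observations falling into each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def l_infinity_distance(baseline, current):
    """Largest absolute difference in any category's frequency between two samples."""
    p = category_distribution(baseline)
    q = category_distribution(current)
    # Categories seen in only one sample count as frequency 0 in the other.
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


ALERT_THRESHOLD = 0.3  # hypothetical alert threshold

training = ["card"] * 70 + ["wire"] * 20 + ["cash"] * 10
serving = ["card"] * 30 + ["wire"] * 60 + ["cash"] * 10

distance = l_infinity_distance(training, serving)
print(round(distance, 2))  # 0.4
if distance > ALERT_THRESHOLD:
    print("drift alert: payment_type")
```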


40

Multi-Selecthard

Which TWO strategies help ensure data consistency when multiple teams are contributing features to a shared Vertex AI Feature Store?

Select 2 answers

A.Each team should create their own feature store to avoid conflicts.

B.Use only batch ingestion to keep features synchronized.

C.Define and enforce feature schemas using the Feature Store API.

D.Allow each team to independently define feature engineering logic.

E.Set up monitoring and alerting on feature value distributions to detect drift.

AnswersC, E

Schemas ensure consistent data types and values.

Why this answer

Option C is correct because defining and enforcing feature schemas using the Vertex AI Feature Store API ensures that all teams adhere to a consistent data structure (e.g., fixed feature names, data types, and value ranges). This prevents schema drift and ingestion conflicts, which are common when multiple teams independently push features to the same feature store. Without schema enforcement, one team might inadvertently change a feature's data type or add unexpected values, breaking downstream models. Option E is correct because monitoring feature value distributions catches problems that schemas alone cannot, such as an upstream change by one team that shifts a feature's values while keeping its type valid.

Exam trap

Google Cloud often tests the misconception that 'separate stores' or 'batch-only ingestion' are valid consistency strategies, when in fact the correct approach is centralized schema governance with monitoring to detect drift.


41

Multi-Selecthard

A financial institution uses a machine learning model to approve loans. They must monitor for fairness and bias. Which THREE Google Cloud tools or features can help them achieve this? (Choose 3.)

Select 3 answers

A.What-If Tool

B.Vertex AI Model Monitoring

C.Cloud Data Loss Prevention

D.Cloud Healthcare API

E.Explainable AI

AnswersA, B, E

The What-If Tool allows testing different scenarios and slicing by protected attributes to evaluate fairness.

Why this answer

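The What-If Tool (WIT) is an open-source Google tool that can be used with models hosted on Vertex AI (for example, from a notebook) to analyze model behavior across different subsets of data, such as demographic groups. It provides interactive visualizations to test how changes in input features affect predictions, enabling fairness assessments by comparing performance metrics across groups. This directly supports monitoring for bias in loan approval decisions. Vertex AI Model Monitoring (B) complements it by detecting skew and drift in production inputs over time, and Explainable AI (E) feature attributions show whether predictions depend heavily on sensitive attributes or their proxies.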
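Slicing by a protected attribute, as the What-If Tool does interactively, amounts to computing the same metrics per group and comparing them. Below is a minimal sketch, assuming hypothetical records of `(group, true_label, prediction)` where label 1 means the applicant would repay and prediction 1 means approve. The two gaps shown, approval rate (demographic parity) and true positive rate (equal opportunity), are common fairness measures, not a specific Google Cloud API.

```python
from collections import defaultdict


def group_rates(records):
    """Per-group approval rate and true positive rate.

    records: iterable of (group, label, prediction), with label/prediction in {0, 1}.
    """
    stats = defaultdict(lambda: {"n": 0, "approved": 0, "pos": 0, "tp": 0})
    for group, label, pred in records:
        s = stats[group]
        s["n"] += 1
        s["approved"] += pred
        s["pos"] += label
        s["tp"] += label * pred  # approved applicants who would repay
    return {
        g: {
            "approval_rate": s["approved"] / s["n"],
            "tpr": s["tp"] / s["pos"] if s["pos"] else float("nan"),
        }
        for g, s in stats.items()
    }


# Hypothetical scored applications for two groups.
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 1), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]
rates = group_rates(records)
approval_gap = abs(rates["A"]["approval_rate"] - rates["B"]["approval_rate"])
tpr_gap = abs(rates["A"]["tpr"] - rates["B"]["tpr"])
print(approval_gap, tpr_gap)  # 0.5 0.5
```

Here qualified applicants in group B are approved half as often as those in group A, which is the kind of disparity a per-group comparison is meant to surface.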

Exam trap

Google Cloud often tests the distinction between data security tools (like DLP) and ML fairness tools, so candidates mistakenly select Cloud DLP thinking it addresses bias because it handles sensitive attributes, but DLP does not analyze model predictions or fairness metrics.


42

MCQhard

A large enterprise has multiple ML models deployed in production across different regions. They want to implement a centralized monitoring dashboard that tracks key performance indicators such as prediction accuracy, latency, and error rates for all models, with the ability to drill down into individual model versions. Which approach best meets these requirements?

A.Use Vertex AI Experiments to log metrics and compare across runs

B.Use Cloud Logging to search logs from each model and create a dashboard

C.Use BigQuery to store prediction logs and then visualize in Looker

D.Use Cloud Monitoring with custom metrics reported by each model deployment, and create a unified dashboard with filterable resources

AnswerD

Cloud Monitoring supports custom metrics and dashboards that can be filtered by resource labels (e.g., model name, version), providing centralized visibility and drill-down capability.

Why this answer

Option D is correct because Cloud Monitoring with custom metrics allows each model deployment to report key performance indicators (e.g., prediction accuracy, latency, error rates) as metric time series. These custom metrics can be aggregated into a single unified dashboard, and the dashboard can be configured with filterable resources (e.g., region, model version) to enable drill-down into individual model versions. This approach provides centralized, real-time monitoring without relying on log-based or batch analytics.

Exam trap

Google Cloud often tests the distinction between logging (Cloud Logging) and monitoring (Cloud Monitoring), where candidates mistakenly think log-based dashboards are sufficient for real-time KPI tracking, ignoring the need for structured, low-latency custom metrics.

How to eliminate wrong answers

Option A is wrong because Vertex AI Experiments is designed for tracking and comparing training runs (e.g., hyperparameter tuning), not for real-time monitoring of deployed models in production across regions. Option B is wrong because Cloud Logging is a log management service; although log-based metrics can be derived from log entries, building KPIs this way depends on parsing logs and is less direct than having each deployment report structured custom metrics. Option C is wrong because BigQuery is a data warehouse for storing and querying large datasets, and while Looker can visualize it, this approach introduces latency from batch loading and is not designed for real-time monitoring of live model deployments.


43

MCQeasy

Refer to the exhibit. A data scientist runs this Vertex AI training job code. What will be the outcome?

A.The job runs as a regular custom training with 10 replicas.

B.A HyperparameterTuningJob is created and runs trials.

C.A CustomJob is created with hyperparameters from the spec.

D.The job fails because parallel_trial_count cannot be less than max_trial_count.

AnswerB

The hyperparameter_tuning_job_spec instructs Vertex AI to run tuning.

Why this answer

The code uses `HyperparameterTuningJob` with `parallel_trial_count=1` and `max_trial_count=10`. This creates a hyperparameter tuning job that runs up to 10 trials, each trial being a separate training run with different hyperparameter values. The `parallel_trial_count=1` means trials run sequentially, not in parallel, but this is valid and does not cause failure.
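To make the relationship between the two counts concrete, here is a simplified scheduling model in plain Python. It is an illustration, not how the Vertex AI service actually schedules trials; the helper `trial_waves` is hypothetical, and its range check mirrors the constraint stated above (1 ≤ `parallel_trial_count` ≤ `max_trial_count`).

```python
def trial_waves(max_trial_count, parallel_trial_count):
    """Group trial numbers into waves of trials that run concurrently."""
    if not 1 <= parallel_trial_count <= max_trial_count:
        raise ValueError("need 1 <= parallel_trial_count <= max_trial_count")
    trials = list(range(1, max_trial_count + 1))
    return [
        trials[i:i + parallel_trial_count]
        for i in range(0, max_trial_count, parallel_trial_count)
    ]


# The exhibit's settings: 10 waves of a single trial each, i.e. fully sequential.
print(len(trial_waves(max_trial_count=10, parallel_trial_count=1)))  # 10
print(trial_waves(10, 4))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

With an adaptive search algorithm, later trials can use results from earlier ones, so a lower `parallel_trial_count` trades wall-clock time for potentially better-informed trials.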

Exam trap

Google Cloud often tests the misconception that `parallel_trial_count` must be equal to or greater than `max_trial_count`, when in reality it can be any value from 1 to `max_trial_count`, and sequential trials are perfectly valid.

How to eliminate wrong answers

Option A is wrong because the code explicitly creates a `HyperparameterTuningJob`, not a regular custom training job; a regular custom training job would use `CustomJob` or `CustomContainerTrainingJob` without hyperparameter tuning parameters. Option C is wrong because a `CustomJob` does not accept hyperparameter tuning parameters like `parallel_trial_count` or `max_trial_count`; those are specific to `HyperparameterTuningJob`. Option D is wrong because `parallel_trial_count` can be less than `max_trial_count`; the constraint is that `parallel_trial_count` must be less than or equal to `max_trial_count`, and 1 ≤ 10 is valid.


44

MCQmedium

A financial services company uses BigQuery ML to build a logistic regression model for fraud detection. The model is trained on the last 6 months of transaction data (about 50 million rows). After deployment, the fraud detection team notices a high false positive rate, causing customer dissatisfaction and extra manual review costs. The model is currently retrained monthly. The team wants to reduce false positives without sacrificing recall. They have access to real-time transaction streaming and can compute new features quickly. What is the most effective approach?

A.Replace logistic regression with gradient boosted trees (XGBoost) in BigQuery ML

B.Use Vertex AI AutoML Tables to train a more complex model

C.Increase retraining frequency to daily

D.Add engineered features like rolling transaction count and velocity per user

AnswerD

New features provide more signal to reduce false positives.

Why this answer

Option D is correct because adding engineered features like rolling transaction count and velocity per user directly addresses the high false positive rate by providing the logistic regression model with more discriminative temporal signals. Since the team has access to real-time streaming and can compute features quickly, these features capture behavioral patterns that reduce false positives without sacrificing recall, and logistic regression can effectively leverage them with proper feature engineering.
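In BigQuery, features like these would typically be window functions over the transaction table. The Python sketch below shows the same per-user trailing-window logic as it might run over a time-ordered stream. The `(user_id, unix_ts, amount)` layout, the one-hour window, and the feature names are assumptions for illustration.

```python
from collections import deque


def rolling_features(transactions, window_seconds=3600):
    """Per-transaction rolling count and spend over each user's trailing window.

    transactions: iterable of (user_id, unix_ts, amount), sorted by timestamp.
    The window includes the current transaction.
    """
    history = {}  # user_id -> deque of (ts, amount) still inside the window
    rows = []
    for user, ts, amount in transactions:
        window = history.setdefault(user, deque())
        window.append((ts, amount))
        # Evict transactions older than the trailing window.
        while window and window[0][0] <= ts - window_seconds:
            window.popleft()
        rows.append({
            "user": user,
            "ts": ts,
            "txn_count_1h": len(window),                   # transaction velocity
            "amount_sum_1h": sum(a for _, a in window),    # spend velocity
        })
    return rows


# Hypothetical stream: u1 makes a burst of purchases, including one large one.
txns = [
    ("u1", 0, 20.0), ("u1", 600, 35.0), ("u2", 900, 10.0),
    ("u1", 1200, 500.0), ("u1", 4000, 15.0),
]
rows = rolling_features(txns)
print(rows[-1])  # {'user': 'u1', 'ts': 4000, 'txn_count_1h': 3, 'amount_sum_1h': 550.0}
```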

Exam trap

The trap here is that candidates often assume a more complex model (XGBoost or AutoML) is always better for reducing false positives, but the question specifically tests the principle that feature engineering—especially temporal aggregations—is the most effective lever when the model is already appropriate and data is streaming.

How to eliminate wrong answers

Option A is wrong because replacing logistic regression with gradient boosted trees (XGBoost) may improve model capacity but does not directly target the root cause of high false positives—lack of informative features—and could increase complexity without guaranteed recall preservation. Option B is wrong because using Vertex AI AutoML Tables to train a more complex model similarly addresses model complexity rather than feature insufficiency, and may introduce overfitting or latency issues without solving the false positive problem. Option C is wrong because increasing retraining frequency to daily does not change the underlying feature set or model architecture; it only refreshes weights on the same features, which will not reduce false positives if the model lacks discriminative signals.


45

Multi-Selectmedium

A machine learning engineer is designing an ML pipeline on Vertex AI. The pipeline includes multiple steps: data validation, preprocessing, training, evaluation, and deployment. The engineer wants to ensure that if the data validation step fails due to schema mismatch, the pipeline stops immediately and does not proceed. Additionally, they want to reuse the preprocessed data from a previous successful run if the source data hasn't changed. Which two configurations should they use? (Choose two.)

Select 2 answers

A.Use a custom exit handler in the data validation step to abort the pipeline.

B.Set the 'on_failure' parameter of the data validation component to 'Stop'.

C.Use conditional branches to check the output of data validation before proceeding.

D.Set the 'cache' option for the preprocessing step to True.

E.Enable 'skip_if_successful' on the preprocessing step.

AnswersB, D

Setting on_failure='Stop' immediately stops the pipeline if the component fails.

Why this answer

Options B and D are correct. Option B: Setting on_failure='Stop' on the data validation component stops the pipeline immediately on failure. Option D: Enabling caching (cache=True) on the preprocessing step allows reuse of outputs when inputs are identical.

Option A is wrong because an exit handler (KFP's `dsl.ExitHandler`) runs a designated task when the pipeline finishes, for example for cleanup or notifications; it is not a mechanism for aborting the pipeline on a validation failure. Option C is wrong because conditional branches add unnecessary complexity; the on_failure parameter is simpler. Option E is wrong because 'skip_if_successful' is not a standard parameter; caching is the correct way.
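Execution caching reuses a step's outputs when the same component runs again with identical inputs; in the KFP SDK used by Vertex AI Pipelines it is controlled per task (for example, `task.set_caching_options(True)`). The following is a simplified, SDK-free model of the idea. The key composition is an assumption, and the real cache key also accounts for the component definition. The inputs must capture the source data version (here a hypothetical `version` parameter), because a cache keyed only on a path would not notice new data written to that path.

```python
import hashlib
import json

_cache = {}  # cache key -> stored output (stands in for the pipeline metadata store)


def cache_key(component_name, inputs):
    """Deterministic key derived from the component identity and its input values."""
    payload = json.dumps({"component": component_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_with_cache(component_name, inputs, fn, enable_caching=True):
    """Return (output, reused_from_cache)."""
    key = cache_key(component_name, inputs)
    if enable_caching and key in _cache:
        return _cache[key], True
    output = fn(**inputs)  # if this raises, nothing is cached
    _cache[key] = output
    return output, False


calls = []


def preprocess(source_uri, version):
    """Hypothetical preprocessing step that records each real execution."""
    calls.append(source_uri)
    return f"{source_uri}/processed-{version}"


inputs = {"source_uri": "gs://bucket/raw", "version": "v1"}
print(run_with_cache("preprocess", inputs, preprocess))  # ('gs://bucket/raw/processed-v1', False)
print(run_with_cache("preprocess", inputs, preprocess))  # ('gs://bucket/raw/processed-v1', True)
print(len(calls))  # 1
```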


46

MCQhard

A company uses Cloud Composer to orchestrate an ML pipeline. They notice that the pipeline occasionally fails because the Composer environment runs out of disk space on the worker nodes. The pipeline uses many large dependencies. What is the most effective long-term solution?

A.Mount a Cloud Storage bucket to the Composer workers using GCSFuse to store large artifacts externally.

B.Move the pipeline to Cloud Functions to avoid Composer's disk limitations.

C.Reduce the size of the Docker image used by the pipeline.

D.Increase the number of worker nodes in the Composer environment.

AnswerA

Keeps local disk usage low by offloading to Cloud Storage.

Why this answer

Mounting a Cloud Storage bucket via GCSFuse allows Composer workers to access large artifacts stored externally without consuming local disk space. This provides a scalable, durable, and cost-effective solution for handling large dependencies, as the pipeline can read/write directly to Cloud Storage, eliminating the disk space bottleneck on worker nodes.

Exam trap

Google Cloud often tests the misconception that scaling out (adding more nodes) solves disk space issues, but the real problem is per-node disk capacity, not overall cluster capacity.

How to eliminate wrong answers

Option B is wrong because Cloud Functions have a limited execution timeout (up to 60 minutes for HTTP functions, 540 seconds for background functions) and a maximum memory of 32GB, making them unsuitable for long-running ML pipelines with large dependencies. Option C is wrong because reducing the Docker image size only addresses the image storage, not the runtime disk space used by large artifacts during pipeline execution. Option D is wrong because increasing the number of worker nodes distributes the workload but does not increase the per-node disk capacity; each worker still has the same local disk limit, so the pipeline can still fail if a single worker runs out of space.


47

Multi-Selecthard

A healthcare company uses AutoML Tables to predict patient readmission risk. The dataset contains 500,000 rows and 200 features, including patient demographics, lab results, and medical history. The model accuracy is lower than expected. The engineer wants to improve performance using low-code techniques. Which THREE actions are most effective? (Choose THREE.)

Select 3 answers

A.Increase the training time budget to the maximum allowed.

B.Remove highly correlated features using AutoML Tables' built-in feature importance analysis.

C.Engineer new features such as time since last admission and number of previous admissions.

D.Use a custom model architecture via AutoML Tables advanced options.

E.Enable automated handling of missing values and outliers in the dataset configuration.

AnswersB, C, E

Reduces noise and improves model generalization.

Why this answer

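Option B is correct because AutoML Tables' built-in feature importance analysis helps identify redundant or low-signal features (for example, one of a pair of highly correlated features) that can then be removed, reducing noise and multicollinearity and often improving model performance with little code. This is a low-code technique that leverages the platform's automated capabilities to streamline feature selection. Option C adds domain signal (time since last admission, number of previous admissions) that the raw columns do not expose directly, and Option E lets the platform handle missing lab values and outliers without custom preprocessing code.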

Exam trap

Google Cloud often tests the misconception that increasing training time or using custom architectures is a low-code solution, when in fact low-code techniques rely on platform automation like built-in feature engineering and data preprocessing, not manual tuning or custom coding.


48

MCQeasy

Refer to the exhibit. A Vertex AI prediction endpoint is failing with a deadline exceeded error. The log shows the following. What is the most likely cause?

A.The prediction request is malformed

B.Insufficient CPU or memory for the load

C.The model is too large for the machine type

D.The model version is corrupted

AnswerB

High CPU and memory utilization indicate the machine type is inadequate for the prediction workload, leading to timeouts.

Why this answer

A deadline exceeded error in Vertex AI prediction endpoints typically indicates that the model is taking too long to respond, often due to insufficient CPU or memory resources for the current load. This causes the request to time out before the inference completes, as the underlying infrastructure cannot process the requests quickly enough.

Exam trap

Google Cloud often tests the distinction between deployment-time errors (like model size) and runtime errors (like timeout), so candidates mistakenly associate a deadline exceeded error with model corruption or malformed requests rather than resource constraints.

How to eliminate wrong answers

Option A is wrong because a malformed request would result in an invalid argument or bad request error (e.g., HTTP 400), not a deadline exceeded (HTTP 504) error. Option C is wrong because a model that is too large for the machine type would cause a resource exhaustion error at deployment time (e.g., 'Insufficient memory to load model'), not a runtime deadline exceeded error. Option D is wrong because a corrupted model version would cause model loading failures or prediction errors (e.g., 'Model not found' or 'Internal server error'), not a timeout-related deadline exceeded error.


49

MCQeasy

You are monitoring a classification model that predicts loan default. The model was trained on data from 2020-2022. In 2023, the economic conditions changed, and the model's accuracy dropped significantly. Which monitoring approach would best help you detect this issue early?

A.Monitor the accuracy of the model on the latest batch of labeled data

B.Monitor feature distribution drift using KS test

C.Monitor the prediction distribution for significant shift from training distribution

D.Monitor the freshness of the training data

AnswerC

Prediction distribution shift can indicate concept drift even without labels.

Why this answer

Option C is correct because monitoring the prediction distribution for a significant shift from the training distribution directly detects changes in the model's output behavior, which can be an early indicator of concept drift or data drift caused by economic changes. Unlike accuracy monitoring, this approach does not require labeled data, enabling detection of potential performance degradation before ground truth labels become available.
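Option C can be implemented without labels by comparing recent predicted scores against the scores the model produced on a reference set (for example, the training or validation data). The sketch below uses the two-sample Kolmogorov-Smirnov statistic, the same test named in option B, applied to prediction scores instead of input features. The score values and the 0.3 alert threshold are hypothetical.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The supremum of |F_a - F_b| is attained at one of the observed values.
    for x in set(a) | set(b):
        cdf_a = bisect.bisect_right(a, x) / len(a)  # fraction of a <= x
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical predicted default probabilities.
baseline = [0.05, 0.1, 0.1, 0.2, 0.3, 0.15, 0.08, 0.12]   # reference period
recent = [0.4, 0.35, 0.5, 0.2, 0.45, 0.6, 0.3, 0.55]      # after conditions changed

d = ks_statistic(baseline, recent)
print(d)  # 0.75
if d > 0.3:  # hypothetical alert threshold
    print("prediction distribution shift detected")
```

In practice a library routine such as `scipy.stats.ks_2samp` returns the same statistic along with a p-value.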

Exam trap

The trap here is that candidates often choose monitoring feature drift (Option B) because it sounds technical, but they overlook that concept drift—a change in the relationship between features and the target—is better detected by monitoring prediction distribution shifts, not just feature distribution shifts.

How to eliminate wrong answers

Option A is wrong because monitoring accuracy on labeled data is a reactive approach that requires ground truth labels, which are often delayed or unavailable in real-time, making it too slow to detect early drift. Option B is wrong because monitoring feature distribution drift using the KS test only detects changes in input features, not the relationship between features and the target (concept drift), so it may miss shifts in the decision boundary caused by economic changes. Option D is wrong because monitoring the freshness of training data is a data management practice that does not directly detect model performance degradation or drift; it only ensures the training data is recent, not that the model is still valid under new conditions.


50

MCQmedium

Your organization has a requirement to monitor fairness of an ML model that predicts loan approvals. You need to set up alerts if the model's predictions show bias against a protected group. Which tool on Google Cloud can you use to monitor this?

A.Cloud Vision API to analyze demographic data.

B.Vertex AI Model Monitoring with Fairness Indicators integration.

C.AutoML Tables fairness evaluation results from training.

D.Cloud DLP (Data Loss Prevention) to inspect input features for bias.

AnswerB

Fairness Indicators can be evaluated and monitored via Vertex AI Model Monitoring.

Why this answer

Vertex AI Model Monitoring with Fairness Indicators integration is the correct tool because it allows you to continuously monitor a deployed model's predictions for bias against protected groups (e.g., race, gender) by analyzing prediction distributions and setting alert thresholds. This is a post-deployment monitoring capability, not a training-time evaluation, and it directly addresses the requirement to set up alerts on live predictions.

Exam trap

The trap here is that candidates confuse training-time fairness evaluation (AutoML Tables) with post-deployment monitoring (Vertex AI Model Monitoring), or they mistakenly think data inspection tools like Cloud DLP or Vision API can perform bias analysis on predictions.

How to eliminate wrong answers

Option A is wrong because Cloud Vision API is an image analysis service for detecting objects, text, and faces in images; it has no capability to analyze demographic data or monitor ML model fairness. Option C is wrong because AutoML Tables fairness evaluation results are generated during model training, not for ongoing post-deployment monitoring; the question specifically requires setting up alerts on predictions, which is a monitoring, not training, task. Option D is wrong because Cloud DLP is designed to inspect and redact sensitive data (e.g., PII) in text, structured data, and images, not to analyze model predictions for bias or set fairness alerts.


51

MCQmedium

A company uses Vertex AI Pipelines to orchestrate an AutoML tabular training step followed by a BigQuery ML evaluation step. The pipeline fails because the output of the AutoML step (a model resource name) is not being passed to the BigQuery step. What is the most likely cause?

A.The AutoML training component is implemented as a Python function without proper artifact input/output annotations

B.The pipeline is using a custom pipeline root but the model is in a different region

C.The Vertex AI Pipeline Runner does not have permission to access AutoML models

D.The BigQuery ML evaluation component requires a service agent with Cloud SQL access

AnswerA

Kubeflow Pipelines passes values between steps only through declared component inputs and outputs.

Why this answer

In Vertex AI Pipelines, when using the Kubeflow Pipelines SDK, components are defined with the `@dsl.component` decorator and must declare their inputs and outputs through type annotations (e.g., `Input[Model]`, `Output[Model]`, or a typed return value). If the AutoML training step is implemented as a plain Python function without these annotations, the pipeline framework cannot serialize and pass the model resource name as an artifact to the downstream BigQuery ML evaluation step. This causes the pipeline to fail because the BigQuery step receives no valid model reference.

Exam trap

Google Cloud often tests the distinction between runtime permission errors (like IAM) and pipeline orchestration errors (like missing artifact passing), leading candidates to incorrectly choose a permissions-related option when the real issue is a component definition flaw.

How to eliminate wrong answers

Option B is wrong because a custom pipeline root or regional mismatch would cause storage or execution errors, not a failure to pass an output artifact between steps; the model resource name is a metadata artifact, not a storage path. Option C is wrong because permission issues would manifest as authorization errors (e.g., 403 Forbidden) when the pipeline runner tries to access the model, not as a missing output artifact; the error described is about data flow, not access control. Option D is wrong because BigQuery ML evaluation does not require Cloud SQL access; it uses BigQuery's own service agent and IAM permissions, and Cloud SQL is a separate database service irrelevant to this pipeline.


52

MCQeasy

A financial company is building a fraud detection model. The dataset has 1% fraud cases and 99% legitimate transactions. Which technique should they use to handle the class imbalance?

A.Use class weighting or synthetic oversampling (SMOTE) during training

B.Randomly undersample the majority class to balance the dataset

C.Collect more data until the fraud rate increases

D.Train without any modifications; the model will naturally handle it

AnswerA

This addresses imbalance effectively.

Why this answer

Class weights or resampling techniques like SMOTE are standard for imbalanced datasets. Option A is correct. Option B (undersampling majority) can lose information.

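Option C (collect more data) is impractical. Option D (no modifications) biases the model toward the majority class: a model that predicts "legitimate" for every transaction already reaches 99% accuracy while catching no fraud.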
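Class weighting scales each example's loss contribution so the rare class is not drowned out. Below is a sketch of the common "balanced" heuristic, the same formula scikit-learn uses for `class_weight='balanced'`, applied to a hypothetical label vector with 1% fraud. Weights like these can be passed to any training API that accepts per-class or per-sample weights.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


labels = [1] * 10 + [0] * 990  # 1 = fraud, 0 = legitimate
weights = balanced_class_weights(labels)
print(weights[1], round(weights[0], 3))  # 50.0 0.505
```

With these weights, the 10 fraud examples contribute 10 × 50 = 500 weighted units and the 990 legitimate ones 990 × 0.505... ≈ 500, so both classes carry equal total weight in the loss.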


53

MCQmedium

A data analyst wants to use Vision API to detect custom objects in manufacturing images, but the pre-trained API does not recognize their specific components. They have 1000 labeled images. Which path offers the fastest time-to-value with minimal coding?

A.Store images in BigQuery and use ML.PREDICT with a custom model

B.Use AutoML Vision for object detection

C.Use a Cloud Function to call the Vision API and post-process results

D.Train a custom object detection model using TensorFlow on Vertex AI

AnswerB

No-code training and deployment.

Why this answer

AutoML Vision for object detection is the fastest path because it requires no custom coding—users simply upload labeled images, and the platform automatically trains a model tailored to their custom components. This directly addresses the need to detect objects the pre-trained Vision API cannot recognize, while minimizing time-to-value compared to manual TensorFlow training or custom infrastructure setup.

Exam trap

Google Cloud often tests the misconception that any cloud function or API call can be adapted to custom objects via post-processing, but the pre-trained Vision API's fixed label set cannot be extended without retraining, making AutoML the only low-code solution that actually learns new object classes.

How to eliminate wrong answers

Option A is wrong because BigQuery ML.PREDICT is designed for structured data and tabular models, not for image-based object detection; storing images in BigQuery and using ML.PREDICT would require converting images to embeddings or using a pre-trained model, which does not solve the custom object recognition problem efficiently. Option C is wrong because calling the pre-trained Vision API via Cloud Function and post-processing results still relies on the same pre-trained model that cannot recognize the custom components, so it fails to address the core requirement. Option D is wrong because training a custom model using TensorFlow on Vertex AI requires significant coding, manual architecture design, and hyperparameter tuning, which is far slower and more complex than using AutoML Vision's no-code automated training pipeline.


54

MCQeasy

Refer to the exhibit. A team runs the command above and sees only two models. They know there is a model 'model-v3' created three days ago. What is the most likely reason it is not listed?

A.The model was created in a different region.

B.The model is in a different project.

C.The model is not deployed to an endpoint.

D.The model's display name contains a hyphen.

E.The model was created by a different user.

AnswerA

The --region flag filters models by location; missing models are likely in another region.

Why this answer

The `gcloud ai models list` command lists models within a specific region, as Vertex AI models are regional resources. If 'model-v3' was created in a different region, it would not appear in the output unless the `--region` flag is set to that region. This is the most likely reason the model is missing from the list.

Exam trap

Google Cloud often tests the regional scope of Vertex AI resources, trapping candidates who assume model listing is global or project-wide, when in fact it is region-specific and requires the correct `--region` flag.

How to eliminate wrong answers

Option B is wrong because the `gcloud ai models list` command operates within a single project (the current configured project or one specified with `--project`), but the question states the team sees only two models, implying they are in the correct project; a different project would require explicit project specification. Option C is wrong because model listing does not require deployment to an endpoint; Vertex AI lists all models in the project/region regardless of deployment status. Option D is wrong because hyphens in display names are allowed and do not affect listing; the command lists models by their resource name or display name without filtering on special characters.

Option E is wrong because model listing is not user-scoped; all models in the project/region are visible to any user with appropriate permissions, regardless of who created them.
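The regional scoping is visible in the resource names themselves: a Vertex AI model is identified as `projects/PROJECT/locations/LOCATION/models/MODEL_ID`. The pure-Python sketch below makes no Google Cloud calls; the project, regions, and model IDs are hypothetical. It groups resource names by their location segment to mimic why a listing scoped to `us-central1` shows two models while a third model created in another region stays invisible.

```python
import re
from collections import defaultdict

# Vertex AI model resource names embed the region:
#   projects/PROJECT/locations/LOCATION/models/MODEL_ID
_MODEL_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/models/(?P<model_id>[^/]+)$"
)


def location_of(resource_name: str) -> str:
    """Return the region segment of a model resource name."""
    match = _MODEL_NAME.match(resource_name)
    if match is None:
        raise ValueError(f"not a model resource name: {resource_name!r}")
    return match.group("location")


def group_by_location(resource_names):
    """Group model resource names by region, mirroring a per-region listing."""
    groups = defaultdict(list)
    for name in resource_names:
        groups[location_of(name)].append(name)
    return dict(groups)


# Hypothetical project, regions, and model IDs, for illustration only.
models = [
    "projects/my-project/locations/us-central1/models/111",
    "projects/my-project/locations/us-central1/models/222",
    "projects/my-project/locations/europe-west4/models/333",  # e.g. model-v3
]
by_region = group_by_location(models)
print(sorted(by_region))               # ['europe-west4', 'us-central1']
print(len(by_region["us-central1"]))   # 2
```

In practice, the direct check is to re-run `gcloud ai models list --region=REGION` for each region the team uses.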

55

MCQhard

A team deployed a prototype classification model to Vertex AI Prediction. After a week, they notice the metrics shown in the exhibit. What is the most likely cause of the performance degradation and latency increase?

A.The prediction endpoint's autoscaling is too slow, causing requests to queue and time out.

B.The prediction requests are too large, exceeding the maximum request size limit for Vertex AI.

C.The training data does not represent the current production data distribution, causing the model to make incorrect predictions and requiring more computation.

D.The custom prediction container uses outdated libraries that are incompatible with Vertex AI's runtime.

AnswerC

Data distribution shift degrades accuracy and can increase latency if the model is uncertain.

Why this answer

The exhibit shows both accuracy degradation and increased latency. Option C is correct because when the production data distribution shifts away from the training data (data drift), the model makes more incorrect predictions, which can trigger additional computation (e.g., retries, fallback logic, or increased uncertainty estimation) and cause latency spikes. Vertex AI Prediction does not inherently add computation for wrong predictions, but the model's internal confidence thresholds or post-processing steps may consume extra resources when handling out-of-distribution inputs.

Exam trap

Google Cloud often tests the misconception that latency increase must be caused by infrastructure issues (autoscaling or request size) rather than model behavior, but the key clue is the simultaneous accuracy degradation, which points to data drift as the root cause.

How to eliminate wrong answers

Option A is wrong because autoscaling delays cause request queuing and timeouts, which would manifest as increased error rates and latency, but not as a degradation in prediction accuracy (metrics like precision/recall). Option B is wrong because exceeding the maximum request size limit (typically 1.5 MB for Vertex AI online prediction) would result in immediate 413 Payload Too Large errors, not a gradual performance degradation over a week. Option D is wrong because outdated libraries in a custom container typically cause deployment failures or runtime errors (e.g., import errors or version conflicts) that appear at deployment or on the first requests, not a gradual accuracy drop over a week.
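To make the drift diagnosis concrete, the sketch below compares one feature's training and serving histograms using Jensen-Shannon divergence, one common distribution distance for numerical features. It is a minimal pure-Python sketch, assuming the values have already been bucketed into the same five bins; the histograms are made up, and a monitoring job would raise an alert when the divergence exceeds its configured threshold.

```python
import math


def normalize(counts):
    """Turn bin counts into probabilities."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in nats; bins where p is 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence between two histograms over the same bins."""
    p, q = normalize(p_counts), normalize(q_counts)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture; m_i > 0 wherever p_i or q_i > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical histograms of one feature over the same five bins.
training = [100, 300, 400, 150, 50]
serving_week1 = [98, 305, 395, 152, 50]    # close to training
serving_week2 = [20, 80, 250, 350, 300]    # mass shifted to higher bins

print(round(js_divergence(training, serving_week1), 4))
print(round(js_divergence(training, serving_week2), 4))
```

The divergence is 0 for identical histograms and bounded above by ln 2 in nats, which makes a fixed alert threshold easy to reason about.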

56

MCQeasy

A marketing team wants to analyze customer reviews for sentiment without writing code. Which Google Cloud service should they use?

A.Cloud Dataflow

B.Vertex AI Workbench

C.BigQuery ML

D.Cloud Natural Language API

AnswerD

Correct: Pre-trained, no-code sentiment analysis.

Why this answer

The Cloud Natural Language API (option D) is the correct choice because it provides pre-trained models for sentiment analysis, entity recognition, and syntax analysis behind a simple REST API. The team does not build, train, or tune any model; the only integration is sending text and reading back a sentiment score, which matches the requirement to analyze reviews without writing ML code.

Exam trap

Google Cloud often tests the distinction between services that require coding (like Dataflow or Workbench) versus those that offer pre-built, no-code APIs (like Cloud Natural Language API), leading candidates to mistakenly choose BigQuery ML because it uses SQL, which they perceive as 'low-code' but still requires explicit query writing and model management.

How to eliminate wrong answers

Option A is wrong because Cloud Dataflow is a fully managed stream and batch data processing service based on Apache Beam, requiring users to write code (e.g., Java or Python) to define data pipelines, making it unsuitable for a no-code sentiment analysis task. Option B is wrong because Vertex AI Workbench is a Jupyter-based notebook environment for building and deploying custom ML models, requiring users to write code (e.g., Python) to train or use models, not a no-code solution. Option C is wrong because BigQuery ML allows users to create and execute ML models using SQL queries, but it still requires writing SQL statements and managing model creation, which is not a no-code API for direct sentiment analysis of text.
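To show how little integration the API needs, the sketch below builds the JSON body for a `documents:analyzeSentiment` request (POSTed to `https://language.googleapis.com/v1/documents:analyzeSentiment`) and reads `documentSentiment.score`, which ranges from -1.0 to 1.0. It makes no network call: the response is a hypothetical example shaped like the API's, and the ±0.25 cut-off used to label reviews is an arbitrary illustrative choice, not an API default.

```python
import json

# POST target for the request body built below (not called in this sketch).
ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def build_request(review_text: str) -> str:
    """JSON body for a documents:analyzeSentiment call on plain text."""
    body = {
        "document": {"type": "PLAIN_TEXT", "content": review_text},
        "encodingType": "UTF8",
    }
    return json.dumps(body)


def label_review(response_json: str, threshold: float = 0.25) -> str:
    """Map documentSentiment.score (-1.0 to 1.0) to a coarse label.

    The +/-0.25 cut-off is an illustrative assumption, not an API default.
    """
    score = json.loads(response_json)["documentSentiment"]["score"]
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


request_body = build_request("Fast shipping and the product works great.")
# Hypothetical response shaped like the API's; the numbers are made up.
sample_response = '{"documentSentiment": {"score": 0.8, "magnitude": 0.8}, "language": "en"}'
print(label_review(sample_response))  # positive
```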

57

MCQeasy

A company has developed a prototype fraud detection model using a small sample of transactions. The prototype runs on a single VM and uses a Random Forest classifier. They want to scale to the full dataset of 50 million transactions. The data is stored in BigQuery. The team wants to use Vertex AI for training. After moving the code to a custom training container and using Vertex AI Training with a single n1-standard-4 machine, the training job fails with an error: "Process terminated with exit code 1". The logs show: "java.lang.OutOfMemoryError: Java heap space". The model uses a scikit-learn RandomForest. Which course of action is most appropriate?

A.Use a distributed training strategy with multiple workers.

B.Increase the machine type to n1-highmem-8 to provide more memory.

C.Switch from Random Forest to a linear model to reduce memory usage.

D.Switch to a high-CPU machine type like n1-highcpu-16.

AnswerB

More memory alleviates the OOM error for in-memory Random Forest.

Why this answer

Option B is correct because moving to an n1-highmem machine directly addresses the out-of-memory (heap space) error: Random Forest training keeps the data and the growing trees in memory, so its memory use scales with dataset size. Option D is wrong because high-CPU machines have less memory per core (an n1-highcpu-16 has less total memory than the n1-standard-4 that already failed). Option A is wrong because scikit-learn does not natively support distributed training, and setting up distributed Random Forest is complex.

Option C is wrong because switching to a linear model may degrade performance unnecessarily.
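A back-of-envelope estimate shows why memory, not CPU count, is the constraint. The scenario does not give a feature count, so the sketch below assumes 30 float64 features and a hypothetical 2x overhead for data copies and the fitted trees; the machine memory figures are the published N1 specs (15 GB, 52 GB, and 14.4 GB).

```python
# Approximate memory per machine type in GB, from Compute Engine's N1 specs.
MACHINE_MEMORY_GB = {
    "n1-standard-4": 15.0,
    "n1-highmem-8": 52.0,
    "n1-highcpu-16": 14.4,
}


def feature_matrix_gb(rows: int, features: int, bytes_per_value: int = 8) -> float:
    """Size of a dense float64 feature matrix in GB (1 GB = 1e9 bytes)."""
    return rows * features * bytes_per_value / 1e9


def fits(machine: str, needed_gb: float) -> bool:
    """True if the estimated working set fits in the machine's memory."""
    return needed_gb <= MACHINE_MEMORY_GB[machine]


rows = 50_000_000   # from the scenario
features = 30       # hypothetical feature count
overhead = 2.0      # hypothetical headroom for copies and the fitted trees
needed = feature_matrix_gb(rows, features) * overhead  # 24.0 GB

for machine in MACHINE_MEMORY_GB:
    print(machine, "fits" if fits(machine, needed) else "OOM risk")
```

Under these assumptions only the n1-highmem-8 has room; the n1-highcpu-16 adds cores but has slightly less memory than the machine that already failed.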

58

MCQmedium

Two teams are collaborating on a project and want to use a shared Feature Store in Vertex AI. They need to ensure that features are discoverable and that access is controlled. What is the best practice?

A.Export features to CSV files in Cloud Storage and share the bucket

B.Build a custom feature pipeline using Dataflow and store in Cloud SQL

C.Each team stores features in their own BigQuery table and shares the table

D.Use Vertex AI Feature Store and grant appropriate IAM roles to each team

AnswerD

Vertex AI Feature Store provides a unified repository with access control and discovery.

Why this answer

Option D is correct because Vertex AI Feature Store supports sharing features with IAM-based access control and makes features discoverable through the console and API. Option C is wrong because BigQuery tables alone lack feature store metadata and online serving. Option A is wrong because CSV files in Cloud Storage are not a feature store.

Option B is wrong because an ad-hoc Dataflow and Cloud SQL pipeline is not a managed solution.

59

Drag & Dropmedium

Drag and drop the steps to deploy a trained TensorFlow model to Vertex AI Prediction in the correct order.

Why this order

Export the model, upload to GCS, register as a model, deploy to endpoint, then test.

60

MCQhard

A team has set up the IAM policy above on a Vertex AI project. Alice, a data scientist, reports that she cannot create a Vertex AI Training custom job using a pre-built container. Other data scientists in the group 'data-scientists@example.com' have the same issue. What is the most likely cause?

A.The 'roles/aiplatform.user' role does not grant the permission to create custom training jobs.

B.The Vertex AI Custom Code Service Agent service account is missing the 'roles/aiplatform.user' role.

C.Alice is not included in the 'data-scientists@example.com' group.

D.The service account 'vertex-ai@project.iam.gserviceaccount.com' does not have permission to access the training data.

AnswerA

Creating custom jobs requires 'aiplatform.customJobs.create', which is not in the aiplatform.user role.

Why this answer

Creating a custom training job requires the 'aiplatform.customJobs.create' permission, and the group's 'roles/aiplatform.user' binding does not supply it. Because the binding applies to the whole 'data-scientists@example.com' group, every member hits the same permission error; using a pre-built rather than a custom container does not change the permission required to create the job.

To create custom training jobs, the 'roles/aiplatform.customJobUser' role is needed, which includes the necessary permissions for custom job creation.

Exam trap

Google Cloud often tests the distinction between the 'roles/aiplatform.user' role and more specific roles like 'roles/aiplatform.customJobUser', leading candidates to assume the basic user role covers all Vertex AI actions, when in fact it does not include custom job creation.

How to eliminate wrong answers

Option B is wrong because the Vertex AI Custom Code Service Agent runs the training code after a job has been accepted; a missing role on that service agent would make the job fail at runtime, not stop the data scientists from creating it. Option C is wrong because the problem states that other data scientists in the group have the same issue, which points to a group-wide role or permission problem rather than Alice's group membership. Option D is wrong because the service account 'vertex-ai@project.iam.gserviceaccount.com' only needs access to the training data while the job runs; missing data access causes a runtime failure, not an error when creating the job.

61

MCQeasy

A team wants to share a trained model with other teams within the organization. They need to provide access to the model artifact in Vertex AI Model Registry and ensure that only authorized teams can deploy the model. What should they do?

A.Grant the other teams access to the Cloud Storage bucket where the model is stored

B.Set the model to public in Vertex AI Model Registry

C.Use Cloud Key Management Service to encrypt the model and share the decryption key

D.Use IAM to grant the 'aiplatform.models.deploy' role to the other teams on the model resource

AnswerD

IAM roles provide fine-grained access control within Vertex AI.

Why this answer

Option D is correct because Vertex AI Model Registry uses IAM to control access to model resources. By granting the 'aiplatform.models.deploy' role on the specific model resource, you ensure that only authorized teams can deploy the model, while other operations (like viewing or updating) remain restricted. This follows the principle of least privilege and avoids exposing the model artifact broadly.

Exam trap

Google Cloud often tests the misconception that sharing the storage bucket or encryption key is sufficient for controlled deployment, when in fact IAM roles on the model resource are required to enforce deployment authorization.

How to eliminate wrong answers

Option A is wrong because granting access to the Cloud Storage bucket where the model is stored would allow teams to download or modify the model artifact directly, bypassing Vertex AI's deployment controls and audit logging. Option B is wrong because setting the model to public in Vertex AI Model Registry would allow anyone in the world to deploy the model, violating security requirements. Option C is wrong because Cloud KMS encrypts data at rest but does not control access to the model resource; sharing the decryption key would not prevent unauthorized deployment, as the key only decrypts the artifact, not the deployment permission.

62

MCQhard

What is the root cause of the failure?

A.The budget_milli_node_hours parameter is set to 0, which is below the minimum required value

B.The evaluate_model component expects the model artifact but the autopilot_train component does not output a model artifact

C.The location parameter 'us-central1' is not a valid region for AutoML

D.The threshold parameter is missing in the autopilot_train component

AnswerA

Must be at least 1000 (1 node hour).

Why this answer

Option A is correct because the `budget_milli_node_hours` parameter in Vertex AI AutoML training caps the training compute, measured in thousandths of a node hour (1,000 milli node hours = 1 node hour), not in milliseconds. Setting it to 0 allocates no compute, so the job fails validation before training can start. The minimum is 1,000 (one node hour), and some AutoML task types require more, so 0 is always invalid.

Exam trap

Google Cloud often tests the misconception that a zero value for a resource allocation parameter is acceptable or defaults to a minimum, when in fact it causes an immediate validation failure.

How to eliminate wrong answers

Option B is wrong because the `autopilot_train` component in Vertex AI AutoML does output a model artifact; the failure is not due to a missing artifact but due to the zero budget parameter. Option C is wrong because `us-central1` is a valid region for AutoML in Vertex AI, and region validity is not the issue here. Option D is wrong because the `threshold` parameter is not a required parameter for the `autopilot_train` component; the failure is caused by the zero budget, not a missing threshold.
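The unit conversion is easy to get wrong, so a pre-submission check helps. The sketch below is a minimal illustration; `validate_budget` is a hypothetical helper, and its default minimum of 1,000 follows the short answer above (some task types need more, so treat the minimum as an assumption to adjust per task).

```python
MILLI_PER_NODE_HOUR = 1000


def to_node_hours(budget_milli_node_hours: int) -> float:
    """Convert milli node hours to node hours (1,000 milli = 1 node hour)."""
    return budget_milli_node_hours / MILLI_PER_NODE_HOUR


def validate_budget(budget_milli_node_hours: int, minimum: int = 1000) -> None:
    """Hypothetical guard: raise before submitting if below the assumed minimum."""
    if budget_milli_node_hours < minimum:
        raise ValueError(
            f"budget_milli_node_hours={budget_milli_node_hours} "
            f"({to_node_hours(budget_milli_node_hours)} node hours) is below "
            f"the minimum of {minimum}"
        )


print(to_node_hours(8000))  # 8.0
validate_budget(1000)       # passes: exactly one node hour
try:
    validate_budget(0)      # the misconfiguration from the exhibit
except ValueError as err:
    print(err)
```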

63

MCQhard

A company has a large-scale ML system that uses Vertex AI Pipelines to retrain models weekly. The pipeline includes a custom training job and a batch prediction step. After moving to production, they observe that batch prediction jobs often fail with 'Quota exceeded' errors. The project has sufficient CPU quota. What is the most likely cause?

A.The pipeline is exceeding the maximum number of concurrent pipeline runs.

B.The batch prediction job is requesting a specific accelerator type that has a separate quota limit.

C.The batch prediction job is using a machine type that is not available in the region.

D.The custom training job is consuming all available quota before the batch prediction job starts.

AnswerB

GPUs/TPUs have separate quotas; if exceeded, the job fails with quota exceeded.

Why this answer

The most likely cause is that the batch prediction job is requesting a specific accelerator type (e.g., GPU or TPU) that has a separate quota limit from CPU quota. In Vertex AI, accelerator quotas are distinct from general compute (CPU) quotas, and even if the project has sufficient CPU quota, the accelerator quota may be exhausted, causing 'Quota exceeded' errors.

Exam trap

Google Cloud often tests the misconception that all quota errors are related to CPU or memory, but the trap here is that accelerator types (GPUs/TPUs) have their own independent quota limits that are easily overlooked when CPU quota appears sufficient.

How to eliminate wrong answers

Option A is wrong because exceeding the maximum number of concurrent pipeline runs would result in pipeline submission failures or throttling, not batch prediction job failures with 'Quota exceeded' errors; Vertex AI Pipelines enforces concurrency limits separately. Option C is wrong because if a machine type is not available in the region, the error would be a resource availability error (e.g., 'Machine type not found'), not a quota exceeded error. Option D is wrong because the custom training job and batch prediction job run sequentially within the same pipeline; the training job completes before the batch prediction job starts, so it cannot consume quota during the batch prediction step.
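The key point is that each quota is checked on its own. The sketch below models that with plain dictionaries; the quota names, limits, and job request are hypothetical labels for illustration, not actual Cloud quota metric names.

```python
def quota_violations(requested, available):
    """Return resources whose request exceeds the remaining quota.

    Each quota is checked independently: spare CPU quota cannot cover GPUs.
    """
    return sorted(
        name for name, amount in requested.items()
        if amount > available.get(name, 0)
    )


# Hypothetical regional quotas and a batch prediction job's request.
available = {"cpus": 96, "nvidia_t4_gpus": 4}
batch_job = {"cpus": 40, "nvidia_t4_gpus": 8}  # e.g. 8 replicas x 1 T4

print(quota_violations(batch_job, available))  # ['nvidia_t4_gpus']
```

Plenty of CPU headroom does not prevent the failure, because the GPU request is compared only against the GPU quota.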

64

MCQeasy

An ML engineer is using Vertex AI Pipelines with Kubeflow Pipelines SDK (KFP) to orchestrate a training and deployment workflow. They want to reuse a custom component across multiple pipelines. The component is defined in a Python file 'preprocess.py' that includes a function decorated with @kfp.components.create_component_from_func. How should they package this component for reuse?

A.Import the preprocess module and call create_component_from_func on the function, then use the resulting component in pipeline definitions.

B.Save the component as a YAML file using kfp.components.ComponentStore and load it in other pipelines.

C.Compile the pipeline that uses the component into a JSON file and upload it to Vertex AI.

D.Build a custom container image with the function and use it as a base image in other pipelines.

AnswerA

This allows the component to be defined once and reused.

Why this answer

Option A is correct because the recommended way to reuse a custom component defined via `@kfp.components.create_component_from_func` is to import the Python module containing the decorated function and call `create_component_from_func` on that function in each pipeline definition. This creates a reusable component object that can be used directly in the pipeline's `@dsl.pipeline` definition without additional packaging steps. The KFP SDK treats the function as the source of truth, and re-importing ensures the component logic is always current.

Exam trap

The trap here is that candidates may overthink the packaging step and assume a YAML file or container image is required for reuse, when the KFP SDK is designed to treat Python functions as first-class reusable components through simple module imports.

How to eliminate wrong answers

Option B is wrong because `kfp.components.ComponentStore` is for looking up and loading existing component definitions (for example, from configured URL search paths), not for saving components. A function-based component can be written to YAML (via the `output_component_file` argument of `create_component_from_func`) and later loaded with `kfp.components.load_component_from_file`, but that extra step is unnecessary when the function lives in an importable Python module. Option C is wrong because compiling a pipeline into JSON (or YAML) is for submitting the pipeline to Vertex AI, not for packaging a single component for reuse; the compiled artifact represents the entire pipeline, not an individual component. Option D is wrong because building a custom container image is unnecessary overhead for a lightweight Python function component; a custom base image is only worth building when the function needs dependencies the default image cannot provide, and the question does not indicate such a need.

65

MCQeasy

A data scientist wants to automate the retraining of a model when new data arrives in Cloud Storage. Which Google Cloud service is most appropriate for orchestrating this workflow?

A.Cloud Run

B.Vertex AI Predictions

C.Cloud Scheduler

D.Cloud Composer

E.Cloud Functions

AnswerD

Cloud Composer can orchestrate complex workflows triggered by data events.

Why this answer

Cloud Composer (D) is the most appropriate service for orchestrating a retraining workflow because it is a fully managed workflow orchestration service built on Apache Airflow. It allows you to define a Directed Acyclic Graph (DAG) that triggers model retraining when new data arrives in Cloud Storage, handling dependencies, scheduling, and monitoring across multiple steps such as data validation, training, and deployment.

Exam trap

The trap here is that candidates often confuse event-triggered compute services (like Cloud Functions) with full workflow orchestration, failing to recognize that retraining pipelines require multi-step dependency management, retries, and monitoring that only a dedicated orchestrator like Cloud Composer provides.

How to eliminate wrong answers

Option A (Cloud Run) is wrong because it is a serverless compute platform for running stateless containers, not a workflow orchestrator; it lacks native scheduling and dependency management for multi-step pipelines. Option B (Vertex AI Predictions) is wrong because it is a service for deploying models to serve predictions, not for orchestrating the retraining workflow triggered by new data. Option C (Cloud Scheduler) is wrong because it is a cron job service that triggers single actions at fixed times, not a workflow orchestrator that can handle event-driven triggers, conditional logic, and multi-step dependencies.

Option E (Cloud Functions) is wrong because it is a lightweight, event-driven compute service for single-purpose functions; while it can be triggered by Cloud Storage events, it cannot orchestrate complex multi-step pipelines with retries, branching, or monitoring.

66

MCQmedium

Refer to the exhibit. A team leader applies this IAM policy on a Vertex AI model resource. What does the condition accomplish?

A.Allows the data scientist to access model evaluations only

B.Limits access to models whose resource name starts with 'dev-'

C.Limits access to models owned by the data scientist

D.Limits access to models only in the us-central1 region

E.Limits access to models created after a certain date

AnswerB

The condition 'resource.name.startsWith' matches only models with the 'dev-' prefix.

Why this answer

The condition in the IAM policy uses the `resource.name.startsWith('dev-')` condition expression, which restricts access to Vertex AI model resources whose resource name begins with the prefix 'dev-'. This is a common pattern for environment-based access control, allowing the data scientist to only interact with models designated for development.

Exam trap

Google Cloud often tests the distinction between resource name prefix matching and other common IAM conditions like resource labels, resource location, or creation timestamp, leading candidates to confuse a simple string prefix check with more complex attribute-based conditions.

How to eliminate wrong answers

Option A is wrong because the condition does not restrict access to model evaluations; it filters based on the resource name prefix, not the resource type or sub-resource. Option C is wrong because IAM conditions cannot dynamically check resource ownership; they operate on resource attributes like name, not on who created the resource. Option D is wrong because the condition does not reference any region attribute; region-based filtering would require a condition on `resource.location` or similar.

Option E is wrong because the condition does not involve any date or timestamp comparison; it only checks the string prefix of the resource name.
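A prefix condition is a plain string comparison, so what matters is the exact string `resource.name` holds. In many Google Cloud services, IAM conditions see the full resource name rather than the bare ID. The sketch below assumes Vertex AI exposes the same full-path form, with hypothetical project, region, and custom model IDs. It uses Python's `str.startswith` as a stand-in for the CEL `startsWith` check, showing that a bare 'dev-' prefix would not match a full path, while a prefix that spells out the path does.

```python
def starts_with(resource_name: str, prefix: str) -> bool:
    """Python stand-in for the CEL expression resource.name.startsWith(prefix)."""
    return resource_name.startswith(prefix)


# Hypothetical full resource names (custom model IDs assumed).
dev_model = "projects/my-project/locations/us-central1/models/dev-churn"
prod_model = "projects/my-project/locations/us-central1/models/prod-churn"

bare_prefix = "dev-"
full_prefix = "projects/my-project/locations/us-central1/models/dev-"

print(starts_with(dev_model, bare_prefix))   # False
print(starts_with(dev_model, full_prefix))   # True
print(starts_with(prod_model, full_prefix))  # False
```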

67

MCQmedium

A data science team deploys a PyTorch model using Vertex AI Prediction. The model requires GPU for inference, but they notice high costs and underutilized GPUs during off-peak hours. What is the most cost-effective solution?

A.Move the model to Cloud Functions

B.Use a GPU instance with a fixed number of replicas

C.Use a GPU instance with min replicas=0 and autoscaling

D.Switch to a CPU-only machine type

AnswerC

Scales down to zero when unused, saving costs.

Why this answer

Option C is correct because setting min replicas to 0 allows Vertex AI Prediction to scale down to zero instances during off-peak hours, eliminating GPU costs when no requests are being served. With autoscaling, GPU-backed instances spin up on demand when traffic arrives, which directly addresses the underutilization; the trade-off is cold-start latency for the first requests after an idle period.

Exam trap

Google Cloud often tests the misconception that autoscaling alone reduces costs, but the trap here is that without setting min replicas to 0, you still pay for idle GPU instances during off-peak hours, which is the exact problem described in the question.

How to eliminate wrong answers

Option A is wrong because Cloud Functions does not support GPU acceleration; it runs in a serverless environment limited to CPU-only execution, making it unsuitable for GPU-required inference. Option B is wrong because a fixed number of replicas keeps the same GPU nodes running regardless of traffic, so idle GPUs keep incurring cost during off-peak periods. Option D is wrong because the model requires a GPU for inference, and switching to a CPU-only machine would make inference infeasible or too slow for its latency requirements.
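The savings are simple arithmetic: with scale-to-zero you pay only for node hours when traffic is present. The sketch below compares one always-on GPU replica with one that runs only during busy hours. The hourly rate and the traffic pattern (8 business hours on 22 days) are hypothetical, and it assumes a single replica is enough while traffic is present.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def monthly_cost(node_hours: float, hourly_rate: float) -> float:
    """Cost of running one replica for the given node hours."""
    return node_hours * hourly_rate


hourly_rate = 1.50   # hypothetical price of one GPU node hour, in USD
busy_hours = 8 * 22  # hypothetical: traffic only 8 business hours on 22 days

always_on = monthly_cost(HOURS_PER_MONTH, hourly_rate)  # min replicas = 1
scale_to_zero = monthly_cost(busy_hours, hourly_rate)   # min replicas = 0

print(f"always on:     ${always_on:,.2f}")
print(f"scale to zero: ${scale_to_zero:,.2f}")
```

Under these assumptions the idle hours make up most of the bill, which is the cost the question asks you to remove. The price is a cold start on the first request after each idle period.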

68

MCQhard

A machine learning engineer notices that a model served on Vertex AI Endpoints returns predictions that are consistently 20% slower during the first request after idle (cold start). They are using automatic scaling with min replicas=1. What is the most likely cause and best solution?

A.Vertex AI endpoint warm-up time; set min replicas to 2 to always keep a warm instance

B.Model loading time is high; enable health checks to warm the instance

C.Network latency; deploy to a different region

D.Container initialization delay; use a smaller container image

AnswerA

Increasing min replicas ensures at least one warm instance is always available, reducing cold start latency.

Why this answer

Option A is correct because even with min replicas=1, an instance may be recycled due to idleness or updates, causing a cold start. Setting a higher min replicas (e.g., 2) helps ensure a warm instance is always available. Option D is incorrect because a smaller container image may not significantly reduce cold-start time.

Option B is incorrect because health checks report readiness but do not eliminate cold starts. Option C is incorrect because network latency would affect every request, not only the first request after an idle period.

69

Multi-Selecthard

A company deploys a model to Vertex AI Endpoint with autoscaling enabled. During a traffic spike, they observe high tail latency (99th percentile > 2s). Which TWO factors are most likely contributing to this latency?

Select 2 answers

A.The machine type is underpowered for the model.

B.The autoscaling target_cpu_utilization is set too low (e.g., 0.3).

C.The endpoint has too many traffic splits configured.

D.The min_replica_count is set too low, causing cold starts.

E.The model file is very large (e.g., 2GB), increasing model loading time.

AnswersD, E

Low min replicas lead to cold start delays during spikes.

Why this answer

Options D and E are correct. Option D: if min replicas is too low, new replicas must be created and loaded with the model during the spike, causing cold-start latency. Option E: a large model file increases cold-start time because each new replica must load it.

Option A (underpowered machine) would raise average latency, not just the tail. Option C (too many traffic splits) is unrelated. Option B (target CPU utilization set too low) would trigger scaling earlier, reducing tail latency; a target that is too high would delay scaling.
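Cold starts hurt the tail far more than the average, which is why they show up in the 99th percentile. The sketch below uses synthetic latencies (not measurements): 985 requests served by warm replicas and 15 that land on replicas still loading a large model. It computes the mean and nearest-rank percentiles.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value covering pct% of samples."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Synthetic latencies in ms for 1,000 requests during a traffic spike.
warm = [120] * 985       # served by already-loaded replicas
cold_start = [4500] * 15 # served by replicas still loading the model

latencies = warm + cold_start
mean = sum(latencies) / len(latencies)

print(f"mean: {mean:.1f} ms")                    # mean: 185.7 ms
print(f"p50:  {percentile(latencies, 50)} ms")   # p50:  120 ms
print(f"p99:  {percentile(latencies, 99)} ms")   # p99:  4500 ms
```

Only 1.5% of requests are slow, yet they set the p99 while the mean barely moves. This matches the pattern expected from too few warm replicas combined with slow model loading.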

70

MCQmedium

A financial services company uses Vertex AI AutoML Tables to build a credit risk model. The dataset contains 500,000 rows and 50 features, including loan amount, credit score, debt-to-income ratio, and employment length. The target variable is binary: 'default' (1) or 'no default' (0). The data is highly imbalanced, with only 2% defaults. The data scientist trains a model with AutoML Tables using default settings. The evaluation metrics show an AUC of 0.85, but the confusion matrix reveals that the model predicts 'no default' for almost all cases, missing most defaults. The data scientist needs to improve the model's ability to identify defaults without significantly increasing false positives. They have limited time and cannot write custom code. What should they do?

A.Manually split the data into a stratified train/test set to ensure the same proportion of defaults in each.

B.Train multiple models with different algorithms (e.g., XGBoost, Random Forest) and blend them using a custom script.

C.Enable 'Enable weighted evaluation' and set the optimization objective to 'Maximize recall at a specific recall@P%' with a target precision of 0.5.

D.Under-sample the majority class to create a balanced dataset and retrain.

AnswerC

Why C is correct: AutoML Tables supports custom optimization objectives to handle imbalance.

Why this answer

Option C is correct because AutoML Tables lets you set an optimization objective that handles class imbalance without custom code. Maximizing recall at a fixed precision target (here 0.5), together with weighted evaluation, tunes the model to catch as many defaults as possible (recall) while holding precision at the specified level. This directly addresses the need to catch more defaults without a large increase in false positives.

Exam trap

Google Cloud often tests the misconception that manual data splitting or resampling is necessary for imbalanced data in AutoML, when in fact AutoML Tables provides built-in optimization objectives and weighted evaluation to handle imbalance without data manipulation.

How to eliminate wrong answers

Option A is wrong because changing how the data is split does not change the training objective; the model would still optimize for the majority class and keep missing defaults. Option B is wrong because it requires writing custom code (blending scripts), which violates the 'cannot write custom code' constraint and is not a native AutoML Tables feature. Option D is wrong because under-sampling the majority class discards valuable data, which can degrade model performance, and it is unnecessary given AutoML Tables' built-in imbalance handling.
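The idea behind "maximize recall at a precision target" can be shown with threshold selection on scored examples. The sketch below does not reproduce how AutoML trains; it only illustrates the trade-off using synthetic scores from an imbalanced model. At the default 0.5 threshold, the model misses most defaults. The helper then picks the threshold with the highest recall whose precision is at least 0.5, breaking recall ties by higher precision.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when scores >= threshold are predicted as default (1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def max_recall_at_precision(scores, labels, min_precision):
    """Threshold with the highest recall whose precision >= min_precision.

    Ties on recall are broken by higher precision.
    """
    best = None  # (threshold, precision, recall)
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        if p < min_precision:
            continue
        if best is None or r > best[2] or (r == best[2] and p > best[1]):
            best = (t, p, r)
    return best


# Synthetic scores from an imbalanced model: defaults (1) get modest scores.
scores = [0.55, 0.45, 0.40, 0.30, 0.28, 0.22, 0.15, 0.10, 0.08, 0.05, 0.04, 0.03, 0.02, 0.01]
labels = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print(precision_recall(scores, labels, 0.5))  # (1.0, 0.25): misses 3 of 4 defaults
print(max_recall_at_precision(scores, labels, min_precision=0.5))
```

On this toy data, lowering the threshold to 0.22 catches all four defaults while keeping precision at 4/6. That is the kind of operating point the precision-constrained objective targets.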

71

MCQmedium

A company has deployed a model that predicts customer churn. The model's performance, as measured by AUC, has been declining over the past month. The team suspects data drift. They have enabled Vertex AI Model Monitoring, but no alerts have been triggered. What is a possible reason for the lack of alerts?

A.The monitoring is only sampling 10% of the serving data

B.The drift detection threshold is set too low

C.The model is being retrained daily

D.The drift detection focuses on categorical features only

AnswerA

Low sampling rates mean that Model Monitoring only examines a small fraction of predictions, potentially missing drift if it is not uniformly distributed.

Why this answer

If the sampling rate is low (option A, e.g., 10% of serving data), Model Monitoring may not capture enough data to detect drift, so no alerts fire even though drift exists. A drift threshold set too low (option B) would produce more alerts, not fewer. Daily retraining (option C) might correct drift, but drift that occurred between retraining runs would still likely trigger alerts.

Monitoring only categorical features (option D) would miss drift confined to continuous features, but any drift in the monitored categorical features would still raise alerts.
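The threshold reasoning can be checked directly: an alert fires only when the measured distance exceeds the configured threshold. That is why a lower threshold (option B) produces more alerts, not fewer. The sketch below uses L-infinity distance, a common distance for categorical features, on synthetic values of one hypothetical feature. It does not model sampling.

```python
from collections import Counter


def distribution(values):
    """Relative frequency of each category."""
    counts = Counter(values)
    total = len(values)
    return {category: n / total for category, n in counts.items()}


def l_infinity(p, q):
    """Largest absolute difference in any category's frequency."""
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def should_alert(baseline, serving, threshold):
    """Alert only when the distance exceeds the configured threshold."""
    return l_infinity(distribution(baseline), distribution(serving)) > threshold


# Synthetic values of a hypothetical 'plan_type' feature.
baseline = ["basic"] * 60 + ["premium"] * 40
serving = ["basic"] * 45 + ["premium"] * 55

print(round(l_infinity(distribution(baseline), distribution(serving)), 2))  # 0.15
print(should_alert(baseline, serving, threshold=0.1))  # True
print(should_alert(baseline, serving, threshold=0.3))  # False
```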

72

Multi-Selecthard

A team is deploying a model on Vertex AI Prediction. Which THREE configuration settings have a direct impact on both latency and cost? (Choose THREE.)

Select 3 answers

A.Size of training dataset

B.Minimum and maximum number of nodes (autoscaling)

C.Machine type (e.g., n1-standard-2)

D.Model architecture (e.g., number of layers)

E.Number of replicas in the endpoint

AnswersB, C, E

More nodes lower latency but increase cost.

Why this answer

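Option B is correct because the minimum and maximum number of nodes in autoscaling control how many compute instances are provisioned to handle prediction requests. A higher minimum raises baseline cost and reduces cold-start latency, while a lower maximum can cause queuing and higher latency under load. Option C is correct because the machine type sets both the compute available to each request (latency) and the hourly price of every node (cost). Option E is correct because the number of replicas determines how much concurrent traffic the endpoint can absorb and how many nodes are billed.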

Exam trap

Google Cloud often tests the distinction between model-level properties (architecture, training data) and deployment-level configuration settings (machine type, replicas, autoscaling) to see if candidates confuse model development with serving infrastructure.


73

Multi-Select · hard

You are monitoring a production model that is experiencing gradual decay in AUC. Which THREE metrics should you set up alerts for to diagnose the root cause? (Choose three.)

Select 3 answers

A. Concept drift score measured by comparing predicted vs actual outcomes.

B. Training-serving skew for categorical features with high importance.

C. Average prediction latency over the past hour.

D. Feature drift score for key numerical features.

E. Model staleness (days since last retraining).

Answers: A, B, D

Detects changes in relationship between features and labels.

Why this answer

Option A is correct because a concept drift score compares predictions against actual outcomes over time, so it tracks performance degradation directly. A gradual AUC decay can mean that the relationship between features and the target is shifting; monitoring rolling error metrics or the distribution of residuals helps confirm whether the model's predictive power is eroding for that reason. Options B and D check whether the inputs themselves have moved, either away from the training data (training-serving skew) or over time in production (feature drift), which separates input shifts from changes in the feature-label relationship.
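Option A depends on ground-truth labels, which usually arrive after the predictions. A minimal sketch, assuming labels have already been joined back to predictions for each time window: compute AUC per window (here with the pairwise Mann-Whitney formulation, which is quadratic and only suitable for small windows) and alert when it falls more than a chosen tolerance below the baseline. The function names, the 0.85 baseline, and the 0.05 tolerance are hypothetical.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("each window needs both positive and negative labels")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def auc_decay_alert(labels, scores, baseline_auc, tolerance):
    """Alert when windowed AUC drops more than `tolerance` below the baseline."""
    return baseline_auc - auc(labels, scores) > tolerance


# Hypothetical (labels, scores) windows joined from delayed ground truth.
week_1 = ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2])
week_4 = ([1, 1, 0, 0], [0.9, 0.3, 0.8, 0.2])

print(auc(*week_1), auc_decay_alert(*week_1, baseline_auc=0.85, tolerance=0.05))  # 1.0 False
print(auc(*week_4), auc_decay_alert(*week_4, baseline_auc=0.85, tolerance=0.05))  # 0.75 True
```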

Exam trap

Google Cloud often tests the distinction between operational metrics (e.g., latency, staleness) and metrics that directly measure the cause of performance decay (drift and skew scores), leading candidates to select operational metrics instead of diagnostic ones.
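The drift scores behind options B and D can be computed in several ways; Vertex AI Model Monitoring, for example, documents Jensen-Shannon divergence for numerical features. As an illustration only, this sketch uses a different but common measure, the Population Stability Index (PSI), over binned numerical values. It assumes the bin proportions are already computed, and the cut-offs in the final comment are an industry rule of thumb, not a Google-defined threshold.

```python
import math


def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.

    `expected` and `actual` are per-bin proportions (each summing to 1)
    over the same bins; `eps` avoids log(0) for empty bins.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


training = [0.25, 0.25, 0.25, 0.25]  # hypothetical baseline bin proportions
serving = [0.10, 0.20, 0.30, 0.40]   # hypothetical recent serving proportions

score = psi(training, serving)
print(round(score, 3))  # 0.228
# Rule of thumb: < 0.1 stable, 0.1-0.2 moderate shift, > 0.2 significant shift.
```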


74

Multi-Select · medium

A data science team has trained a custom model using Vertex AI and wants to deploy it for online predictions with low latency. Which TWO actions should they take to optimize performance?

Select 2 answers

A. Use Vertex AI Endpoints with traffic splitting for canary deployments.

B. Enable autoscaling with a large min replicas count to handle bursts.

C. Optimize the model by quantizing to FP16.

D. Use a custom prediction routine with pre-processing inside the container.

E. Use a machine type with GPU for inference.

Answers: C, D

Quantization reduces model size and inference latency, often with minimal accuracy loss.

Why this answer

Option C is correct because quantizing the model to FP16 roughly halves its memory footprint and reduces computational cost, which lowers inference latency on hardware with fast half-precision support (e.g., NVIDIA GPUs with Tensor Cores). For online prediction, where response time is critical, this speeds up matrix operations, usually with little loss of accuracy. Option D is correct because running preprocessing inside the serving container with a custom prediction routine avoids a separate preprocessing step or service call on every request.
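To see the memory effect behind option C, this sketch round-trips a few hypothetical weights through IEEE 754 half precision using Python's `struct` module (format code `e`). It shows the storage halving and the small rounding error; it does not measure latency, which depends on hardware support, and a real model would be converted with its framework's tooling (for example, TensorFlow Lite's float16 post-training quantization).

```python
import struct


def pack(weights, fmt):
    """Serialize weights with a struct format code: 'f' = FP32, 'e' = FP16."""
    return struct.pack(f"<{len(weights)}{fmt}", *weights)


def fp16_roundtrip(weights):
    """Cast weights to FP16 and back, returning the values the model would see."""
    return list(struct.unpack(f"<{len(weights)}e", pack(weights, "e")))


weights = [0.1, -1.5, 3.14159, 0.0004]  # hypothetical layer weights

fp32_bytes = len(pack(weights, "f"))
fp16_bytes = len(pack(weights, "e"))
max_rel_err = max(abs(w - r) / abs(w) for w, r in zip(weights, fp16_roundtrip(weights)))

print(fp32_bytes, fp16_bytes)  # 16 8
print(max_rel_err < 1e-3)      # True
```

FP16 keeps about 11 significant bits, so the relative rounding error for normal values stays below roughly 0.05%, which is why accuracy loss is usually small.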

Exam trap

Google Cloud often tests the misconception that scaling infrastructure (e.g., autoscaling or GPU selection) is the primary way to optimize latency, when in fact model-level changes (quantization) and architectural changes (custom routines) are more direct and cost-effective.
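Option D is about where preprocessing runs. Vertex AI custom prediction routines let you supply a predictor with `load`, `preprocess`, `predict`, and `postprocess` hooks that execute inside the serving container. The class below is a dependency-free stand-in that only mirrors that hook order; it does not subclass the real SDK base class, and its scaling constants and linear scorer are hypothetical placeholders for a real model.

```python
class ScalingPredictor:
    """Stand-in mirroring the hook order of a custom prediction routine."""

    def load(self, artifacts_uri):
        # A real predictor would deserialize the model from artifacts_uri.
        # Hypothetical constants stand in for a fitted scaler and model.
        self.means = [10.0, 2.0]
        self.stds = [5.0, 1.0]
        self.weights = [0.5, -0.25]

    def preprocess(self, prediction_input):
        # Standardize features inside the container, so clients send raw values
        # and no separate preprocessing service sits on the request path.
        return [
            [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]
            for row in prediction_input["instances"]
        ]

    def predict(self, instances):
        # Hypothetical linear scorer in place of a trained model.
        return [sum(w * x for w, x in zip(self.weights, row)) for row in instances]

    def postprocess(self, prediction_results):
        return {"predictions": prediction_results}


predictor = ScalingPredictor()
predictor.load(artifacts_uri="unused")
request = {"instances": [[15.0, 3.0], [10.0, 2.0]]}
response = predictor.postprocess(predictor.predict(predictor.preprocess(request)))
print(response)  # {'predictions': [0.25, 0.0]}
```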


75

MCQ · medium

Refer to the exhibit. An ML engineer in the team needs to deploy the model to an endpoint. The engineer is assigned the 'roles/aiplatform.user' role at the project level but still cannot deploy. What is the most likely reason?

A. The service account 'sa-training' is using all the model's quota.

B. Alice does not have any IAM role on the project.

C. Alice needs to be granted the 'roles/aiplatform.admin' role at the project level.

D. The model's resource-level IAM policy only grants the 'roles/aiplatform.user' role, which does not include deploy permission.

Answer: D

Neither the project-level role nor the model's resource-level policy grants a role that includes the deploy permission.

Why this answer

The 'roles/aiplatform.user' role at the project level grants permissions to use Vertex AI resources, but in this scenario it does not include the permission required to deploy a model to an endpoint. The exhibit's resource-level policy on the model grants only the same role, so no policy in the hierarchy supplies the deploy permission and the action is denied. The fix is 'roles/aiplatform.admin' or a custom role that includes the deploy permission. Note that IAM allow policies are inherited and additive: a resource-level policy adds to the project-level grants rather than overriding them, and only an IAM deny policy can take a granted permission away.
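In Google Cloud IAM, allow policies are inherited and additive: a member's effective permissions on a resource are the union of the roles granted on that resource and on its ancestors. A toy sketch of that rule as a set union follows. It is not the IAM API; the role and permission names are hypothetical placeholders rather than the real contents of any Vertex AI role, and IAM deny policies are ignored.

```python
# Hypothetical role definitions; real roles list many more permissions.
ROLE_PERMISSIONS = {
    "roles/example.user": {"example.models.get", "example.endpoints.predict"},
    "roles/example.deployer": {"example.models.get", "example.endpoints.deploy"},
}


def effective_permissions(member, policies):
    """Union of permissions from every allow policy in the resource hierarchy.

    `policies` is a list of {role: set_of_members} dicts, for example one for
    the project and one for the model. Allow policies only ever add access.
    """
    granted = set()
    for policy in policies:
        for role, members in policy.items():
            if member in members:
                granted |= ROLE_PERMISSIONS[role]
    return granted


project_policy = {"roles/example.user": {"user:alice"}}
model_policy = {"roles/example.user": {"user:alice"}}

perms = effective_permissions("user:alice", [project_policy, model_policy])
print("example.endpoints.deploy" in perms)  # False: no level grants deploy

model_policy["roles/example.deployer"] = {"user:alice"}
perms = effective_permissions("user:alice", [project_policy, model_policy])
print("example.endpoints.deploy" in perms)  # True: a grant at either level is enough
```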

Exam trap

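Google Cloud often tests the distinction between project-level and resource-level IAM policies. A common mistake is to assume that any Vertex AI role at the project level covers every action on child resources, or that a resource-level policy can override a project-level grant; in fact, a member's effective permissions are the union of the roles granted at each level, and only IAM deny policies can restrict them.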

How to eliminate wrong answers

Option A is wrong because quota usage by a service account does not affect IAM permissions; the error is about authorization, not resource limits. Option B is wrong because the question states the engineer is assigned 'roles/aiplatform.user' at the project level, so Alice does have an IAM role. Option C is wrong because, although 'roles/aiplatform.admin' would grant the deploy permission, it describes a possible fix rather than the reason for the failure, which is that no policy on the project or the model grants a role containing that permission.

