Google Professional Machine Learning Engineer (PMLE) — Questions 976–1000

1000 questions total · 14 pages · All types, answers revealed

Page 14 of 14

976

MCQ · easy

An ML engineer needs to monitor the error rate of prediction jobs on a Vertex AI Endpoint. Where can they view the number of failed prediction requests over time?

A. Cloud Monitoring

B. Cloud Console endpoint details page

C. Vertex AI Experiments

D. Cloud Logging

Answer: A

Correct: Cloud Monitoring provides metrics and alerts for endpoint predictions.

Why this answer

Vertex AI Endpoint metrics are integrated with Cloud Monitoring. Specific metrics like 'predictions/failed_count' can be viewed in Cloud Monitoring dashboards.

977

MCQ · medium

Your PyTorch training script uses DistributedDataParallel (DDP) across 4 nodes, each with 4 GPUs (16 GPUs total). You submit a Vertex AI custom training job. How should you configure the worker pool spec?

A. Create one worker pool with 4 replicas, each with machine type having 4 GPUs

B. Create a chief worker pool with 1 replica (4 GPUs) and a parameter server pool with 4 replicas (no GPUs)

C. Create 4 separate jobs, each with 1 replica and 4 GPUs

D. Create one worker pool with 16 replicas, each with 1 GPU

Answer: A

This matches the requirement: 4 workers, each with 4 GPUs.

Why this answer

For DDP across multiple machines, the PyTorch counterpart of TensorFlow's MultiWorkerMirroredStrategy, set the replica count to 4 and give each replica a machine type with 4 GPUs. You do not need to set TF_CONFIG yourself; Vertex AI sets the environment variables required for distributed training.

978

MCQ · easy

An ML team is moving from a prototype Jupyter notebook to a production training pipeline. They want to ensure reproducibility. Which approach should they take?

A. Use interactive parameter tuning.

B. Use a container with fixed dependencies and record hyperparameters.

C. Export the notebook's output model directly.

D. Save the notebook as a .py file.

Answer: B

Captures environment and configuration for reproducibility.

Why this answer

Option B is correct because using a container with fixed dependencies and recording hyperparameters ensures that the training environment and configuration are captured, enabling exact reproduction. Option D is wrong because a .py file does not capture the full environment. Option C is wrong because exporting the notebook's output model directly lacks environment tracking.

Option A is wrong because interactive tuning is not reproducible.

979

Multi-Select · hard

You are designing an ML pipeline for a large-scale recommendation system that runs weekly retraining on historical user interaction data. The pipeline uses TensorFlow and is deployed on Google Cloud. The pipeline must be orchestrated and automated with minimal manual intervention. Which THREE options should you include in your design? (Choose three.)

Select 3 answers

A. Use BigQuery scheduled queries to run the training script on a schedule.

B. Use Vertex AI Pipelines to define the ML pipeline as a Directed Acyclic Graph (DAG) of components.

C. Use AI Platform Notebooks to schedule the training job on a recurring basis.

D. Use Cloud Build and Cloud Functions to trigger the pipeline when new training data arrives in Cloud Storage.

E. Use Cloud Composer to orchestrate the pipeline steps, including data extraction, preprocessing, training, and deployment.

Answers: B, D, E

Vertex AI Pipelines is purpose-built for ML pipelines.

Why this answer

Vertex AI Pipelines (option B) is correct because it provides a managed, serverless orchestration service for building, testing, and deploying ML pipelines as Directed Acyclic Graphs (DAGs). This directly supports the requirement for automated, minimal-intervention weekly retraining by allowing you to define reusable components and schedule pipeline runs via Cloud Scheduler or event triggers, integrating natively with TensorFlow and Google Cloud services.

Exam trap

The trap here is confusing development tools (like Notebooks) or data-query services (like BigQuery scheduled queries) with production-grade orchestration services, leading candidates to select options that cannot handle multi-step pipeline dependencies or automated scheduling in a managed, scalable way.

980

MCQ · hard

Refer to the exhibit. A user is trying to upload a Vertex AI pipeline definition. The error indicates an invalid dependency order. What should the user do to fix this?

A. Reorder the tasks in the YAML so that task1 is defined before task2.

B. Rename task1 to a name that comes alphabetically before task2.

C. Change the dependency of task2 to be independent of task1.

D. Remove the dependentTasks field from task2 and rely on implicit ordering.

Answer: A

YAML ordering determines execution order when dependencies are declared.

Why this answer

Option A is correct because Vertex AI pipeline definitions require that tasks be declared in the order they appear in the dependency graph. The YAML parser validates the `dependentTasks` field by checking that referenced tasks are already defined. Defining `task1` before `task2` ensures that when `task2` declares a dependency on `task1`, `task1` is already in scope, resolving the invalid dependency order error.

Exam trap

Google Cloud often tests the misconception that alphabetical naming or implicit ordering can resolve dependency declaration errors, when in fact the YAML parser strictly requires tasks to be defined in topological order.

How to eliminate wrong answers

Option B is wrong because renaming tasks alphabetically does not affect the order of definition in the YAML file; Vertex AI pipelines rely on the sequence of task declarations, not lexical ordering of names. Option C is wrong because removing the dependency between task2 and task1 would change the pipeline logic, potentially breaking the intended workflow, and the error is about declaration order, not about whether the dependency is valid. Option D is wrong because implicit ordering is not supported in Vertex AI pipelines; the `dependentTasks` field is required to explicitly define dependencies, and removing it would cause the pipeline to run tasks in an undefined order, likely leading to runtime failures.

981

Multi-Select · medium

A company wants to deploy a model for real-time inference with high availability across multiple Google Cloud regions. The model is small and stateless. Which two steps should they take? (Choose two.)

Select 2 answers

A. Deploy the model to Vertex AI Prediction endpoints in multiple regions and use a global external HTTP(S) load balancer to route traffic to the nearest region.

B. Use Cloud Run with multi-region deployment and a global HTTP(S) load balancer.

C. Use Cloud Functions with a global HTTP(S) load balancer.

D. Use a single Vertex AI Prediction endpoint with multiple replicas across zones in the same region.

E. Deploy the model to a Vertex AI Prediction endpoint in a single region and use a global external HTTP(S) load balancer.

Answers: A, B

Multi-region endpoints with global load balancer provide HA and low latency.

Why this answer

Options A and B are correct. A deploys the model to Vertex AI Prediction endpoints in multiple regions behind a global load balancer, providing regional failover. B uses Cloud Run with multi-region deployment and a global load balancer, which also offers multi-region HA.

Option D is insufficient because a single region does not survive a regional outage. Option C is wrong because Cloud Functions is region-specific and not designed for latency-sensitive inference across regions. Option E is wrong because a single region does not provide cross-region HA.

982

MCQ · easy

A team wants to ensure that only approved models are deployed to production. Which Vertex AI feature should they use?

A. Vertex AI Experiments.

B. Cloud DLP.

C. Vertex AI Pipelines.

D. Vertex AI Feature Store.

E. Vertex AI Model Registry with versioning and alias.

Answer: E

Model Registry provides version control and alias-based deployment gates.

Why this answer

Vertex AI Model Registry with versioning and alias (Option E) is the correct feature because it allows teams to manage model lifecycle, track approved versions, and assign aliases (e.g., 'champion' or 'production') to designate which model is approved for deployment. This ensures only vetted models are promoted to production, aligning with governance and compliance requirements.

Exam trap

Google Cloud often tests the distinction between model tracking (Experiments) and model governance (Registry), so the trap here is assuming that any 'management' feature (like Pipelines or Experiments) can enforce deployment approvals, when only the Registry with aliases provides explicit version control and approval semantics.

How to eliminate wrong answers

Option A is wrong because Vertex AI Experiments is designed for tracking and comparing ML training runs, not for managing model deployment approvals. Option B is wrong because Cloud DLP (Data Loss Prevention) is a service for inspecting and masking sensitive data, not for model governance or deployment control. Option C is wrong because Vertex AI Pipelines orchestrates ML workflows (e.g., training, evaluation) but does not inherently enforce approval gates for production deployment.

Option D is wrong because Vertex AI Feature Store is used for storing, serving, and sharing feature data, not for model versioning or deployment approval.

983

MCQ · medium

A data scientist trains an XGBoost model on Vertex AI with a custom container. The model performs well on a held-out test set but fails to generalize in production. They suspect data leakage between training and validation. What is the best practice to prevent this?

A. Store and serve features using Vertex AI Feature Store with point-in-time correctness

B. Implement feature engineering in Vertex AI Pipelines to ensure temporal ordering

C. Store all features in BigQuery and join on timestamp during training and serving

D. Use Vertex AI AutoML instead of custom training

Answer: A

Feature Store provides consistent feature values for each timestamp, preventing leakage.

Why this answer

Option A is correct because Vertex AI Feature Store with point-in-time correctness ensures that for each training example, only feature values that were known at the time of the prediction (i.e., before the label occurred) are used. This prevents future data from leaking into the training set, which is the most common cause of poor generalization when temporal ordering matters. The Feature Store automatically retrieves the latest feature value as of a specified timestamp, eliminating the need for manual joins and windowing logic.
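
To make point-in-time correctness concrete, here is a minimal pure-Python sketch of the kind of lookup described above. It is not the Vertex AI Feature Store API; the feature history, the integer timestamps, and the `value_as_of` helper are all hypothetical names chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical history of one feature for one customer: (timestamp, value),
# sorted by timestamp. Timestamps are plain integers for simplicity.
history = [(10, 1.0), (20, 2.5), (30, 4.0)]


def value_as_of(history, ts):
    """Return the latest feature value recorded at or before ts, else None."""
    timestamps = [t for t, _ in history]
    idx = bisect_right(timestamps, ts)  # number of entries with timestamp <= ts
    return history[idx - 1][1] if idx else None


# A training example labelled at time 25 may only see the value written at 20;
# the value written at 30 lies in the future and must not leak in.
print(value_as_of(history, 25))
print(value_as_of(history, 5))
```

A plain join on the latest value would return the value written at time 30 for every example, which is exactly the leakage the question describes.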

Exam trap

Google Cloud often tests the misconception that simply using a pipeline or a data warehouse with timestamps is sufficient to prevent leakage, but the key is the automated enforcement of point-in-time correctness, which only a dedicated feature store with time-travel capabilities provides.

How to eliminate wrong answers

Option B is wrong because implementing feature engineering in Vertex AI Pipelines ensures reproducible workflows but does not inherently enforce temporal ordering or prevent data leakage; pipelines can still join future features if the data is not time-aware. Option C is wrong because storing all features in BigQuery and joining on timestamp during training and serving is a manual approach that is error-prone and does not guarantee point-in-time correctness; it requires careful windowing logic and can still leak future data if the join is not correctly scoped. Option D is wrong because using Vertex AI AutoML does not automatically solve data leakage; AutoML models are equally susceptible to leakage if the training data contains future information, and the user still needs to ensure temporal integrity of the input features.

984

MCQ · hard

A team wants to implement automated model documentation that captures training data, feature importance, evaluation metrics, and intended use. Which Vertex AI feature supports this?

A. Vertex AI Model Registry with model cards

B. Vertex AI Metadata

C. Vertex AI Explainable AI

D. Vertex AI Pipelines

Answer: A

Model cards are designed for automated documentation.

Why this answer

Model cards in Vertex AI Model Registry provide a standardised template for documenting model details.

985

MCQ · medium

A model deployed on Vertex AI Prediction repeatedly exits with code 137. What is the most likely cause?

A. The model has a disk I/O bottleneck.

B. The model is using too much CPU.

C. The container image is incompatible with the machine type.

D. The model is using more memory than allocated (4GB).

Answer: D

Memory limit reached, OOM killer terminates process.

Why this answer

Exit code 137 indicates that the container was killed by the Linux kernel's Out-Of-Memory (OOM) killer. In Vertex AI Prediction, each model deployment has a fixed memory allocation (default 4GB for custom containers). When the model's inference process exceeds this limit, the OOM killer terminates the container, resulting in exit code 137.

This is the most direct and common cause for this specific exit code in Vertex AI.
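
The link between exit code 137 and a kill signal follows the common convention that a process ended by a signal reports 128 plus the signal number. The sketch below decodes that convention on a POSIX system; `describe_exit_code` is a hypothetical helper, and note that SIGKILL can have causes other than the OOM killer.

```python
import signal


def describe_exit_code(code):
    """Map a container exit code to the signal that ended it, if any."""
    if code > 128:
        # Shell convention: exit code = 128 + signal number.
        return signal.Signals(code - 128).name
    return "no signal (status %d)" % code


print(describe_exit_code(137))  # 137 - 128 = 9
print(describe_exit_code(139))  # 139 - 128 = 11
print(describe_exit_code(1))
```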

Exam trap

Google Cloud often tests the distinction between exit codes: candidates may confuse exit code 137 (OOM kill) with exit code 1 (generic error) or exit code 139 (segmentation fault), leading them to incorrectly attribute the issue to CPU or disk problems.

How to eliminate wrong answers

Option A is wrong because disk I/O bottlenecks typically cause slow performance or timeouts, not exit code 137 (SIGKILL from OOM). Option B is wrong because high CPU usage may cause throttling or latency, but does not trigger the OOM killer; exit code 137 is specifically memory-related. Option C is wrong because an incompatible container image would result in a different error, such as a crash loop with exit code 1 or 139 (segfault), not the OOM-specific exit code 137.

986

MCQ · medium

A data scientist trained a model on a single GPU but needs to train on multiple GPUs for a larger dataset. They observe that training time does not decrease linearly with additional GPUs. Which common issue is most likely?

A. Overfitting.

B. Model architecture too simple.

C. Learning rate too high.

D. Data pipeline bottleneck.

Answer: D

I/O or preprocessing bottleneck limits GPU utilization.

Why this answer

Option D is correct because a data pipeline bottleneck can starve GPUs, preventing linear speedup. Option A is wrong because overfitting relates to model performance, not training speed. Option C is wrong because learning rate affects convergence, not parallelism efficiency.

Option B is wrong because model architecture size does not directly cause non-linear speedup.
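
A toy model can show why an input bottleneck caps the speedup. The sketch assumes input loading overlaps perfectly with compute, that compute divides evenly across GPUs, and that communication cost is zero; the timings and function names are hypothetical.

```python
def step_time(compute_time_1gpu, input_time, n_gpus):
    """Toy model: a step takes as long as the slower of input and compute."""
    return max(input_time, compute_time_1gpu / n_gpus)


def speedup(compute_time_1gpu, input_time, n_gpus):
    """Speedup over one GPU under the same input pipeline."""
    baseline = step_time(compute_time_1gpu, input_time, 1)
    return baseline / step_time(compute_time_1gpu, input_time, n_gpus)


# 1.0 s of compute per step on one GPU, 0.25 s to load each batch.
for n in (1, 2, 4, 8):
    print(n, speedup(1.0, 0.25, n))
```

Under these assumptions the speedup stops growing once compute time per step falls below the 0.25 s input time, so adding GPUs past that point buys nothing until the data pipeline is made faster.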

987

MCQ · medium

A team is using Cloud Composer to orchestrate ML workflows. They have a DAG that triggers a Vertex AI Training job, then a prediction deployment. The deployment step occasionally fails due to quota limits. What is the best way to handle this?

A. Increase the quota manually

B. Use Vertex AI Pipelines instead of Cloud Composer

C. Create a custom sensor to wait for quota to be available

D. Catch the exception in the DAG and send an alert

E. Implement exponential backoff retry in the DAG task

Answer: E

Retries with backoff handle transient failures.

Why this answer

Option E is correct because Cloud Composer (Apache Airflow) provides built-in retry mechanisms via task parameters like `retries` and `retry_delay`. Implementing exponential backoff in the DAG task is the best practice for handling transient quota errors, as it automatically retries the deployment step with increasing delays, reducing load on the quota system and increasing the chance of success without manual intervention. This approach aligns with Airflow's native error-handling capabilities and avoids unnecessary complexity or resource waste.
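
As a rough illustration, the retry settings might look like the dictionary below, which uses the Airflow task arguments `retries`, `retry_delay`, `retry_exponential_backoff`, and `max_retry_delay`. The specific values are assumptions, and `nominal_delays` is a hypothetical helper that only approximates the schedule; Airflow adds jitter, so real waits differ.

```python
from datetime import timedelta

# Task-level arguments accepted by Airflow operators. They would be passed
# to the deployment task or supplied through the DAG's default_args.
retry_args = {
    "retries": 5,
    "retry_delay": timedelta(minutes=1),
    "retry_exponential_backoff": True,
    "max_retry_delay": timedelta(minutes=20),
}


def nominal_delays(args):
    """Approximate the wait before each retry: base delay doubling, capped."""
    base, cap = args["retry_delay"], args["max_retry_delay"]
    return [min(base * 2 ** attempt, cap) for attempt in range(args["retries"])]


print(nominal_delays(retry_args))
```

The cap matters for quota errors: without `max_retry_delay`, later attempts could wait far longer than the quota window actually requires.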

Exam trap

The trap here is that candidates often confuse manual quota increases or switching tools as the primary solution, when the exam expects knowledge of Airflow's native retry mechanisms and the principle of handling transient errors automatically within the orchestration layer.

How to eliminate wrong answers

Option A is wrong because manually increasing quota is a reactive, non-scalable solution that does not address transient quota limits and may incur additional costs or require approval processes. Option B is wrong because switching to Vertex AI Pipelines does not inherently solve quota limit issues; it changes the orchestration tool but still relies on the same underlying Vertex AI services and quota constraints. Option C is wrong because creating a custom sensor to wait for quota availability is overly complex, introduces polling overhead, and does not leverage Airflow's built-in retry mechanisms; sensors are better suited for waiting on external conditions like file arrival, not for handling transient API errors.

Option D is wrong because catching the exception and sending an alert only notifies the team of failure without automatically recovering the task, leading to manual intervention and potential delays; it does not handle the transient nature of quota errors.

988

MCQ · medium

A team deploys a model using Vertex AI Endpoint with automatic scaling. They observe that during traffic spikes, new instances take a long time to become ready, causing high latency for some requests. What should they configure to reduce this startup time?

A. Increase the max replicas

B. Use a custom container with a smaller footprint

C. Enable predictive autoscaling

D. Set a higher target CPU utilization

Answer: B

Smaller containers pull and initialize faster, reducing the time to become ready.

Why this answer

Option B is correct because using a custom container with a smaller footprint (e.g., smaller base image, fewer dependencies) reduces the time to pull and initialize the container. Option A increases max replicas but does not speed up startup. Option D changes when scaling is triggered, but startup time remains.

Option C is not a standard setting.

989

MCQ · medium

An engineer needs to perform sentiment analysis on customer reviews. They have a large volume of text and need a solution that requires minimal customisation. Which option is most efficient?

A. Use Vertex AI Prediction with a pre-trained model

B. Use BigQuery ML with LOGISTIC_REG

C. Train a custom model using AutoML NLP

D. Use the Natural Language API

Answer: D

Why this answer

The Natural Language API provides pre-built sentiment analysis with minimal setup. AutoML NLP would require custom training, BigQuery ML is for tabular data, and Vertex AI Prediction needs a deployed model.

990

MCQ · easy

A retail company wants to predict customer churn using their transaction history and customer demographics. They have limited ML expertise and want to use a managed service on Google Cloud. Which service should they use?

A. AI Platform Notebooks

B. Vertex AI AutoML (Tables)

C. Cloud TPU

D. BigQuery ML

Answer: B

AutoML Tables provides end-to-end automated model building for tabular data, ideal for limited ML expertise.

Why this answer

Vertex AI AutoML (Tables) is the correct choice because it is a managed service specifically designed for tabular data, requiring no ML expertise. It automates model training, hyperparameter tuning, and deployment for classification tasks like churn prediction, directly handling transaction history and demographic features.

Exam trap

The trap here is that candidates may confuse BigQuery ML as a fully managed no-code solution, but it still requires SQL proficiency and manual model selection, whereas Vertex AI AutoML is the true zero-code managed service for tabular data.

How to eliminate wrong answers

Option A is wrong because AI Platform Notebooks provides a Jupyter-based development environment for custom ML coding, not a managed no-code solution, and requires ML expertise to build and train models. Option C is wrong because Cloud TPU is a hardware accelerator for training large deep learning models (e.g., NLP or vision), not a managed service for tabular churn prediction, and is overkill for this use case. Option D is wrong because BigQuery ML enables SQL-based model creation directly in BigQuery, but it requires some ML knowledge to write queries and tune models, and is less automated than AutoML for users with limited ML expertise.

991

MCQ · medium

A machine learning engineer is preparing to train a Transformer-based model using TensorFlow on a single TPU v3-8 pod slice. The training script uses tf.distribute.TPUStrategy. Which environment variable must be set in Vertex AI to enable TPU training with the appropriate topology?

A. TPU_NAME

B. XRT_TPU_CONFIG

C. TPU_CONFIG

D. TF_CONFIG

Answer: C

Vertex AI automatically populates TPU_CONFIG with the TPU worker endpoint; the training script can parse it.

Why this answer

Vertex AI automatically sets the TPU_CONFIG environment variable to communicate the TPU worker IP address and port to the training container. TF_CONFIG is used for distributed training with CPUs/GPUs, but TPU_CONFIG is the correct one for TPU training.

992

Multi-Select · medium

A financial services company has deployed a credit risk ML model on Vertex AI. They want to monitor the model for fairness across demographic groups to ensure no biased outcomes. Which TWO actions should they take as best practices? (Choose TWO.)

Select 2 answers

A. Eliminate all features that are correlated with protected attributes from the model input to ensure fairness.

B. Use Vertex Explainable AI to understand feature attributions and compare their distributions across demographic groups.

C. Periodically compare the model's performance metrics (e.g., AUC) on the overall population versus the holdout test set.

D. Store all model predictions in BigQuery but do not capture ground truth labels to avoid privacy issues.

E. Set up alerts on the Vertex AI Model Monitoring fairness metrics, such as equal opportunity difference, and configure a Slack channel for notifications.

Answers: B, E

Feature attribution analysis helps identify if the model relies disproportionately on sensitive attributes.

Why this answer

Option B is correct because Vertex Explainable AI provides feature attribution scores that can be compared across demographic groups to detect if the model relies on sensitive attributes or proxies. This enables fairness auditing by revealing whether the model's decision logic differs systematically for protected groups, which is a best practice for monitoring bias.

Exam trap

Google Cloud often tests the misconception that removing protected attributes or correlated features is sufficient for fairness, when in reality proxy features and complex interactions can still cause bias, making monitoring with explainability and fairness metrics essential.

993

MCQ · medium

A company uses BigQuery to store feature data for ML training. A data engineer notices that a Vertex AI Training job is failing with 'Access Denied' errors when reading from a BigQuery table. The training job uses a custom service account that has been granted the 'bigquery.dataViewer' role on the dataset. What is the most likely cause of the failure?

A. The service account is not in the same project as the BigQuery dataset.

B. The BigQuery table is partitioned and requires row-level access.

C. The service account lacks the 'bigquery.jobs.create' permission in the project.

D. The training job does not have the required network access to BigQuery.

Answer: C

Reading from BigQuery via Vertex AI Training requires the ability to submit a query job, which requires 'bigquery.jobs.create'.

Why this answer

The 'bigquery.dataViewer' role grants permissions to read BigQuery data (e.g., bigquery.tables.getData), but it does not include the 'bigquery.jobs.create' permission. When a Vertex AI training job reads from BigQuery, it must first create a BigQuery job (a query job) to retrieve the data. Without 'bigquery.jobs.create' at the project level, the service account cannot initiate the read operation, resulting in an 'Access Denied' error even though it has data-level access.

Exam trap

The trap here is that candidates often assume 'bigquery.dataViewer' is sufficient for all read operations, overlooking the requirement for 'bigquery.jobs.create' to initiate the query job that actually reads the data.

How to eliminate wrong answers

Option A is wrong because the service account does not need to be in the same project as the BigQuery dataset; cross-project access is supported as long as IAM permissions are granted at the dataset or table level. Option B is wrong because partitioned tables do not require row-level access by default; row-level access is controlled via BigQuery row-level security policies, which are not automatically required for partitioned tables. Option D is wrong because Vertex AI training jobs run within Google Cloud's internal network and have built-in access to BigQuery via the Cloud API; network access is not a common cause of 'Access Denied' errors for BigQuery reads.

994

Multi-Select · hard

Which TWO statements are true about canary deployments for Vertex AI endpoints?

Select 2 answers

A. Canary deployments are only supported for custom containers, not prebuilt frameworks.

B. You can roll back a canary by resetting traffic to 0% for the new version.

C. You can use traffic splitting to gradually shift 1-100% of traffic to a new version.

D. Canary deployments require the use of Vertex AI Model Registry.

E. Once a canary receives 50% traffic, you cannot increase it further.

Answers: B, C

Traffic can be shifted back to old version easily.

Why this answer

Traffic splitting supports a gradual rollout: the share of traffic sent to the new version can be adjusted anywhere from 1% to 100%, with no cap at 50%. A canary lets you test a version before full rollout, and setting its traffic share back to 0% rolls it back; monitoring metrics can be used to trigger that rollback automatically.
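
The mechanics of a traffic split can be sketched in plain Python as a mapping from deployed-model IDs to percentages that sum to 100. This is an illustration, not the Vertex AI SDK; the IDs `stable` and `canary` and both helper functions are hypothetical, and the sketch assumes only two deployed models.

```python
def validate_split(split):
    """A traffic split maps deployed-model IDs to percentages summing to 100."""
    if any(p < 0 or p > 100 for p in split.values()):
        raise ValueError("each share must be between 0 and 100")
    if sum(split.values()) != 100:
        raise ValueError("shares must sum to 100")
    return split


def shift(split, stable_id, canary_id, canary_pct):
    """Return a new split giving canary_pct to the canary, the rest to stable."""
    new_split = dict(split)
    new_split[canary_id] = canary_pct
    new_split[stable_id] = 100 - canary_pct
    return validate_split(new_split)


split = validate_split({"stable": 100, "canary": 0})
split = shift(split, "stable", "canary", 10)       # start the canary at 10%
split = shift(split, "stable", "canary", 60)       # moving past 50% is allowed
rolled_back = shift(split, "stable", "canary", 0)  # roll back the canary
print(split, rolled_back)
```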

995

MCQ · easy

A data scientist wants to train a TensorFlow model on Vertex AI using a pre-built container. Which of the following pre-built containers is NOT available for custom training in Vertex AI?

A. TensorFlow

B. PyTorch

C. Apache Spark

D. scikit-learn

Answer: C

Spark is not a pre-built container; use custom container or Dataproc.

Why this answer

Vertex AI provides pre-built containers for TensorFlow, PyTorch, scikit-learn, and XGBoost. There is no pre-built container for Apache Spark in the Vertex AI training service; Spark jobs are typically run on Dataproc.

996

MCQ · medium

You are using TensorFlow Transform (tf.Transform) to preprocess data for a model that will be deployed on Vertex AI. What is the primary benefit of using tf.Transform over Dataflow alone?

A. Support for GPUs during preprocessing

B. Built-in feature store integration

C. Faster data processing

D. Training/serving skew prevention through a consistent transformation graph

Answer: D

tf.Transform saves a TensorFlow graph that can be used both in training and serving, avoiding skew.

Why this answer

tf.Transform computes statistics (e.g., min, max) on the full dataset, then generates a TensorFlow graph that applies the same transformation consistently at training and serving time. Dataflow alone does not ensure this consistency.
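
The analyze-then-transform idea can be illustrated without TensorFlow. The sketch below is plain Python, not the tf.Transform API: `analyze` and `transform` are hypothetical stand-ins for the full-pass statistics step and the saved transformation graph.

```python
def analyze(training_values):
    """Full pass over the training data to compute the statistics needed."""
    return {"min": min(training_values), "max": max(training_values)}


def transform(value, stats):
    """Min-max scale a single value using statistics frozen at training time."""
    return (value - stats["min"]) / (stats["max"] - stats["min"])


training_values = [10.0, 20.0, 30.0]
stats = analyze(training_values)          # computed once, saved with the model

train_features = [transform(v, stats) for v in training_values]
serving_feature = transform(25.0, stats)  # same function and stats at serving
print(train_features, serving_feature)
```

Skew appears when serving code recomputes the statistics or reimplements the scaling separately; reusing one function with frozen statistics is what removes that risk.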

997

MCQ · medium

A financial services company deploys a fraud detection model on Vertex AI. The model must make predictions in under 100ms. After deployment, latency spikes to 300ms during peak hours. The model is a large ensemble with 500MB size. Which action is most likely to reduce latency?

A. Optimize the model using TensorFlow Lite and convert to a smaller format.

B. Switch to batch prediction to process requests asynchronously.

C. Reduce the machine type to a smaller instance.

D. Increase the number of replicas on the endpoint.

Answer: A

Reduces model size and inference time.

Why this answer

The primary cause of latency is the large model size (500MB) combined with real-time inference constraints. Optimizing the model with TensorFlow Lite reduces the model size and computational overhead, directly decreasing inference time. This addresses the root cause—model complexity—rather than scaling infrastructure around an inefficient model.

Exam trap

The trap here is that candidates often confuse scaling (replicas or instance size) with optimization, failing to recognize that model size and inference efficiency are the primary drivers of latency in real-time serving.

How to eliminate wrong answers

Option B is wrong because batch prediction processes requests asynchronously, which does not meet the sub-100ms real-time requirement; it is designed for offline, high-throughput scenarios, not low-latency serving. Option C is wrong because reducing the machine type to a smaller instance would decrease available CPU/memory, likely increasing latency further due to resource contention. Option D is wrong because increasing the number of replicas improves throughput and availability but does not reduce per-request latency; it may even add network overhead from load balancing.

998

MCQ · medium

An ML engineer wants to monitor the latency of online predictions from a Vertex AI Endpoint. They need to track p50, p95, and p99 latency over time and set up alerts if p99 exceeds 1 second. Which approach should they take?

A. Configure a Cloud Monitoring dashboard using the built-in Vertex AI Endpoint metrics for latency

B. Use Cloud Monitoring custom metrics and publish latency percentiles from the application code

C. Enable request/response logging to BigQuery and calculate latency percentiles using SQL queries

D. Use Vertex AI Model Monitoring to track prediction latency

Answer: A

Built-in metrics include latency percentiles; alerts can be set directly.

Why this answer

Cloud Monitoring can ingest metrics from Vertex AI Endpoints, including prediction latency distributions (p50, p95, p99). Alerts can be configured on these metrics.
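
For readers unsure what p50, p95, and p99 mean, the sketch below computes them with the nearest-rank method and applies the question's 1-second threshold. It only illustrates the definitions: Cloud Monitoring derives percentiles from its own latency distribution, and the sample latencies and `percentile` helper here are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
alert = p99 > 1000  # the question's threshold of 1 second
print(p50, p95, p99, alert)
```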

999

MCQ · easy

A machine learning engineer wants to define a lightweight pipeline component that runs custom Python code without building a container image. Which KFP SDK feature should they use?

A. Importer component

B. Python function component with @dsl.component

C. Container component

D. Vertex AI Training job

Answer: B

This decorator turns a Python function into a pipeline component without requiring a custom container.

Why this answer

The `@dsl.component` decorator in KFP SDK allows you to define a lightweight Python function component that runs custom code without requiring a container image. It automatically generates a container specification from the function's dependencies, making it ideal for simple, non-containerized pipeline steps.

Exam trap

The trap here is that candidates may confuse 'lightweight' with 'no container at all,' but KFP always runs components in containers; the `@dsl.component` feature automates container creation, not eliminates it.

How to eliminate wrong answers

Option A is wrong because the Importer component is used to import existing artifacts (like datasets or models) into a pipeline, not to run custom Python code. Option C is wrong because a Container component requires you to specify a pre-built container image, which contradicts the requirement of not building a container image. Option D is wrong because Vertex AI Training job is a managed service for running training jobs on Vertex AI, not a lightweight KFP SDK feature for running custom code without containers.

1000

MCQ · hard

You are the ML engineer for a financial services company. You have deployed a fraud detection model on Vertex AI Endpoints using a custom container. The model is a gradient boosting model trained on transactional data. Over the past week, the model's precision has dropped from 95% to 80%, while recall has remained stable. The input data volume and distribution have not changed significantly. The model is served on a single endpoint with autoscaling enabled (min replicas=2, max replicas=10). You notice that the average CPU utilization of the serving containers has increased from 40% to 90%, and the p99 latency has increased from 50ms to 200ms. The model is retrained weekly using the latest data, and the last retraining was 3 days ago. The logs show no errors, and the model version is unchanged. Given these symptoms, what is the most likely cause of the precision drop?

A. The autoscaling policy is not scaling up fast enough, causing increased latency and prediction errors.

B. The model is overfitting to recent transaction patterns due to weekly retraining.

C. A recent change in the preprocessing code in the container transformed features differently than what the model expects, causing incorrect predictions.

D. The model was replaced with a different version without updating the endpoint.

Answer: C

Feature transformation mismatch can cause precision drop without affecting recall.

Why this answer

Option C is correct because the precision drop without a change in input distribution or recall strongly indicates a systematic error in predictions, not a data shift. A preprocessing code change in the custom container would cause the model to receive features transformed differently than during training, leading to incorrect probability estimates. The increased CPU utilization and latency are consistent with the container performing additional or different preprocessing steps, not with autoscaling issues or model version changes.
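
A short worked example shows what a precision drop with stable recall implies. Precision is TP / (TP + FP) and recall is TP / (TP + FN), so if recall holds while precision falls, false positives must have grown. The confusion-matrix counts below are hypothetical, chosen to reproduce the question's 95% and 80% figures.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of true positives that the model catches."""
    return tp / (tp + fn)


# Hypothetical confusion-matrix counts before and after the drop.
before = {"tp": 76, "fp": 4, "fn": 24}
after = {"tp": 76, "fp": 19, "fn": 24}

for name, c in (("before", before), ("after", after)):
    print(name, precision(c["tp"], c["fp"]), recall(c["tp"], c["fn"]))
```

In this sketch only the false-positive count changes, from 4 to 19, which is the pattern a systematic shift in feature values at serving time could produce.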

Exam trap

The trap here is that candidates often attribute latency increases and precision drops to autoscaling or model drift, but the key clue is that recall remains stable, which points to a systematic prediction error (preprocessing mismatch) rather than a data distribution shift or infrastructure scaling problem.

How to eliminate wrong answers

Option A is wrong because autoscaling delays cause increased latency and potential timeouts, but they do not directly cause a precision drop; precision depends on the correctness of predictions, not on response time. Option B is wrong because overfitting to recent patterns would typically cause a drop in recall as well, not just precision, and the input distribution has not changed significantly. Option D is wrong because the logs show no errors and the model version is unchanged, so the endpoint is still serving the same model; a version replacement would require an explicit update and would likely trigger a deployment event.

Google Professional Machine Learning Engineer PMLE Questions 976–1000 | Page 14/14 | Courseiva