Sample questions
Google Professional Machine Learning Engineer practice questions
A travel booking company has a real-time recommendation system that suggests hotels and flights to users. The model is served using TensorFlow Serving on a Google Kubernetes Engine (GKE) cluster with autoscaling enabled. The cluster uses n1-standard-4 machine types. The team has set up Cloud Monitoring dashboards and alerts. Last week, during a major holiday promotion, the team noticed that the model's P99 inference latency increased from 150 ms to 450 ms over a 30-minute period, while request throughput increased from 500 to 1,200 requests per second. CPU utilization across the cluster rose to 95%, but memory utilization remained at 60%. The model version and the serving infrastructure configuration have not changed since the last deployment. Which action should the team take to mitigate the latency issue?
- A
Implement a feature engineering pipeline that compresses the input features to reduce data size and inference time.
Why wrong: While potentially beneficial, this is a longer-term solution and does not provide immediate latency relief during the surge.
- B
Deploy a newer version of the model that uses a more efficient architecture to reduce computational complexity.
Why wrong: Deploying a new model requires time for development, testing, and approval, and may not be feasible for immediate mitigation.
- C
Increase the number of TensorFlow Serving instances by reducing the CPU request per pod in GKE to allow more pods per node.
Why wrong: Reducing CPU requests may lead to CPU starvation and pod instability, harming latency further.
- D
Add more nodes to the GKE cluster to increase the total CPU resources available for serving.
The cluster is CPU-bound (95% CPU while memory sits at 60%), so adding nodes increases the compute capacity available for parallel inference and reduces latency under high load.
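A rough capacity estimate can make the reasoning concrete. The sketch below applies the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current x current utilization / target utilization)) to node count. The 10-node starting size and the 60% target utilization are assumptions chosen for illustration; neither value appears in the question.

```python
import math


def nodes_needed(current_nodes: int, current_util: float, target_util: float) -> int:
    """Estimate the node count that brings average CPU utilization down to target_util.

    Assumes load spreads evenly and CPU demand scales linearly with capacity.
    """
    if current_nodes <= 0:
        raise ValueError("current_nodes must be positive")
    if not 0 < target_util <= 1:
        raise ValueError("target_util must be in (0, 1]")
    return math.ceil(current_nodes * current_util / target_util)


# Hypothetical cluster: 10 nodes running at the 95% CPU reported in the scenario.
estimate = nodes_needed(current_nodes=10, current_util=0.95, target_util=0.60)
print(f"Estimated nodes for a 60% CPU target: {estimate}")
```

The estimate is only a starting point; in practice the cluster autoscaler limits and pod resource requests determine how quickly the extra capacity becomes usable.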
A global retail company uses Vertex AI Recommendations to provide product recommendations on their website. They have a large catalog and millions of users. The initial deployment works well for active users, but they notice that new users (with no purchase history) receive generic recommendations that are not personalized. The company wants to improve the cold-start experience. They have user demographic data (age, location) available at sign-up. The current recommendation model is a collaborative filtering model built with Vertex AI Recommendations. What should the company do to improve personalization for new users?
- A
Collect more historical interaction data before showing recommendations
Why wrong: New users have no history; waiting does not help.
- B
Disable recommendations for new users until they have at least 10 interactions
Why wrong: This would lose the opportunity to engage new users.
- C
Increase the user exploration parameter in the Vertex AI Recommendations configuration
Exploration helps serve diverse items to new users to learn preferences.
- D
Build a custom two-tower recommendation model using Vertex AI Training
Why wrong: Building a custom model is more complex and may not be needed.
Your team is developing a machine learning model for real-time fraud detection. The training pipeline runs on Vertex AI and uses BigQuery for feature engineering. Recently, the pipeline has been taking significantly longer to execute. Upon investigation, you find that the BigQuery query for feature extraction is being rerun every time the pipeline runs, even though the underlying data hasn't changed. The pipeline is scheduled to run every hour. You want to reduce cost and execution time without losing the ability to detect data drifts. Which approach should you take?
- A
Implement a caching mechanism in the pipeline that stores the results of the BigQuery query and reuses them if the data hasn't changed.
Why wrong: Pipeline caching is based on component inputs, not on data content, so it may not prevent rerun if inputs differ.
- B
Move the feature extraction to a separate scheduled query in BigQuery and load the results into a table that the pipeline reads from.
This separates concerns and avoids redundant execution, while still allowing data drift detection via the pipeline.
- C
Reduce the pipeline frequency to once a day to minimize the number of runs.
Why wrong: This reduces cost but delays model updates and data drift detection.
- D
Use a conditional pipeline that checks if the data has changed before running the feature extraction step.
Why wrong: This adds complexity and still requires executing the pipeline to perform the check.
A healthcare organization is building a machine learning model to predict patient readmission risk. They have sensitive data stored in BigQuery that includes protected health information (PHI). The data science team uses Vertex AI Workbench notebooks to explore the data and develop models. The organization's security policy requires that all PHI data must be encrypted at rest and in transit, and that access to the data is logged and audited. They also need to ensure that the data used for model training is de-identified to remove direct identifiers such as patient names and SSNs. The team wants to automate the de-identification process as part of the data pipeline. Which approach meets these requirements?
- A
Create a Dataflow pipeline that reads from the original BigQuery table, applies Cloud DLP de-identification transforms, and writes to a new BigQuery table. Grant the data science team access to the de-identified table.
Dataflow with DLP automates de-identification and creates a safe dataset.
- B
Enable Shielded VM on Vertex AI Workbench notebooks and use VPC-SC to restrict data access.
Why wrong: Shielded VM and VPC-SC provide security but do not de-identify data.
- C
Use Cloud Key Management Service to encrypt the PHI columns in BigQuery, and share the encryption key with the data science team.
Why wrong: Encryption does not remove identifiers; the team would still see PHI after decryption.
- D
Use BigQuery row-level security to mask PHI columns for the data science team, and train the model directly on the original table.
Why wrong: Row-level security filters which rows a user can read; it does not mask or remove identifier columns, and training directly on the original table still exposes PHI.
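To illustrate what a de-identification transform does to each record, here is a minimal pure-Python sketch. It is a stand-in for the idea only and does not call Cloud DLP or Dataflow; the column names (`patient_name`, `ssn`, `notes`) and the redaction token are hypothetical.

```python
import re

# Matches US Social Security numbers written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical names of columns that hold direct identifiers.
DIRECT_IDENTIFIER_COLUMNS = {"patient_name", "ssn"}


def deidentify(record: dict) -> dict:
    """Drop direct-identifier columns and redact SSNs found in free text."""
    cleaned = {}
    for column, value in record.items():
        if column in DIRECT_IDENTIFIER_COLUMNS:
            continue  # remove the identifier column entirely
        if isinstance(value, str):
            value = SSN_PATTERN.sub("[REDACTED_SSN]", value)
        cleaned[column] = value
    return cleaned


row = {
    "patient_name": "Jane Doe",
    "ssn": "123-45-6789",
    "age": 67,
    "notes": "SSN 123-45-6789 on file",
}
print(deidentify(row))
```

In the pipeline described in option A, a managed service performs this step at scale and writes the result to a separate table, so the data science team never needs access to the original.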
You are an ML engineer at a global e-commerce company. Your team has developed a deep learning model for product recommendation that runs on Vertex AI Prediction. The model is deployed on a single n1-highmem-2 instance (CPU only) with autoscaling enabled (min replicas=1, max replicas=10). During Black Friday, traffic spikes to 1,000 requests per second, and you observe that latency increases from 50 ms to over 5,000 ms, and many requests time out. You check the monitoring dashboard and see that CPU utilization is at 100% on the single instance, and autoscaling is not triggering quickly enough. The team has a budget for this service and wants to handle the spike without compromising latency. What should you do?
- A
Switch to GPU instances (e.g., n1-standard-4 with T4) and set min replicas=2 with autoscaling up to 10
GPUs accelerate deep learning inference, reducing per-request latency, and keeping two warm replicas absorbs the start of the spike while autoscaling catches up.
- B
Increase min replicas to 5 to keep warm instances
Why wrong: Without improving per-instance throughput, warm instances may still be insufficient.
- C
Set min replicas=1 and max replicas=5 to control cost
Why wrong: Limiting max replicas may not handle the spike.
- D
Increase max replicas to 20 and keep CPU instances
Why wrong: CPU instances have high latency per request; more replicas may not reduce latency enough.
A financial services company uses Vertex AI AutoML Tables to build a credit risk model. The dataset contains 500,000 rows and 50 features, including loan amount, credit score, debt-to-income ratio, and employment length. The target variable is binary: 'default' (1) or 'no default' (0). The data is highly imbalanced, with only 2% defaults. The data scientist trains a model with AutoML Tables using default settings. The evaluation metrics show an AUC of 0.85, but the confusion matrix reveals that the model predicts 'no default' for almost all cases, missing most defaults. The data scientist needs to improve the model's ability to identify defaults without significantly increasing false positives. They have limited time and cannot write custom code. What should they do?
- A
Manually split the data into a stratified train/test set to ensure the same proportion of defaults in each.
Why wrong: Manually splitting the data is unnecessary because AutoML Tables automatically handles data splitting. The issue is class imbalance during training, not the split.
- B
Train multiple models with different algorithms (e.g., XGBoost, Random Forest) and blend them using a custom script.
Why wrong: Blending models with a custom script requires writing code, which is not allowed due to time constraints. AutoML Tables already uses ensemble methods internally.
- C
Turn on 'Enable weighted evaluation' and set the optimization objective to maximize recall at a specified precision, with a target precision of 0.5.
This is correct because it uses AutoML Tables' built-in weighted evaluation and custom optimization objective to focus on recall for the minority class, without needing custom code.
- D
Under-sample the majority class to create a balanced dataset and retrain.
Why wrong: Under-sampling may discard valuable data and is not recommended. AutoML Tables provides built-in mechanisms like weighted evaluation to handle imbalance effectively.
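The scenario's numbers show why overall metrics hide the problem. The sketch below computes accuracy, precision, and recall for a classifier that always predicts 'no default' on 500,000 rows with 2% defaults. The function name and the always-negative classifier are illustrative assumptions, not part of AutoML Tables.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# 500,000 rows at 2% defaults: 10,000 positives and 490,000 negatives.
# A model that always predicts 'no default' misses every positive.
accuracy, precision, recall = confusion_metrics(tp=0, fp=0, fn=10_000, tn=490_000)
print(f"accuracy={accuracy:.2%} precision={precision:.2%} recall={recall:.2%}")
```

Accuracy looks excellent while recall on the minority class is zero, which is why the objective needs to target recall at an acceptable precision.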
A financial services firm deploys a binary classification model for fraud detection. The model's precision is 0.95 and recall is 0.60 on the test set. After deployment, the fraud rate in production is 0.5% compared to 5% in the test set. The model shows good calibration on the test set (Brier score 0.02) but poor calibration in production (Brier score 0.15). What is the most likely explanation for the calibration degradation?
- A
The distribution of input features has shifted significantly, causing the model to produce incorrect probabilities.
Why wrong: Feature drift can cause poor performance, but the problem statement does not mention feature drift; calibration degradation is specifically addressed.
- B
The model overfits to noise in the training data, leading to poor generalization.
Why wrong: Overfitting would show poor test set performance, but the test set had good Brier score.
- C
The production data has a different class imbalance than the training data, causing the model to be biased toward the majority class.
Why wrong: Class imbalance alone does not explain miscalibration; the model's probability estimates can still be calibrated if the imbalance is accounted for.
- D
The relationship between features and the target has changed (concept drift), causing the model's probability estimates to be misaligned with the true probabilities.
Concept drift changes the conditional distribution P(Y|X), which directly affects calibration.
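For reference, the Brier score is the mean squared difference between the predicted probability and the observed outcome, so lower is better. The sketch below computes it on toy data; the probabilities and labels are invented for illustration and are unrelated to the figures in the question.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


predicted = [0.9, 0.1, 0.8, 0.2]

# Outcomes that agree with the predicted probabilities.
test_labels = [1, 0, 1, 0]
# Same inputs and predictions, but one outcome has flipped, i.e. P(Y|X) changed.
production_labels = [1, 0, 0, 0]

print(round(brier_score(predicted, test_labels), 3))
print(round(brier_score(predicted, production_labels), 3))
```

The predictions are identical in both calls; only the outcomes differ, which is the situation the correct answer describes.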
You are using Vertex AI Matching Engine for similarity search. Your index has 10 million embeddings of 512 dimensions. The query latency requirement is under 10ms for 99th percentile. Which index type should you choose?
- A
Brute-force index with cosine distance.
Why wrong: Brute-force is exact but extremely slow for 10M vectors; cannot meet 10ms latency.
- B
Approximate Nearest Neighbor (ANN) index using the ScaNN algorithm.
ANN with ScaNN is designed for low-latency, high-scale similarity search.
- C
A custom distance-based index using Cloud SQL.
Why wrong: Cloud SQL cannot efficiently handle vector search at this scale.
- D
A tree-based index from scikit-learn deployed as a custom container.
Why wrong: Tree-based indices may not scale to 10M points and are not optimized for low latency.
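To see why exact search struggles at this scale, the sketch below implements brute-force cosine search in plain Python and counts the multiply-adds a single query needs against 10 million 512-dimensional vectors. The function names are illustrative, and the toy vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[int]:
    """Score every vector against the query and return the indices of the k best."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


def multiply_adds_per_query(num_vectors: int, dims: int) -> int:
    """Dot-product work for one exact query: one multiply-add per dimension per vector."""
    return num_vectors * dims


print(brute_force_top_k([1.0, 0.1], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2))
print(multiply_adds_per_query(10_000_000, 512))
```

An approximate index avoids scoring every vector on every query, trading a small amount of recall for a large reduction in work.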
A machine learning engineer wants to deploy a trained model to Vertex AI for online predictions. Which Vertex AI resource is required to serve the model and provide an endpoint URL?
- A
Vertex AI Pipeline
Why wrong: Pipelines orchestrate ML workflows, not serving.
- B
Vertex AI Model Registry
Why wrong: Model Registry stores and versions models, but does not provide an endpoint.
- C
Vertex AI Feature Store
Why wrong: Feature Store is for managing and serving features, not for deploying models.
- D
Vertex AI Endpoint
Correct. An endpoint is required to deploy a model and obtain a URL for online predictions.
You have a Vertex AI endpoint serving a model for real-time predictions. The endpoint is configured with minReplicaCount=2 and maxReplicaCount=10. Over the past week, you notice that the actual number of replicas rarely exceeds 2, but the average CPU utilization is around 85%. You want to reduce costs without impacting performance. What should you do?
- A
Increase minReplicaCount to 5.
Why wrong: This would increase baseline cost.
- B
Decrease minReplicaCount to 1.
Since the number of replicas rarely exceeds 2, lowering min to 1 reduces the baseline cost, and the autoscaler can still scale up if needed.
- C
Increase maxReplicaCount to 20.
Why wrong: Increasing max will not reduce costs; it could increase them.
- D
Decrease the CPU utilization target to 50%
Why wrong: This would cause the autoscaler to add more replicas sooner, increasing costs.
Your company runs a high-traffic web application that serves the same machine learning model prediction for many identical requests (e.g., product recommendations for the same user profile). You want to reduce latency and load on the prediction endpoint by caching responses. Which Google Cloud service should you use?
- A
Cloud CDN
Why wrong: Cloud CDN caches static content at edge locations, not dynamic API responses.
- B
Cloud Memorystore
Memorystore (Redis) provides low-latency caching for prediction responses.
- C
Cloud Spanner
Why wrong: Spanner is a relational database, not a cache; it is not designed for sub-millisecond caching.
- D
BigQuery
Why wrong: BigQuery is an analytics warehouse, not suitable for real-time caching.
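The caching pattern itself is usually cache-aside: look up a key derived from the request, and call the model only on a miss. The sketch below shows that flow with an in-memory class standing in for a Redis-style store. `TTLCache`, `cache_key`, `cached_predict`, and `fake_predict` are hypothetical names, and no Memorystore client is used.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a Redis-style cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._store[key] = (self._clock() + ttl_seconds, value)


def cache_key(features: dict) -> str:
    """Stable key: identical feature payloads hash to the same string."""
    payload = json.dumps(features, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def cached_predict(cache: TTLCache, predict_fn: Callable[[dict], Any],
                   features: dict, ttl_seconds: float = 60.0) -> Any:
    """Cache-aside: return a cached prediction, else call the model and store it."""
    key = cache_key(features)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = predict_fn(features)
    cache.set(key, result, ttl_seconds)
    return result


def fake_predict(features: dict) -> dict:
    """Placeholder for a call to the prediction endpoint."""
    return {"items": ["sku-1", "sku-2"]}


demo_cache = TTLCache()
print(cached_predict(demo_cache, fake_predict, {"segment": "new-parent"}))
```

The time-to-live bounds how stale a cached recommendation can be; choosing it is a trade-off between endpoint load and freshness.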
You have a Vertex AI endpoint with two deployed models: a champion (v1) and a challenger (v2). You set the traffic split to 90% v1 and 10% v2. After a week, you observe that v2 has better business metrics. You want to shift all traffic to v2 gradually over 3 days to avoid any risk. What should you do?
- A
Deploy v2 to a new endpoint and update your clients to use the new endpoint.
Why wrong: This does not allow gradual shifting; it's a cut-over.
- B
Use Vertex AI Experiments to compare v1 and v2, then redeploy v2 with 100% traffic.
Why wrong: Experiments are for training, not serving traffic management.
- C
Update the traffic split configuration on the endpoint multiple times over the 3 days to gradually increase v2's percentage.
This is the correct method for gradual traffic shifting.
- D
Delete v1 from the endpoint so that all traffic automatically goes to v2.
Why wrong: This is abrupt and risky.
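A gradual shift amounts to a sequence of traffic split updates whose percentages always sum to 100. The sketch below generates a linear ramp from the current 10% to 100% in three steps, one per day. The keys `v1` and `v2` are placeholders for the deployed model IDs on the endpoint, and the linear ramp is an assumption; any monotonic schedule would work.

```python
from typing import Dict, List


def traffic_schedule(start_v2: int, steps: int) -> List[Dict[str, int]]:
    """Linear ramp of v2's share from start_v2 percent up to 100 over `steps` updates."""
    if not 0 <= start_v2 <= 100:
        raise ValueError("start_v2 must be a percentage between 0 and 100")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    schedule = []
    for i in range(1, steps + 1):
        v2 = start_v2 + round((100 - start_v2) * i / steps)
        schedule.append({"v1": 100 - v2, "v2": v2})
    return schedule


for day, split in enumerate(traffic_schedule(start_v2=10, steps=3), start=1):
    print(f"day {day}: {split}")
```

Checking business and latency metrics before applying each step keeps the option of rolling back by restoring the previous split.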
Your team has deployed a model on Vertex AI endpoints and you are planning an A/B test to compare a new challenger model (v2) against the current champion (v1). The test should measure business metrics such as click-through rate. Which THREE steps should you take to set up the A/B test correctly? (Choose 3 correct answers)
- A
Deploy the challenger model (v2) to the same endpoint as the champion (v1).
Both models must be on the same endpoint to use traffic splitting.
- B
Modify your application to log which model version served each prediction.
You need to correlate predictions with the model version to measure business metrics per version.
- C
Create a new endpoint for v2 and gradually shift DNS traffic.
Why wrong: This does not use Vertex AI's built-in A/B testing capability and is more complex.
- D
Use Vertex AI Experiments to compare model performance.
Why wrong: Experiments are for training, not serving A/B tests.
- E
Set up a traffic split between v1 and v2, e.g., 90% v1 and 10% v2.
Traffic splitting enables A/B testing by routing a percentage of requests to each version.
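Logging the serving version is what makes the per-version comparison possible. The sketch below aggregates click-through rate by model version from logged events; the event fields `model_version` and `clicked` are a hypothetical log schema, not a Vertex AI format.

```python
from collections import defaultdict
from typing import Dict, Iterable


def ctr_by_version(events: Iterable[dict]) -> Dict[str, float]:
    """Click-through rate per model version: clicks divided by predictions served."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for event in events:
        version = event["model_version"]
        shown[version] += 1
        clicked[version] += int(event["clicked"])
    return {version: clicked[version] / shown[version] for version in shown}


logged_events = [
    {"model_version": "v1", "clicked": True},
    {"model_version": "v1", "clicked": False},
    {"model_version": "v1", "clicked": False},
    {"model_version": "v1", "clicked": False},
    {"model_version": "v2", "clicked": True},
    {"model_version": "v2", "clicked": False},
]
print(ctr_by_version(logged_events))
```

With a 90/10 split the challenger sees far fewer requests, so the comparison should run long enough for the difference to be statistically meaningful.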
A company is deploying a complex model that requires GPU for inference. They want to use Vertex AI for serving. Which TWO steps are required to deploy the model with GPU support? (Choose 2)
- A
Select a GPU-enabled machine type such as n1-standard-4 with 1 x NVIDIA Tesla T4.
GPU-enabled machine type is necessary for GPU inference.
- B
Enable Vertex AI Model Optimization for automatic GPU compilation.
Why wrong: Model Optimization is optional; not a requirement for GPU deployment.
- C
Deploy the model using a custom container that includes CUDA and cuDNN.
A custom container must include the CUDA libraries the model depends on; Vertex AI prebuilt GPU containers for common frameworks already include them.
- D
Increase the minimum replicas to at least 2 for GPU redundancy.
Why wrong: Replica count is independent of GPU support; not a requirement.
- E
Use gRPC protocol for prediction requests to reduce latency.
Why wrong: gRPC is supported but not required for GPU deployment.
You need to deploy a model to a Vertex AI endpoint that can scale down to zero when there are no requests to minimize costs. Which feature should you enable?
- A
Deploy the model to a Compute Engine instance and use instance groups.
Why wrong: Compute Engine does not natively scale to zero; you would need to manage instances manually.
- B
Use a custom metric for autoscaling
Why wrong: Custom metrics can be used for scaling but do not enable scale-to-zero by themselves.
- C
Enable autoscaling with minReplicaCount=0
minReplicaCount=0 allows the endpoint to scale to zero when idle.
- D
Set maxReplicaCount to 0
Why wrong: maxReplicaCount=0 would prevent any scaling.
A company uses Vertex AI Vector Search (Matching Engine) for a product recommendation system. The product embeddings are updated hourly. Which index update method should they use to ensure low latency for new items?
- A
Batch rebuild the index every hour
Why wrong: Batch rebuilds are time-consuming and cause downtime; not suitable for hourly updates.
- B
Use streaming updates to add new embeddings incrementally
Correct: Streaming updates allow near-real-time ingestion of new vectors.
- C
Create a new index each hour and swap endpoints
Why wrong: Swapping endpoints is complex and may cause query interruption.
- D
Use brute-force index to simplify updates
Why wrong: Brute-force index does not support streaming updates; also inefficient for large datasets.
A company uses Vertex AI Vector Search for similarity search. They have a dataset of 10 million 512-dimensional vectors. Which index type should they choose for lowest latency at high recall?
- A
Brute-force (flat) index
Why wrong: Brute-force is accurate but slower for large datasets.
- B
Approximate nearest neighbor (ANN) index with ScaNN
ANN is designed for large-scale, low-latency search with high recall.
- C
Tree-based index
Why wrong: A standalone tree-based index is not an index type offered by Vertex AI Vector Search.
- D
Hashing-based index
Why wrong: Not a standard index type in Vector Search.
A logistics company uses Vertex AI AutoML Tables to predict delivery delays based on order attributes, weather data, and traffic data. The model is retrained weekly using a Vertex AI Pipeline that runs a BigQuery query to get training data, then triggers AutoML training. Recently, the pipeline fails with the error 'Dataset not found' when the AutoML training step starts. The BigQuery query runs successfully and outputs a table. Which is the most likely cause?
- A
The AutoML training step is referencing a different dataset location.
Why wrong: Possible but less likely; the error points to a missing dataset import step.
- B
The training data has been manually deleted from Cloud Storage.
Why wrong: The error is 'Dataset not found', not data missing.
- C
The pipeline's IAM permissions are insufficient to access BigQuery.
Why wrong: The BigQuery query succeeded, so permissions are fine.
- D
The BigQuery output table is not being passed as a Vertex AI Dataset resource.
The pipeline must create a Vertex AI Dataset from the BigQuery table for AutoML to use.
A data analyst wants to build a binary classification model to predict customer churn using SQL queries in BigQuery. Which BigQuery ML model type should they use?
A financial services company has deployed a classification model on Vertex AI to detect fraudulent transactions. The model is monitored using Vertex AI Model Monitoring for skew and drift detection, and also logs predictions to BigQuery for analysis. After a month, the monitoring alerts show a significant drift in one feature (transaction_amount). Which TWO actions should the team take to diagnose and address this issue?
- A
Compare the feature distribution in the training data with the recent serving data using statistical tests.
This diagnostic step helps understand the nature and extent of the drift.
- B
Retrain the model on the most recent data to incorporate the new distribution.
If drift is due to a real shift, retraining with recent data can improve performance.
- C
Increase the frequency of model monitoring checks to every hour.
Why wrong: More frequent monitoring does not address the cause of drift.
- D
Increase the sampling rate for prediction logging to ensure full data capture.
Why wrong: While helpful for analysis, it's not a direct corrective action for drift.
- E
Reduce the alert threshold to minimize false positives.
Why wrong: This would suppress legitimate alerts and not solve the drift issue.
A company is deploying a new model version to an existing Vertex AI endpoint. They want to test the new version with 5% of traffic before fully rolling it out. What is the correct approach?
- A
Create a new endpoint for the new version and update the client to call both endpoints.
Why wrong: This would require client-side changes and does not leverage Vertex AI's built-in traffic splitting.
- B
Deploy the new version and set the minimum replicas to 0, then gradually increase.
Why wrong: Min replicas control scaling, not traffic allocation. This does not split traffic.
- C
Use Cloud Load Balancing to distribute traffic between two endpoints.
Why wrong: Cloud Load Balancing is not designed for Vertex AI endpoint traffic splitting; Vertex AI has built-in traffic management.
- D
Deploy the new version as a separate model on the same endpoint and use the `traffic_split` parameter in the deployment request.
Correct: Vertex AI allows multiple deployed models on one endpoint with traffic split percentages.
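The traffic split is a map from deployed model ID to percentage, and in the Vertex AI deploy request the key "0" refers to the model being deployed in that call. The sketch below builds such a map for a canary by scaling the existing shares down proportionally. `canary_traffic_split` is a hypothetical helper and the model IDs are placeholders; the resulting dictionary is what would be passed as `traffic_split`.

```python
from typing import Dict


def canary_traffic_split(existing_split: Dict[str, int], canary_percent: int) -> Dict[str, int]:
    """Make room for a new model (key "0") by scaling existing shares proportionally."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 1 and 99")
    if sum(existing_split.values()) != 100:
        raise ValueError("existing_split must sum to 100")
    remaining = 100 - canary_percent
    scaled = {model_id: share * remaining // 100 for model_id, share in existing_split.items()}
    # Integer division can leave a few points unassigned; give them to the largest share.
    leftover = remaining - sum(scaled.values())
    largest = max(existing_split, key=existing_split.get)
    scaled[largest] += leftover
    return {"0": canary_percent, **scaled}


# Placeholder ID for the model already serving 100% of traffic.
split = canary_traffic_split({"1234567890": 100}, canary_percent=5)
print(split)
```

Validating that the percentages sum to 100 before sending the request avoids a rejected deployment.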
You are designing an ML pipeline for a large-scale recommendation system that runs weekly retraining on historical user interaction data. The pipeline uses TensorFlow and is deployed on Google Cloud. The pipeline must be orchestrated and automated with minimal manual intervention. Which THREE options should you include in your design? (Choose three.)
- A
Use BigQuery scheduled queries to run the training script on a schedule.
Why wrong: BigQuery scheduled queries are for SQL queries, not running ML training jobs.
- B
Use Vertex AI Pipelines to define the ML pipeline as a Directed Acyclic Graph (DAG) of components.
Vertex AI Pipelines is purpose-built for ML pipelines.
- C
Use AI Platform Notebooks to schedule the training job on a recurring basis.
Why wrong: Notebooks are for interactive development, not scheduling production pipelines.
- D
Use Cloud Build and Cloud Functions to trigger the pipeline when new training data arrives in Cloud Storage.
Event-driven triggers automate pipeline execution on data arrival.
- E
Use Cloud Composer to orchestrate the pipeline steps, including data extraction, preprocessing, training, and deployment.
Cloud Composer (Airflow) is designed for orchestrating complex workflows with dependencies.
A machine learning team uses Vertex AI Pipelines to orchestrate their training pipeline. They want to trigger the pipeline automatically in response to new data arriving in a Cloud Storage bucket, and also support a scheduled run every day at 6 AM. Which combination of services should they use to achieve both event-driven and schedule-based triggers?
- A
Cloud Scheduler for the schedule, and Cloud Pub/Sub with Push subscription to Vertex AI for event-driven.
Why wrong: Cloud Pub/Sub cannot directly trigger Vertex AI pipelines; it requires a subscriber like Cloud Functions.
- B
Cloud Functions for both schedule and event-driven, using cron trigger.
Why wrong: Cloud Functions supports cron via Cloud Scheduler, but using Cloud Functions for both is unnecessary; Cloud Scheduler is simpler for schedule.
- C
Cloud Scheduler for the schedule, and Cloud Functions triggered by Cloud Storage events to call the Vertex AI API for event-driven.
Correct: Cloud Scheduler for cron schedule, Cloud Functions for event-driven from Cloud Storage.
- D
Vertex AI Pipelines built-in scheduler for schedule, and Cloud Pub/Sub for event-driven.
Why wrong: Pub/Sub alone cannot trigger pipelines; it needs a subscriber such as Cloud Functions to call the Vertex AI API.
You are designing a Vertex AI pipeline that includes a container component. The component needs to use a custom container image that is stored in Artifact Registry. How should you specify the container image in the component definition?
- A
Use ContainerOp class from kfp.v2.dsl.
Why wrong: ContainerOp is from KFP SDK v1, not v2.
- B
Use the @dsl.container_component decorator and set the image parameter to the URI.
This is the correct way to define a container component and specify the image URI.
- C
Use a placeholder in the pipeline YAML.
Why wrong: Not a standard approach; the image must be specified in the component definition.
- D
Use the @dsl.component decorator and set the base_image parameter.
Why wrong: @dsl.component is for Python function-based components, not container components.