Sample questions
Google Professional Machine Learning Engineer practice questions
A travel booking company has a real-time recommendation system that suggests hotels and flights to users. The model is served using TensorFlow Serving on a Google Kubernetes Engine (GKE) cluster with auto-scaling enabled. The cluster uses n1-standard-4 machine types. The team has set up Cloud Monitoring dashboards and alerts. Last week, during a major holiday promotion, the team noticed that the model's inference latency P99 increased from 150 ms to 450 ms over a 30-minute period, while the request throughput increased from 500 to 1,200 requests per second. CPU utilization across the cluster rose to 95%, but memory utilization remained at 60%. The model version and the serving infrastructure configuration have not changed since the last deployment. Which action should the team take to mitigate the latency issue?
Trap 1: Implement a feature engineering pipeline that compresses the input…
While potentially beneficial, this is a longer-term solution and does not provide immediate latency relief during the surge.
Trap 2: Deploy a newer version of the model that uses a more efficient…
Deploying a new model requires time for development, testing, and approval, and may not be feasible for immediate mitigation.
Trap 3: Increase the number of TensorFlow Serving instances by reducing the…
Reducing CPU requests may lead to CPU starvation and pod instability, harming latency further.
- A
Implement a feature engineering pipeline that compresses the input features to reduce data size and inference time.
Why wrong: While potentially beneficial, this is a longer-term solution and does not provide immediate latency relief during the surge.
- B
Deploy a newer version of the model that uses a more efficient architecture to reduce computational complexity.
Why wrong: Deploying a new model requires time for development, testing, and approval, and may not be feasible for immediate mitigation.
- C
Increase the number of TensorFlow Serving instances by reducing the CPU request per pod in GKE to allow more pods per node.
Why wrong: Reducing CPU requests may lead to CPU starvation and pod instability, harming latency further.
- D
Add more nodes to the GKE cluster to increase the total CPU resources available for serving.
Adding nodes increases compute capacity, allowing more parallel inference and reducing latency under high load.
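The capacity reasoning behind option D can be sanity-checked with rough arithmetic. The sketch below assumes that total CPU demand stays constant and spreads evenly across nodes. The function name, the 10-node cluster, and the 60% target utilization are hypothetical values chosen for illustration; they are not part of the question.

```python
import math


def nodes_needed(current_nodes: int, current_util: float, target_util: float) -> int:
    """Estimate the node count that brings average CPU utilization down to target_util.

    Assumes load is spread evenly and total CPU demand does not change.
    """
    if not 0 < target_util <= 1:
        raise ValueError("target_util must be in (0, 1]")
    # Total demand expressed in "fully busy node" equivalents.
    total_demand = current_nodes * current_util
    return math.ceil(total_demand / target_util)


# Hypothetical example: 10 nodes at 95% CPU, aiming for 60% average utilization.
print(nodes_needed(10, 0.95, 0.60))  # 16
```

In practice the cluster autoscaler or a higher node-pool maximum would make this adjustment; the calculation only shows why extra nodes relieve a CPU-bound serving tier.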
A global retail company uses Vertex AI Recommendations to provide product recommendations on their website. They have a large catalog and millions of users. The initial deployment works well for active users, but they notice that new users (with no purchase history) receive generic recommendations that are not personalized. The company wants to improve the cold-start experience. They have user demographic data (age, location) available at sign-up. The current recommendation model is a collaborative filtering model built with Vertex AI Recommendations. What should the company do to improve personalization for new users?
Trap 1: Collect more historical interaction data before showing…
New users have no history; waiting does not help.
Trap 2: Disable recommendations for new users until they have at least 10…
This would lose the opportunity to engage new users.
Trap 3: Build a custom two-tower recommendation model using Vertex AI…
Building a custom model is more complex and may not be needed.
- A
Collect more historical interaction data before showing recommendations
Why wrong: New users have no history; waiting does not help.
- B
Disable recommendations for new users until they have at least 10 interactions
Why wrong: This would lose the opportunity to engage new users.
- C
Increase the user exploration parameter in the Vertex AI Recommendations configuration
Exploration helps serve diverse items to new users to learn preferences.
- D
Build a custom two-tower recommendation model using Vertex AI Training
Why wrong: Building a custom model is more complex and may not be needed.
Your team is developing a machine learning model for real-time fraud detection. The training pipeline runs on Vertex AI and uses BigQuery for feature engineering. Recently, the pipeline has been taking significantly longer to execute. Upon investigation, you find that the BigQuery query for feature extraction is rerun every time the pipeline runs, even though the underlying data hasn't changed. The pipeline is scheduled to run every hour. You want to reduce cost and execution time without losing the ability to detect data drift. Which approach should you take?
Trap 1: Implement a caching mechanism in the pipeline that stores the…
Pipeline caching is based on component inputs, not on data content, so it may not prevent rerun if inputs differ.
Trap 2: Reduce the pipeline frequency to once a day to minimize the number…
This reduces cost but delays model updates and data drift detection.
Trap 3: Use a conditional pipeline that checks if the data has changed…
This adds complexity and still requires executing the pipeline to perform the check.
- A
Implement a caching mechanism in the pipeline that stores the results of the BigQuery query and reuses them if the data hasn't changed.
Why wrong: Pipeline caching is based on component inputs, not on data content, so it may not prevent rerun if inputs differ.
- B
Move the feature extraction to a separate scheduled query in BigQuery and load the results into a table that the pipeline reads from.
This separates concerns and avoids redundant execution, while still allowing data drift detection via the pipeline.
- C
Reduce the pipeline frequency to once a day to minimize the number of runs.
Why wrong: This reduces cost but delays model updates and data drift detection.
- D
Use a conditional pipeline that checks if the data has changed before running the feature extraction step.
Why wrong: This adds complexity and still requires executing the pipeline to perform the check.
A healthcare organization is building a machine learning model to predict patient readmission risk. They have sensitive data stored in BigQuery that includes protected health information (PHI). The data science team uses Vertex AI Workbench notebooks to explore the data and develop models. The organization's security policy requires that all PHI data must be encrypted at rest and in transit, and that access to the data is logged and audited. They also need to ensure that the data used for model training is de-identified to remove direct identifiers such as patient names and SSNs. The team wants to automate the de-identification process as part of the data pipeline. Which approach meets these requirements?
Trap 1: Enable Shielded VM on Vertex AI Workbench notebooks and use VPC-SC…
Shielded VM and VPC-SC provide security but do not de-identify data.
Trap 2: Use Cloud Key Management Service to encrypt the PHI columns in…
Encryption does not remove identifiers; the team would still see PHI after decryption.
Trap 3: Use BigQuery row-level security to mask PHI columns for the data…
Row-level security filters rows rather than masking columns, and training directly on the original table leaves the identifiers in place.
- A
Create a Dataflow pipeline that reads from the original BigQuery table, applies Cloud DLP de-identification transforms, and writes to a new BigQuery table. Grant the data science team access to the de-identified table.
Dataflow with DLP automates de-identification and creates a safe dataset.
- B
Enable Shielded VM on Vertex AI Workbench notebooks and use VPC-SC to restrict data access.
Why wrong: Shielded VM and VPC-SC provide security but do not de-identify data.
- C
Use Cloud Key Management Service to encrypt the PHI columns in BigQuery, and share the encryption key with the data science team.
Why wrong: Encryption does not remove identifiers; the team would still see PHI after decryption.
- D
Use BigQuery row-level security to mask PHI columns for the data science team, and train the model directly on the original table.
Why wrong: Row-level security filters rows rather than masking columns, and training directly on the original table leaves the identifiers in place.
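To make the de-identification step concrete, the following is a minimal sketch of the kind of per-record transform such a pipeline applies. It is a plain-Python stand-in, not the Cloud DLP API: the `deidentify` function, the `patient_name` field, and the SSN regular expression are assumptions made for illustration only.

```python
import re

# Matches US SSNs written as 123-45-6789 (illustrative pattern only).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def deidentify(record: dict, drop_fields=("patient_name",)) -> dict:
    """Return a copy of record with direct identifiers dropped or masked."""
    cleaned = {}
    for key, value in record.items():
        if key in drop_fields:
            continue  # drop direct-identifier columns entirely
        if isinstance(value, str):
            # Mask SSN-like substrings inside free-text fields.
            value = SSN_PATTERN.sub("[REDACTED]", value)
        cleaned[key] = value
    return cleaned


row = {"patient_name": "Jane Doe", "notes": "SSN 123-45-6789 on file", "age": 54}
print(deidentify(row))  # {'notes': 'SSN [REDACTED] on file', 'age': 54}
```

The key property is that the output is written to a separate table, so the data science team never needs access to the original records.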
You are an ML engineer at a global e-commerce company. Your team has developed a deep learning model for product recommendation that runs on Vertex AI Prediction. The model is deployed on a single n1-highmem-2 instance (CPU only) with autoscaling enabled (min replicas=1, max replicas=10). During Black Friday, traffic spikes to 1,000 queries per second (QPS), and you observe that latency increases from 50 ms to over 5,000 ms, and many requests time out. You check the monitoring dashboard and see that CPU utilization is at 100% on the single instance, and autoscaling is not triggering quickly enough. The team has a budget for this service and wants to handle the spike without compromising latency. What should you do?
Trap 1: Increase min replicas to 5 to keep warm instances
Without improving per-instance throughput, warm instances may still be insufficient.
Trap 2: Set min replicas=1 and max replicas=5 to control cost
Limiting max replicas may not handle the spike.
Trap 3: Increase max replicas to 20 and keep CPU instances
CPU instances have high latency per request; more replicas may not reduce latency enough.
- A
Switch to GPU instances (e.g., n1-standard-4 with T4) and set min replicas=2 with autoscaling up to 10
GPUs accelerate inference, reducing per-request latency, and the warm instances absorb the start of the spike.
- B
Increase min replicas to 5 to keep warm instances
Why wrong: Without improving per-instance throughput, warm instances may still be insufficient.
- C
Set min replicas=1 and max replicas=5 to control cost
Why wrong: Limiting max replicas may not handle the spike.
- D
Increase max replicas to 20 and keep CPU instances
Why wrong: CPU instances have high latency per request; more replicas may not reduce latency enough.
A financial services company uses Vertex AI AutoML Tables to build a credit risk model. The dataset contains 500,000 rows and 50 features, including loan amount, credit score, debt-to-income ratio, and employment length. The target variable is binary: 'default' (1) or 'no default' (0). The data is highly imbalanced, with only 2% defaults. The data scientist trains a model with AutoML Tables using default settings. The evaluation metrics show an AUC of 0.85, but the confusion matrix reveals that the model predicts 'no default' for almost all cases, missing most defaults. The data scientist needs to improve the model's ability to identify defaults without significantly increasing false positives. They have limited time and cannot write custom code. What should they do?
Trap 1: Manually split the data into a stratified train/test set to ensure…
AutoML already does stratified splitting; this won't improve recall.
Trap 2: Train multiple models with different algorithms (e.g., XGBoost,…
This requires custom code, not low-code.
Trap 3: Under-sample the majority class to create a balanced dataset and…
Under-sampling may lose valuable data and is not recommended with AutoML.
- A
Manually split the data into a stratified train/test set to ensure the same proportion of defaults in each.
Why wrong: AutoML already does stratified splitting; this won't improve recall.
- B
Train multiple models with different algorithms (e.g., XGBoost, Random Forest) and blend them using a custom script.
Why wrong: This requires custom code, not low-code.
- C
Turn on 'Enable weighted evaluation' and set the optimization objective to 'Maximize recall at a specific precision' with a target precision of 0.5.
AutoML Tables supports alternative optimization objectives that help with class imbalance.
- D
Under-sample the majority class to create a balanced dataset and retrain.
Why wrong: Under-sampling may lose valuable data and is not recommended with AutoML.
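The idea of maximizing recall subject to a precision floor can be sketched outside AutoML. The function below is an illustration of the metric only, not of how AutoML Tables optimizes it internally; the function name, labels, and scores are made up for the example.

```python
def max_recall_at_precision(labels, scores, min_precision):
    """Return (recall, threshold) with the highest recall whose precision >= min_precision.

    A prediction is positive when its score is >= threshold.
    Returns (0.0, None) if no threshold meets the precision floor.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("labels must contain at least one positive")
    best = (0.0, None)
    # Each distinct score is a candidate decision threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        precision = tp / (tp + fp)
        recall = tp / total_pos
        if precision >= min_precision and recall > best[0]:
            best = (recall, threshold)
    return best


labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
print(max_recall_at_precision(labels, scores, 0.5))  # (1.0, 0.3)
```

Lowering the threshold trades precision for recall; the precision floor keeps false positives from growing without bound, which is the trade-off the question asks for.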
A financial services firm deploys a binary classification model for fraud detection. The model's precision is 0.95 and recall is 0.60 on the test set. After deployment, the fraud rate in production is 0.5% compared to 5% in the test set. The model shows good calibration on the test set (Brier score 0.02) but poor calibration in production (Brier score 0.15). What is the most likely explanation for the calibration degradation?
Trap 1: The distribution of input features has shifted significantly,…
Feature drift can cause poor performance, but the problem statement does not mention feature drift; calibration degradation is specifically addressed.
Trap 2: The model overfits to noise in the training data, leading to poor…
Overfitting would show poor test set performance, but the test set had good Brier score.
Trap 3: The production data has a different class imbalance than the…
Class imbalance alone does not explain miscalibration; the model's probability estimates can still be calibrated if the imbalance is accounted for.
- A
The distribution of input features has shifted significantly, causing the model to produce incorrect probabilities.
Why wrong: Feature drift can cause poor performance, but the problem statement does not mention feature drift; calibration degradation is specifically addressed.
- B
The model overfits to noise in the training data, leading to poor generalization.
Why wrong: Overfitting would show poor test set performance, but the test set had good Brier score.
- C
The production data has a different class imbalance than the training data, causing the model to be biased toward the majority class.
Why wrong: Class imbalance alone does not explain miscalibration; the model's probability estimates can still be calibrated if the imbalance is accounted for.
- D
The relationship between features and the target has changed (concept drift), causing the model's probability estimates to be misaligned with the true probabilities.
Concept drift changes the conditional distribution P(Y|X), which directly affects calibration.
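For reference, the Brier score cited in this question is the mean squared difference between predicted probabilities and observed 0/1 outcomes, so lower is better. The sketch below uses made-up probabilities and outcomes purely to show how the same predictions can score well on one set of outcomes and poorly on another.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and binary outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Illustrative values only: identical predictions, different realized outcomes.
predicted = [0.9, 0.1, 0.8, 0.2]
test_outcomes = [1, 0, 1, 0]   # outcomes agree with the predictions
prod_outcomes = [1, 0, 0, 0]   # one confident prediction no longer holds

print(brier_score(predicted, test_outcomes))
print(brier_score(predicted, prod_outcomes))
```

The second score is noticeably higher because a confident probability no longer matches what actually happens, which is the pattern a change in P(Y|X) produces.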
A logistics company uses Vertex AI AutoML Tables to predict delivery delays based on order attributes, weather data, and traffic data. The model is retrained weekly using a Vertex AI Pipeline that runs a BigQuery query to get training data, then triggers AutoML training. Recently, the pipeline fails with the error 'Dataset not found' when the AutoML training step starts. The BigQuery query runs successfully and outputs a table. Which is the most likely cause?
Trap 1: The AutoML training step is referencing a different dataset…
Possible but less likely; the error points to a missing dataset import step.
Trap 2: The training data has been manually deleted from Cloud Storage.
The error is 'Dataset not found', which points to a missing dataset resource rather than missing files.
Trap 3: The pipeline's IAM permissions are insufficient to access BigQuery.
The BigQuery query succeeded, so permissions are fine.
- A
The AutoML training step is referencing a different dataset location.
Why wrong: Possible but less likely; the error points to a missing dataset import step.
- B
The training data has been manually deleted from Cloud Storage.
Why wrong: The error is 'Dataset not found', which points to a missing dataset resource rather than missing files.
- C
The pipeline's IAM permissions are insufficient to access BigQuery.
Why wrong: The BigQuery query succeeded, so permissions are fine.
- D
The BigQuery output table is not being passed as a Vertex AI Dataset resource.
The pipeline must create a Vertex AI Dataset from the BigQuery table for AutoML to use.
Match each ML model interpretability method to its description.
Game-theoretic approach to explain feature contributions
Local surrogate model to explain individual predictions
Ranking features by their impact on model output
Shows marginal effect of a feature on predictions
Measures decrease in performance when feature is shuffled
Match each feature engineering technique to its description.
Convert categorical variable into binary columns
Combine two or more features to capture interactions
Normalize numeric features to a standard range
Group continuous values into discrete intervals
Weight term frequency by inverse document frequency
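Several of the descriptions above correspond to short transforms. The sketch below illustrates one-hot encoding, feature crossing, min-max scaling, and bucketing in plain Python. The function names and sample values are illustrative assumptions and are not tied to any particular library.

```python
from bisect import bisect_right


def one_hot(value, vocabulary):
    """Convert a categorical value into binary columns, one per vocabulary entry."""
    return [1 if value == v else 0 for v in vocabulary]


def feature_cross(a, b):
    """Combine two categorical features into one to capture their interaction."""
    return f"{a}_x_{b}"


def min_max_scale(values):
    """Normalize numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def bucketize(value, boundaries):
    """Map a continuous value to the index of its interval (boundaries sorted)."""
    return bisect_right(boundaries, value)


print(one_hot("green", ["red", "green", "blue"]))  # [0, 1, 0]
print(feature_cross("US", "mobile"))               # US_x_mobile
print(min_max_scale([10, 20, 30]))                 # [0.0, 0.5, 1.0]
print(bucketize(25, [18, 30, 50]))                 # 1
```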
A financial services company has deployed a classification model on Vertex AI to detect fraudulent transactions. The model is monitored using Vertex AI Model Monitoring for skew and drift detection, and also logs predictions to BigQuery for analysis. After a month, the monitoring alerts show a significant drift in one feature (transaction_amount). Which TWO actions should the team take to diagnose and address this issue?
Trap 1: Increase the frequency of model monitoring checks to every hour.
More frequent monitoring does not address the cause of drift.
Trap 2: Increase the sampling rate for prediction logging to ensure full…
While helpful for analysis, it's not a direct corrective action for drift.
Trap 3: Raise the alert threshold to minimize false positives.
This would suppress legitimate alerts and not solve the drift issue.
- A
Compare the feature distribution in the training data with the recent serving data using statistical tests.
This diagnostic step helps understand the nature and extent of the drift.
- B
Retrain the model on the most recent data to incorporate the new distribution.
If drift is due to a real shift, retraining with recent data can improve performance.
- C
Increase the frequency of model monitoring checks to every hour.
Why wrong: More frequent monitoring does not address the cause of drift.
- D
Increase the sampling rate for prediction logging to ensure full data capture.
Why wrong: While helpful for analysis, it's not a direct corrective action for drift.
- E
Raise the alert threshold to minimize false positives.
Why wrong: This would suppress legitimate alerts and not solve the drift issue.
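One way to carry out the comparison in option A is a two-sample Kolmogorov-Smirnov test. The sketch below computes only the KS statistic (the largest gap between the two empirical CDFs) in plain Python. It is an illustration of the idea and not necessarily the distance measure that Vertex AI Model Monitoring uses; the sample values are made up.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum gap between the empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # The largest gap always occurs at one of the observed values.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical transaction_amount samples from training and recent serving data.
training = [10, 20, 30, 40]
serving = [30, 40, 50, 60]
print(ks_statistic(training, serving))  # 0.5
```

A statistic near 0 means the distributions look alike and a value near 1 means they barely overlap. A statistics library would additionally supply a p-value for the test.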
Drag and drop the steps to set up model monitoring for drift detection on Vertex AI in the correct order.
Drag and drop the steps to create and deploy a custom ML model on Vertex AI using a container in the correct order.
Drag and drop the steps to set up a BigQuery ML linear regression model for forecasting in the correct order.
Drag and drop the steps to deploy a trained TensorFlow model to Vertex AI Prediction in the correct order.
Drag and drop the steps to set up a distributed training job on Vertex AI using a custom container in the correct order.
Drag and drop the steps to set up a batch prediction job using Vertex AI in the correct order.
You are designing an ML pipeline for a large-scale recommendation system that runs weekly retraining on historical user interaction data. The pipeline uses TensorFlow and is deployed on Google Cloud. The pipeline must be orchestrated and automated with minimal manual intervention. Which THREE options should you include in your design? (Choose three.)
Trap 1: Use BigQuery scheduled queries to run the training script on a…
BigQuery scheduled queries are for SQL queries, not running ML training jobs.
Trap 2: Use AI Platform Notebooks to schedule the training job on a…
Notebooks are for interactive development, not scheduling production pipelines.
- A
Use BigQuery scheduled queries to run the training script on a schedule.
Why wrong: BigQuery scheduled queries are for SQL queries, not running ML training jobs.
- B
Use Vertex AI Pipelines to define the ML pipeline as a Directed Acyclic Graph (DAG) of components.
Vertex AI Pipelines is purpose-built for ML pipelines.
- C
Use AI Platform Notebooks to schedule the training job on a recurring basis.
Why wrong: Notebooks are for interactive development, not scheduling production pipelines.
- D
Use Cloud Build and Cloud Functions to trigger the pipeline when new training data arrives in Cloud Storage.
Event-driven triggers automate pipeline execution on data arrival.
- E
Use Cloud Composer to orchestrate the pipeline steps, including data extraction, preprocessing, training, and deployment.
Cloud Composer (Airflow) is designed for orchestrating complex workflows with dependencies.
A manufacturing company wants to predict equipment failure using sensor data stored in BigQuery. They have limited ML expertise and want to use AutoML Tables. The data includes timestamps, numerical sensor readings, and a boolean 'failure' column. The dataset is highly imbalanced with only 1% failure cases. Which of the following is the most effective approach to handle the imbalance in AutoML Tables?
Trap 1: Downsample the majority class to balance the dataset.
Downsampling discards data and may reduce model accuracy.
Trap 2: Use a custom loss function in the training configuration.
AutoML Tables does not support custom loss functions.
Trap 3: Oversample the minority class using SQL before training.
AutoML Tables expects raw data; manual resampling may interfere with its optimizations.
- A
Let AutoML Tables handle the imbalance automatically; it has built-in techniques for class imbalance.
AutoML Tables automatically adjusts for imbalance.
- B
Downsample the majority class to balance the dataset.
Why wrong: Downsampling discards data and may reduce model accuracy.
- C
Use a custom loss function in the training configuration.
Why wrong: AutoML Tables does not support custom loss functions.
- D
Oversample the minority class using SQL before training.
Why wrong: AutoML Tables expects raw data; manual resampling may interfere with its optimizations.
A team uses Vertex AI Feature Store to serve features for real-time predictions. They notice that feature values are frequently updated from multiple source systems, leading to inconsistencies. They need to ensure that feature values are consistent across all serving endpoints. What should they do?
Trap 1: Use batch ingestion with weekly updates to reduce update frequency
Batch updates with long intervals increase the chance of serving stale features.
Trap 2: Increase the offline storage TTL to retain historical feature values
Retention does not affect consistency.
Trap 3: Implement a manual approval process for feature updates
Manual process is not scalable for frequent updates.
- A
Use batch ingestion with weekly updates to reduce update frequency
Why wrong: Batch updates with long intervals increase the chance of serving stale features.
- B
Increase the offline storage TTL to retain historical feature values
Why wrong: Retention does not affect consistency.
- C
Implement a manual approval process for feature updates
Why wrong: Manual process is not scalable for frequent updates.
- D
Use a streaming ingestion pipeline with exactly-once semantics
Exactly-once streaming ensures each update is applied exactly once, maintaining consistency.
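The effect of exactly-once processing can be approximated at the write side with idempotent updates. The sketch below is a simplified in-memory stand-in, not the Feature Store API: the `FeatureTable` class, event IDs, and timestamps are hypothetical. It ignores redelivered events and keeps the value with the newest event time, so every reader sees the same result regardless of delivery order.

```python
class FeatureTable:
    """In-memory feature table with idempotent, last-event-time-wins writes."""

    def __init__(self):
        self._values = {}   # (entity_id, feature) -> (event_time, value)
        self._seen = set()  # event IDs that were already applied

    def apply(self, event_id, entity_id, feature, value, event_time) -> bool:
        """Apply an update once; return False for a duplicate delivery."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        key = (entity_id, feature)
        current = self._values.get(key)
        # Only overwrite when the incoming event is at least as recent.
        if current is None or event_time >= current[0]:
            self._values[key] = (event_time, value)
        return True

    def get(self, entity_id, feature):
        entry = self._values.get((entity_id, feature))
        return None if entry is None else entry[1]


table = FeatureTable()
table.apply("evt-2", "user-1", "avg_spend", 42.0, event_time=200)
table.apply("evt-2", "user-1", "avg_spend", 42.0, event_time=200)  # redelivery, ignored
table.apply("evt-1", "user-1", "avg_spend", 17.0, event_time=100)  # late, older event
print(table.get("user-1", "avg_spend"))  # 42.0
```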
An organization uses Cloud Composer to orchestrate ML workflows. A DAG that triggers Vertex AI training jobs fails because the training job exceeds the 7-day maximum runtime. What is the best way to handle long-running training jobs in Cloud Composer?
Trap 1: Increase the DAG execution timeout to 14 days in the Airflow…
Cloud Composer has a 7-day limit for DAG runs, and increasing timeout may not be allowed.
Trap 2: Refactor the training job to run on Dataflow, which supports longer…
Dataflow is for data processing, not model training.
Trap 3: Set max_active_runs=1 in the DAG to prevent overlapping runs
This does not address the runtime limit.
- A
Increase the DAG execution timeout to 14 days in the Airflow configuration
Why wrong: Cloud Composer has a 7-day limit for DAG runs, and increasing timeout may not be allowed.
- B
Use Vertex AI Pipeline to manage the training job asynchronously
Vertex AI Pipeline can handle long-running jobs independently of the DAG runtime.
- C
Refactor the training job to run on Dataflow, which supports longer runtimes
Why wrong: Dataflow is for data processing, not model training.
- D
Set max_active_runs=1 in the DAG to prevent overlapping runs
Why wrong: This does not address the runtime limit.
A company deploys a TensorFlow model on Vertex AI Prediction with a single node. During peak hours, inference latency increases. What should they do first to reduce latency?
Trap 1: Increase the machine type of the node
Increasing machine type may help but does not address scaling under load.
Trap 2: Decrease the min replicas to 0
Reducing min replicas may cause cold starts and increase latency.
Trap 3: Enable automatic batching of requests
Batching increases latency as requests wait to be batched.
- A
Enable autoscaling for the deployment
Autoscaling adds nodes during peak traffic, reducing latency.
- B
Increase the machine type of the node
Why wrong: Increasing machine type may help but does not address scaling under load.
- C
Decrease the min replicas to 0
Why wrong: Reducing min replicas may cause cold starts and increase latency.
- D
Enable automatic batching of requests
Why wrong: Batching increases latency as requests wait to be batched.
A company uses Vertex AI Prediction with a custom container for a TensorFlow model. They notice that after deploying a new model version, requests still go to the old version. What is the most likely cause?
Trap 1: The custom container is not compatible with Vertex AI
Vertex AI supports custom containers.
Trap 2: The model is cached and needs cache invalidation
Caching may affect responses, but it is not the primary cause.
Trap 3: The new model version was not deployed to the same endpoint
If the new version is deployed to the same endpoint, the traffic split still controls routing.
- A
The custom container is not compatible with Vertex AI
Why wrong: Vertex AI supports custom containers.
- B
The model is cached and needs cache invalidation
Why wrong: Caching may affect responses, but it is not the primary cause.
- C
Traffic is not split to the new model version
Traffic splitting must be adjusted to route to the new version.
- D
The new model version was not deployed to the same endpoint
Why wrong: If the new version is deployed to the same endpoint, the traffic split still controls routing.
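The routing behavior behind the correct answer can be simulated locally. The sketch below assumes a traffic split expressed as a mapping from deployed-model IDs to integer percentages that sum to 100. The IDs `old-model` and `new-model` are hypothetical, and the code only simulates routing; it does not call Vertex AI.

```python
import random
from collections import Counter


def validate_traffic_split(split: dict) -> None:
    """Check that percentages are non-negative and sum to 100."""
    if any(pct < 0 for pct in split.values()):
        raise ValueError("percentages must be non-negative")
    if sum(split.values()) != 100:
        raise ValueError("percentages must sum to 100")


def route(split: dict, rng: random.Random) -> str:
    """Pick a deployed model for one request according to the split."""
    ids = list(split)
    weights = [split[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


# The new version is deployed, but the split was never updated.
split = {"old-model": 100, "new-model": 0}
validate_traffic_split(split)

rng = random.Random(0)
print(Counter(route(split, rng) for _ in range(1000)))  # Counter({'old-model': 1000})
```

Until the split assigns a non-zero percentage to the new deployed model, every request keeps reaching the old one, which matches the symptom in the question.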
A company needs to serve a model with strict latency requirements (<100ms). They are using Vertex AI Prediction with CPU. During testing, latency is 150ms. What should they do?
Trap 1: Enable batching to improve throughput
Batching increases per-request latency.
Trap 2: Use a smaller machine type with more replicas
Smaller machines may be slower.
Trap 3: Export the model to TensorFlow Lite
TensorFlow Lite is for mobile/edge, not Vertex AI.
- A
Enable batching to improve throughput
Why wrong: Batching increases per-request latency.
- B
Use a smaller machine type with more replicas
Why wrong: Smaller machines may be slower.
- C
Export the model to TensorFlow Lite
Why wrong: TensorFlow Lite is for mobile/edge, not Vertex AI.
- D
Switch to a GPU machine type
GPUs can reduce inference latency.