Free PDE Operationalizing machine learning models Practice Questions (2026)

Q: How many Operationalizing machine learning models questions are on the PDE exam?

The Operationalizing machine learning models domain is one of the weighted domains on the PDE exam. The Courseiva question bank has 191 practice questions for this domain.

Q: How can I practice Operationalizing machine learning models questions for PDE?

Click any of the 191 questions listed on this page to see the full question and explanation, or use the session launcher to start a focused practice session of 10, 20, 30 or 50 questions drawn only from the Operationalizing machine learning models domain.

Practice Operationalizing machine learning models questions

All PDE Operationalizing machine learning models questions (191)

A company deploys a machine learning model to Vertex AI for real-time predictions. After deployment, they notice that prediction latency spikes during peak traffic hours. Which approach should they take to reduce latency without sacrificing accuracy?

A data science team uses Vertex AI Pipelines to automate retraining. They want to ensure that only models with performance above a threshold are deployed. Which component should they add to the pipeline?
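
A threshold gate like the one this question describes usually comes down to comparing an evaluation metric against a configured minimum. In Vertex AI Pipelines, which is built on the Kubeflow Pipelines SDK, that comparison is typically wired into a conditional block such as `dsl.Condition` wrapped around the deploy step. The plain-Python sketch below shows only the decision logic. The `EvalResult` class, the `should_deploy` helper, the metric name, and the values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical container for one evaluation metric of a candidate model."""
    metric_name: str
    value: float


def should_deploy(result: EvalResult, threshold: float) -> bool:
    """Return True only if the candidate meets or exceeds the threshold.

    Assumes a higher-is-better metric (e.g. accuracy or AUC); invert the
    comparison for loss-style metrics such as RMSE.
    """
    return result.value >= threshold


if __name__ == "__main__":
    print(should_deploy(EvalResult("auc_roc", 0.91), threshold=0.85))  # True
    print(should_deploy(EvalResult("auc_roc", 0.80), threshold=0.85))  # False
```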

A company trains a custom model using TensorFlow and wants to deploy it to Vertex AI for low-latency predictions. The model is large (2 GB). Which deployment option should they choose?

A company uses Vertex AI to serve a model. They notice that some predictions are incorrect due to data drift. What is the best way to detect the drift and retrain the model automatically?

A financial services company needs to explain predictions from a complex ensemble model for regulatory compliance. Which Vertex AI service should they use?

A team wants to retrain a model weekly using new data stored in BigQuery. They want to minimize manual effort. Which approach should they use?

A company deploys a model to a Vertex AI endpoint. They want to run a canary deployment that sends 10% of traffic to a new model version. How should they configure this?

A data scientist uses Vertex AI Workbench notebooks for model development. They want to share the environment with team members while maintaining version control. Which approach should they use?

A company wants to monitor the performance of a deployed model in production. Which metric indicates that the model's predictions are degrading?

A team uses Vertex AI AutoML Tables to train a model. They need to deploy the model for real-time predictions with high availability. Which deployment configuration should they use?

A company uses Vertex AI to serve a model that requires a GPU for inference. They want to minimize cost while handling variable traffic. Which strategy should they use?

Which TWO steps are required to deploy a custom scikit-learn model to Vertex AI for online predictions?

Which THREE factors should be considered when designing a Vertex AI Pipeline for continuous training?

Which TWO actions can help reduce prediction latency for a Vertex AI endpoint?

Which THREE metrics should be monitored for a deployed machine learning model in production?

A company has a production machine learning model deployed on Vertex AI Endpoint that predicts customer churn. The model is retrained weekly using a Vertex AI Pipeline that pulls new data from BigQuery. Recently, the model's accuracy has been declining. The data science team suspects data drift but is unsure. They have enabled Vertex AI Model Monitoring but have not set up any alerts. The team wants to diagnose and address the issue quickly. The pipeline runs successfully, and no errors are reported. The model endpoint is serving predictions with average latency of 200ms. What should the team do first?

A retail company uses a Vertex AI endpoint to serve product recommendations. The model is a TensorFlow model deployed with a custom container. Recently, users have reported that recommendations are stale. The model is retrained daily using Vertex AI Pipelines. The pipeline completes successfully, but the endpoint continues to serve the old model. The team checks the pipeline logs and sees that the new model is uploaded to the Vertex AI Model Registry. The endpoint has traffic split set to 100% for the old model. The team needs to update the endpoint to serve the new model version. What should they do?

A company has deployed a machine learning model on Vertex AI Prediction that serves real-time predictions for a customer-facing application. The model was trained using a custom container and is hosted on a single endpoint with a minimum number of nodes. Recently, the team noticed that during peak traffic, prediction latency increases significantly and some requests time out. The endpoint is configured with a baseline traffic split of 100% on the current model version. Which action should the team take to reduce latency and improve reliability?

A data science team is operationalizing a batch prediction job using Vertex AI Batch Prediction. The model uses a custom container that requires a specific GPU for inference. The job processes a large dataset stored in Cloud Storage. The team wants to minimize cost while ensuring the job completes within a 2-hour window. Which configuration should they choose?

A company is deploying a machine learning model for fraud detection. The model is trained using TensorFlow and will be served on Vertex AI Prediction. The team wants to implement model monitoring to detect prediction drift. Which TWO actions should they take? (Choose 2)

A data scientist deploys a new version of a fraud detection model (model2) alongside the existing model (model1) on the same Vertex AI endpoint with a 70/30 traffic split. After 24 hours, the team notices that model2's predictions are significantly different from model1's, and the fraud detection rate has increased. What is the most likely explanation for the change in predictions?

You are a machine learning engineer at a FinTech company. Your team has developed a credit risk model using XGBoost and deployed it on Vertex AI Prediction using a custom container. The model is used for real-time credit decisions, and the endpoint is configured with a single machine type (n1-standard-4) and min_replica_count = 2, max_replica_count = 10. Recently, the team observed that during a promotional campaign, the endpoint's prediction latency increased from 200ms to over 2 seconds, and some requests resulted in 503 errors. You check the Cloud Monitoring metrics and see that CPU utilization reached 100% on the existing replicas, but the number of replicas never scaled beyond the initial 2. The deployment uses a custom container that runs a TensorFlow Serving-like model server. The container image is stored in Artifact Registry. The Vertex AI endpoint is configured with a traffic split of 100% to this model version. What is the most likely cause of the scaling failure, and what step should you take to resolve it?

You have deployed a TensorFlow model on Vertex AI Endpoints with autoscaling. The model receives high traffic during peak hours, but you notice that inference latency increases significantly during cold starts. Which strategy would best minimize cold-start latency without incurring unnecessary cost?

Your team is using Vertex AI Pipelines to orchestrate a model retraining workflow. The pipeline includes a data validation step, a training step, and a model evaluation step. You want to ensure that if the evaluation step fails due to low model performance, the pipeline stops and does not deploy the model. Which approach should you use?

You are using AI Platform Prediction (now Vertex AI) for online predictions. You notice that some requests are failing with a 503 status code. Which is the most likely cause?

A retail company uses a machine learning model to predict inventory demand. The model is retrained weekly using Vertex AI Pipelines. Recently, the model's accuracy has degraded because the data distribution has shifted. Which action should you take to monitor and detect this drift automatically?
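
Drift detection of this kind compares a feature's distribution in recent serving traffic against a baseline, typically the training data. Vertex AI Model Monitoring's documentation describes two distance measures for this: Jensen-Shannon divergence for numerical features and L-infinity distance for categorical features. The sketch below computes both over pre-binned histograms. The bin counts and the 0.05 alert threshold are hypothetical; in practice, thresholds are configured per feature.

```python
import math


def _normalize(counts):
    """Turn raw bin counts into probabilities."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    return [c / total for c in counts]


def jensen_shannon(baseline_counts, serving_counts):
    """Jensen-Shannon divergence (log base 2, so 0 <= JS <= 1) over shared bins."""
    p, q = _normalize(baseline_counts), _normalize(serving_counts)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute 0; y > 0 wherever x > 0 because y is the mixture.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def l_infinity(baseline_counts, serving_counts):
    """Largest absolute difference between matching category probabilities."""
    p, q = _normalize(baseline_counts), _normalize(serving_counts)
    return max(abs(a - b) for a, b in zip(p, q))


def drifted(score, threshold):
    """Flag drift when the distance exceeds the configured threshold."""
    return score > threshold


if __name__ == "__main__":
    baseline = [50, 30, 20]  # hypothetical training-time histogram
    serving = [20, 30, 50]   # hypothetical recent serving histogram
    js = jensen_shannon(baseline, serving)
    print(round(js, 4), drifted(js, threshold=0.05))  # 0.0958 True
    print(round(l_infinity(baseline, serving), 4))    # 0.3
```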

You are responsible for deploying a PyTorch model for real-time inference. The model requires GPU acceleration. You want to minimize infrastructure management overhead. Which serving option should you choose?

A data science team has built a model using scikit-learn. They want to operationalize it on Google Cloud without rewriting the code. Which approach should they take?

You have a batch prediction job on Vertex AI that processes millions of records. The job is failing with an out-of-memory error. What is the best way to resolve this?

Your MLOps pipeline uses Vertex AI Pipelines. You want to ensure that model training uses a consistent environment with specific Python package versions. Which approach best achieves this?

Which TWO are best practices for monitoring a deployed machine learning model in production on Vertex AI?

Which THREE considerations are important when designing a batch prediction pipeline for a large dataset on Vertex AI?

Which TWO actions can help reduce the latency of a Vertex AI endpoint serving a large neural network model?

You configured a model deployment monitor on your Vertex AI endpoint as shown. What will happen when the feature 'age' has a skew of 0.4?

In the Vertex AI Pipeline component YAML exhibit, the component is designed to evaluate a model and produce metrics. If the threshold_accuracy is set to 0.85, what is the expected behavior of this component?

You are a data engineer at a financial services company. You have deployed a credit risk model on Vertex AI Endpoints using a custom container with a TensorFlow SavedModel. The model expects input features as a JSON object. Recently, the model has been returning high prediction latency and occasional 503 errors. You have enabled autoscaling with minNodes=2 and maxNodes=10. The model is CPU-only and uses n1-standard-4 machines. Monitoring shows that during peak hours, CPU utilization reaches 90% and memory is at 80%. The number of prediction requests per second peaks at 100. You suspect that the model is not scaling fast enough. Which action will most effectively reduce latency and eliminate 503 errors?
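
The scaling arithmetic behind this scenario can be approximated with the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler. Desired replicas equal ceil(current replicas × current utilization / target utilization), clamped to the configured minimum and maximum. Treat this as an approximation of Vertex AI's autoscaling, not its exact algorithm. The 60% CPU target below is an illustrative value.

```python
import math


def desired_replicas(current_replicas: int, current_util_pct: float,
                     target_util_pct: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Proportional scaling rule (HPA-style), clamped to [min_replicas, max_replicas]."""
    if target_util_pct <= 0:
        raise ValueError("target_util_pct must be positive")
    raw = math.ceil(current_replicas * current_util_pct / target_util_pct)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Two replicas at 90% CPU with an illustrative 60% target, bounds 2..10.
    print(desired_replicas(2, 90, 60, 2, 10))    # 3
    # A saturated fleet can ask for more replicas than the cap allows.
    print(desired_replicas(10, 100, 60, 2, 10))  # 10 (17 before clamping)
```

The formula does not model replica start-up time, which is one reason scaling can lag behind a sudden spike.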

Your company uses Vertex AI Pipelines to automate model retraining. The pipeline has three steps: data extraction from BigQuery, feature engineering using Dataflow, and model training using a custom container on Vertex AI Training. Recently, the pipeline has been failing intermittently at the Dataflow step with a 'The job encountered a transient error. Please retry.' message. You have enabled pipeline retries with 3 attempts. However, the pipeline still fails after 3 retries. You check the logs and find that the Dataflow job requires more resources than the default worker configuration provides. Which change should you make to reduce the failure rate?

A financial services company deploys a regression model to predict loan default risk. The model is served using Vertex AI Endpoints with autoscaling. After deployment, latency increases significantly during peak hours, causing timeouts. The model uses scikit-learn and has a large feature set. Which action should the team take to reduce latency while maintaining prediction accuracy?

A data science team deploys a TensorFlow image classification model to Vertex AI Prediction. The model performs well in offline evaluation but shows a 15% drop in accuracy in production. The production data distribution has shifted compared to the training data. The team needs to continuously monitor and retrain the model. Which solution is most appropriate for detecting drift and triggering retraining?

A data engineering team is operationalizing a machine learning model for real-time fraud detection. The model must process transactions with sub-100ms latency and be highly available. Which TWO strategies should the team implement?

What is the most likely cause of this error?

A healthcare startup is deploying a natural language processing (NLP) model for extracting medical entities from clinical notes. The model is a fine-tuned BERT model served on Vertex AI Prediction using a custom container. The team observes that prediction latency is around 500ms per request, but they need to handle up to 100 requests per second (QPS) with end-to-end latency under 200ms. The model currently runs on n1-standard-4 machines (4 vCPU, 15 GB memory). During load testing, CPU utilization reaches 90% and memory usage is 12 GB. The team is considering options to meet the requirements. Which action should they take?
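
Little's law, a standard queueing identity (requests in flight L equals arrival rate λ times time in system W), gives a quick sense of the load in this scenario. Applied to the stated numbers, it is a back-of-the-envelope estimate rather than a capacity plan:

```latex
L = \lambda W = 100\ \text{req/s} \times 0.5\ \text{s} = 50\ \text{requests in flight}
\qquad
L_{\text{target}} = 100\ \text{req/s} \times 0.2\ \text{s} = 20\ \text{requests in flight}
```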

Drag and drop the steps to deploy a Cloud Dataflow pipeline from a template into the correct order.

Drag and drop the steps to create a Cloud Bigtable instance and table using the CLI into the correct order.

Match each Google Cloud IAM role to its description.

Match each BigQuery feature to its description.

A data scientist has trained an XGBoost model on Vertex AI and wants to deploy it to an endpoint with automatic scaling based on traffic. What is the recommended deployment approach?

A retail company is using a machine learning model for inventory forecasting. They observe that the model's predictions become less accurate over time, especially during holiday seasons. Which monitoring metric should they prioritize?

A financial institution needs to deploy a TensorFlow model for fraud detection with strict latency requirements (<100ms). The model uses custom ops that are not available in standard TF Serving. What is the most appropriate serving solution?

A team is using Kubeflow Pipelines on Google Kubernetes Engine to orchestrate ML workflows. They need to track parameters, metrics, and artifacts for each run. Which tool should they integrate?

A company has a trained model stored in Vertex AI Model Registry. They want to automate retraining when new training data arrives in Cloud Storage. Which approach is most efficient?

An e-commerce company deploys a recommendation model on Vertex AI Endpoints. The endpoint receives a high volume of requests with a large payload. They notice high latency and occasional timeouts. Which action should they take to improve performance without sacrificing accuracy?

A startup is deploying a PyTorch model on Google Cloud. They need to serve predictions for a mobile app with bursty traffic. Which service is most cost-effective?

A data scientist is using Vertex AI to train a model and wants to ensure that the training code and environment are reproducible. Which approach should they take?

A healthcare organization is deploying a model that processes protected health information (PHI). They need to ensure that the inference data is encrypted in transit and at rest, and access is audited. Which combination of services meets these requirements?

A team is debugging a sudden increase in prediction latency for a model deployed on Vertex AI Endpoints. Which TWO metrics in Cloud Monitoring should they examine first? (Choose two.)

A company is migrating ML workflows to Vertex AI Pipelines. They want to ensure best practices for pipeline reproducibility and debugging. Which THREE actions should they take? (Choose three.)

A data engineering team is operationalizing a machine learning model for real-time inference. They need to monitor the model's performance in production. Which THREE types of monitoring should they implement? (Choose three.)

Refer to the exhibit. An ML engineer sees this error when invoking a Vertex AI endpoint. What is the most likely cause?

Refer to the exhibit. A data scientist notices that the evaluation component rarely passes the threshold, causing the pipeline to fail often. What should they do to improve efficiency?

Refer to the exhibit. A Cloud Build step fails when pushing a Docker image to Artifact Registry. What is the missing IAM role for the Cloud Build service account?

You have deployed a classification model on Vertex AI Endpoints. The model's training data had a balanced class distribution, but over time, the production data has shifted such that one class appears 90% of the time. The model's overall accuracy remains high, but the recall for the minority class has dropped significantly. What is the best approach to detect and address this issue?
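
The gap between overall accuracy and minority-class recall in this scenario is easy to reproduce. The sketch below uses a hypothetical 1,000-prediction sample with a 90/10 class split; the labels and counts are invented for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each true class, the fraction of its examples predicted correctly."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in support.items()}


# Hypothetical sample: 900 majority ("stay") and 100 minority ("churn") labels.
y_true = ["stay"] * 900 + ["churn"] * 100
# 890 of 900 "stay" and only 20 of 100 "churn" examples are predicted correctly.
y_pred = ["stay"] * 890 + ["churn"] * 10 + ["churn"] * 20 + ["stay"] * 80

if __name__ == "__main__":
    print(accuracy(y_true, y_pred))  # 0.91
    print({k: round(v, 3) for k, v in recall_per_class(y_true, y_pred).items()})
    # {'stay': 0.989, 'churn': 0.2}
```

Overall accuracy hides the minority-class collapse. Per-class recall, or a confusion matrix computed from labeled production samples, makes it visible.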

A data science team has trained a TensorFlow model for image classification and wants to deploy it to production with minimal latency. They have already exported the model as a SavedModel directory. Which service should they use to create an online prediction endpoint?

Your Vertex AI model deployed on an endpoint is experiencing high tail latency during online predictions. The model uses a large embedding layer, and the input size varies. You have enabled automatic scaling with a minimum of 2 replicas and maximum of 10. What is the most likely cause of the latency spikes and the best first step to diagnose?

You run batch predictions using Vertex AI Batch Prediction on a tabular dataset. The job processes 1 million rows and takes 6 hours to complete. You need to reduce the processing time to under 2 hours without increasing cost significantly. What should you do?

Your team uses Vertex AI Feature Store to serve features for online predictions. A feature value changes frequently (e.g., user session clicks). Which type of feature should you use to ensure low-latency writes and reads?

You have two versions of a classification model (v1 and v2) deployed on a Vertex AI Endpoint. You want to gradually roll out v2 to 10% of traffic, monitor performance, and if metrics are better, increase traffic to 100%. You have set up model monitoring for skew and drift. Which configuration should you use?

You need to automate retraining of a model when new training data becomes available every week. The training pipeline runs on Vertex AI Pipelines and is triggered by Cloud Composer. After retraining, you want to evaluate the new model against a golden dataset. If the model's accuracy improves by at least 1%, it should be automatically deployed to the staging endpoint. What is the best way to implement the decision logic?
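
The decision logic here comes down to comparing the candidate's accuracy on the golden dataset with the current model's. "At least 1%" can be read two ways, so the hypothetical `should_promote` helper below supports both. The absolute reading treats it as 1 percentage point; the relative reading treats it as 1% of the current accuracy. Which reading applies is an assumption.

```python
def should_promote(candidate_acc: float, current_acc: float,
                   min_gain: float = 0.01, relative: bool = False,
                   tol: float = 1e-9) -> bool:
    """Return True if the candidate beats the current model by at least min_gain.

    relative=False: gain is measured in absolute accuracy points (0.01 = 1 point).
    relative=True: gain is measured as a fraction of the current accuracy.
    tol absorbs floating-point error for gains that land exactly on min_gain.
    """
    if relative:
        if current_acc <= 0:
            raise ValueError("current_acc must be positive for a relative gain")
        gain = (candidate_acc - current_acc) / current_acc
    else:
        gain = candidate_acc - current_acc
    return gain + tol >= min_gain


if __name__ == "__main__":
    print(should_promote(0.90, 0.88))                  # True  (+2 points)
    print(should_promote(0.885, 0.88))                 # False (+0.5 points)
    print(should_promote(0.885, 0.88, relative=True))  # False (about +0.57%)
```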

Your team wants to continuously monitor a deployed model's performance in production. They need to detect when the model's predictions become unreliable due to changes in the real world (e.g., new customer behavior). Which Vertex AI service should they use?

A model deployed on a Vertex AI endpoint is making predictions with high accuracy, but the business team suspects bias against a certain demographic group. You need to analyze the model's predictions for fairness. What is the most effective approach?

Which TWO best practices should be followed when managing multiple model versions on Vertex AI Endpoints for a production system?

Which TWO metrics are most important to monitor for a real-time online prediction system to ensure service reliability and model performance?

Which THREE components are typically part of a Vertex AI Pipeline for automated model retraining and deployment?

You run `gcloud ai models describe` and get the error above. The model was created successfully from a training job that completed without errors. The model ID is correct. What is the most likely cause?

A user named Charlie needs to deploy a model to a Vertex AI Endpoint and also create training jobs. Which role should be assigned to Charlie?

A user gets the above error when trying to get online predictions. The model was created and the endpoint exists. What is the most likely reason?

A company has deployed a machine learning model to AI Platform Prediction. The model uses a custom container with a TensorFlow SavedModel. After deployment, the prediction latency is higher than expected. Which action is most likely to reduce latency without significantly impacting model accuracy?

A data scientist wants to automate retraining of a classification model when new labeled data arrives. The model is deployed on AI Platform Prediction. Which Google Cloud service should be used to orchestrate the retraining pipeline?

A company runs a real-time fraud detection model using Cloud Dataflow for streaming inference. The model is updated every hour with new training data. The team wants to minimize downtime and ensure that both old and new model versions are available during the update. Which deployment strategy should they use?

A company uses BigQuery ML to create a classification model. The model is used for batch prediction on a weekly basis. After six months, the data distribution shifts, and model accuracy drops. Which approach should the company take to maintain model performance?

A team has trained a scikit-learn model and wants to deploy it to AI Platform Prediction for online predictions. What is the required format for the model artifact?

A real-time recommendation system uses a custom container deployed on AI Platform Prediction. The model requires a large in-memory embedding lookup table that is loaded from Cloud Storage at startup. The current startup time is over 5 minutes, causing prediction requests to timeout. Which strategy would most effectively reduce startup time?

A data engineering team is building a CI/CD pipeline for machine learning models using Cloud Build and AI Platform. Which TWO practices are essential for ensuring reproducible and safe model deployments?

A company trains a model using Cloud TPUs. The model is deployed to AI Platform Prediction using a custom container with TensorFlow. Which THREE considerations are most important when serving this model?

A team is deploying a TensorFlow model for online predictions on AI Platform Prediction. They want to monitor for data drift and model performance degradation. Which TWO Google Cloud services should they use?

A company has a batch prediction job that runs daily using AI Platform Batch Prediction. The job uses a TensorFlow model and processes 10 GB of data. Recently, the job started failing with the error 'The replica worker 0 exited with a non-zero exit code: Out of memory'. Which action should the team take to resolve this without rewriting the model?

A data science team needs to ensure that a deployed Vertex AI model can handle varying traffic patterns with minimal latency and cost. What should they do?

A team trained a model on a Vertex AI custom training job and wants to deploy it to an endpoint for online predictions. They have the model artifacts stored in Cloud Storage. What steps are required?

A company uses Vertex AI Pipelines to orchestrate ML workflows. They want to automatically retrain the model when new data arrives, but only if the model's performance drops below a threshold. Which approach is best?

A team deployed a model to a Vertex AI endpoint and notices latency spikes during peak hours. What should they first investigate?

An MLOps team wants to implement continuous deployment of ML models using Cloud Build and Vertex AI. They have a GitHub repository with training code. What should they use?

A company has a model that requires a GPU for inference and has strict latency requirements. They deployed it to a Vertex AI endpoint with autoscaling but observe cold-start latency when scaling up. What is the best solution?

A data scientist wants to test a new model version on a small percentage of traffic before full rollout. Which Vertex AI feature allows this?

After deploying a model, the team notices that the predictions differ significantly from the training data distribution. What should they do?

A company uses Vertex AI Feature Store for serving features. They have a high-throughput online serving requirement. Which configuration should they use?

A company deploys an ML model using Vertex AI Pipelines. They want to ensure reproducibility and traceability. Which TWO practices should they implement?

A team needs to optimize online prediction cost for a model that has unpredictable traffic spikes. Which TWO strategies are most effective?

A company wants to implement a robust MLOps lifecycle on Google Cloud. Which THREE components are essential?

Refer to the exhibit. What is the most likely cause of the error?

Refer to the exhibit. The feature store 'my_fs' responds to offline queries but online serving requests fail. What is the most likely cause?

Refer to the exhibit. What is the most likely cause?

A startup is deploying a machine learning model for real-time fraud detection. They need low latency and automatic scaling during peak hours. Which Google Cloud service should they use?

A team has trained a model using AutoML Tables. They want to deploy it for batch predictions on a schedule. What is the simplest approach?

A data engineer needs to monitor model performance over time for drift detection. What tool is specifically designed for this?

A machine learning pipeline uses Vertex AI Pipelines. One component fails intermittently due to resource constraints. What is the best way to handle this?

A company uses a custom container image for model serving. The image is large (10 GB). During deployment, they get timeouts. What should they do?

After deploying a model to Vertex AI Endpoints, the prediction responses include unexpected data. The model returns logits instead of probabilities. What is the most likely cause?

A financial services company must ensure that predictions from a deployed model do not become biased against protected groups. They have a monitoring system in place. Which metric should they track?

A team uses Vertex AI Feature Store for real-time features. They notice that features are frequently missing during prediction serving. What is the best practice to handle missing features?

A data scientist developed a model using custom training on Vertex AI. They want to automate the entire training-to-deployment process. Which service should they use?

A data engineer is setting up CI/CD for a machine learning model using Cloud Build and Vertex AI. Which two components are essential? (Select 2)

A company wants to implement model monitoring for a deployed classification model. Which three types of monitoring should they set up? (Select 3)

A team is deploying a complex model with multiple preprocessing steps. They want to ensure consistent preprocessing during training and serving. Which three approaches can achieve this? (Select 3)

Refer to the exhibit. An auditor sees the following output from `gcloud ai models list`. What can they conclude about versioning?

Refer to the exhibit. A developer sees this log entry when trying to get a prediction. What is the most likely cause?

Refer to the exhibit. A data engineer sees these metrics from Cloud Monitoring for a deployed Vertex AI Endpoint. What is the most effective action to reduce latency?

A company deploys a machine learning model on Vertex AI for online predictions. The model experiences intermittent spikes in traffic, causing latency increases. Which strategy should the company use to ensure consistent low latency during traffic spikes?

A data engineer deploys a TensorFlow model on Vertex AI using a custom container. After deployment, online prediction requests sometimes fail with a 500 error and the message 'Out of memory'. The model requires significant memory during inference. Which action should the engineer take to resolve this issue?

A company is building a continuous training pipeline that retrains a model daily using new data from a feature store. The training data must include features computed up to the timestamp of each training run. Which architecture should be used to ensure time-consistent feature values without label leakage?
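
Time-consistent training data relies on a point-in-time lookup. For each labeled example, use the latest feature value whose timestamp is at or before the label's timestamp, never a later one. Vertex AI Feature Store supports this kind of point-in-time lookup for batch serving. The plain-Python sketch below only illustrates the rule; the entity IDs, timestamps, and values are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_value(history, entity_id, as_of):
    """Latest feature value for entity_id with timestamp <= as_of, else None.

    history maps entity_id -> list of (timestamp, value) sorted by timestamp.
    Values stamped after as_of are never returned, which prevents label leakage.
    """
    rows = history.get(entity_id, [])
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, as_of)
    return rows[idx - 1][1] if idx > 0 else None


# Hypothetical feature history for one customer's 7-day purchase count.
history = {
    "customer_1": [
        (datetime(2024, 1, 1), 3),
        (datetime(2024, 1, 5), 7),
        (datetime(2024, 1, 10), 12),
    ]
}

if __name__ == "__main__":
    # A label observed on Jan 6 must see the Jan 5 value, not the Jan 10 one.
    print(point_in_time_value(history, "customer_1", datetime(2024, 1, 6)))    # 7
    print(point_in_time_value(history, "customer_1", datetime(2023, 12, 31)))  # None
```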

A company has deployed a classification model on Vertex AI. They want to detect data drift in real-time for the model's input features. Which service should they use?

A machine learning team wants to deploy a new model version for canary testing, where only 5% of traffic is routed to the new version. Which Vertex AI endpoint configuration supports this?

A company needs to serve predictions for a model that runs an expensive computation on each request. The model is used by a batch job that processes millions of records each night, and also by a real-time API for a few thousand queries per hour. Which prediction strategy minimizes cost and latency for both use cases?

A data scientist has iterated on a model and produced a new version. The organization requires the ability to roll back to the previous version quickly if the new version performs poorly in production. Which approach should be used?

A Cloud Build pipeline is set up to train a model on Vertex AI. The build fails with the error: 'ERROR: (gcloud.ai-platform.jobs.submit.training) NOT_FOUND: The parent project does not exist.' The project ID and the service account are correctly configured. What is the most likely cause?

A company serves multiple models using Vertex AI endpoints. Each model has different latency and memory requirements. To minimize cost, the company wants to share underlying compute resources among models. Which approach should they use?

Which TWO are benefits of using Vertex AI Endpoints for model serving?

Which THREE steps are required to set up a continuous training pipeline on Google Cloud using Vertex AI?

Which TWO are common causes of prediction bias in a deployed machine learning model in production?

A company needs to deploy a trained model for real-time predictions with low latency. Which Vertex AI resource should they use?

A data engineer wants to automatically detect when the distribution of input features to a production model has shifted significantly. Which Vertex AI feature should they enable?

A team has multiple versions of a model and wants to manage them centrally, including tracking metadata and promoting versions to production. Which tool should they use?

A production model deployed on a Vertex AI endpoint is experiencing high latency during traffic spikes. The current configuration uses a single replica. What is the most efficient solution?

A company wants to automate model retraining and deployment whenever new training data becomes available. Which service should be used to orchestrate the end-to-end workflow?

A data scientist needs to provide explanations for each prediction made by a deployed AutoML model to comply with regulatory requirements. Which Vertex AI feature should they use?

A company runs large batch prediction jobs on Vertex AI every day. They want to minimize costs while ensuring the jobs complete within a 4-hour window. The model requires significant memory. What is the most cost-effective approach?

A team is implementing CI/CD for their ML models using Google Cloud. They want to automatically retrain and deploy a new model version when new training data arrives in Cloud Storage. Which combination of services should they use?

A team is training a large model using a custom container with TensorFlow on Vertex AI Training. They need to use multiple GPUs across several machines. Which strategy should they implement to maximize training throughput?

Which TWO actions should you take to ensure model reliability in a production Vertex AI Endpoint?

Which THREE Google Cloud services are typically used together in a production ML pipeline?

Which TWO strategies help reduce prediction latency for a real-time model deployed on a Vertex AI endpoint?

Refer to the exhibit. What is the cause of this error?

Refer to the exhibit. This log entry was generated by Vertex AI Model Monitoring for a production model. What should the data engineer do to address this issue?

Refer to the exhibit. A team is trying to run a custom prediction container on Vertex AI Endpoint. They get this error when the container starts. What is the most likely cause?

144

Your company has a machine learning model that predicts customer churn. The model is deployed on Vertex AI Endpoints with autoscaling. After a marketing campaign, traffic to the endpoint increases by 10x. Some predictions start failing with 'HTTP 503 Service Unavailable' errors. What is the most likely cause?

145

You are deploying a machine learning model to production using Vertex AI. The model requires GPU acceleration for low-latency predictions. You need to minimize costs while ensuring availability during a defined business hours window (8 AM to 6 PM). Which deployment strategy should you use?

146

You are responsible for monitoring a production ML model on Vertex AI. The model predicts loan approval probability. The business team reports that the model's predictions are becoming less accurate over the last week. You check the model's monitoring dashboard and see that the prediction distribution has changed significantly. What is the most likely issue?

147

Your team uses a CI/CD pipeline with Cloud Build to train and deploy ML models on Vertex AI. You want to ensure that only models that pass validation checks (e.g., accuracy threshold, fairness metrics) are promoted to production. What is the best way to implement this?
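A validation gate in a build pipeline usually comes down to a script that reads evaluation metrics and exits non-zero when a threshold is missed. Cloud Build stops the build when a step exits non-zero, so later deployment steps never run. The sketch below is a minimal illustration, assuming the training step writes a JSON file with hypothetical keys `accuracy` and `fairness_gap`. The thresholds are placeholders, not recommended values.

```python
import json
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical thresholds; real values come from your own governance policy.
    min_accuracy: float = 0.90
    max_fairness_gap: float = 0.05  # assumed metric: largest rate difference across groups


def passes_validation(metrics: dict, t: Thresholds = Thresholds()) -> tuple[bool, list[str]]:
    """Return (ok, reasons). A missing metric counts as a failure."""
    reasons = []
    acc = metrics.get("accuracy")
    if acc is None or acc < t.min_accuracy:
        reasons.append(f"accuracy {acc} below {t.min_accuracy}")
    gap = metrics.get("fairness_gap")
    if gap is None or gap > t.max_fairness_gap:
        reasons.append(f"fairness_gap {gap} above {t.max_fairness_gap}")
    return (not reasons, reasons)


def main(argv: list[str]) -> int:
    # argv[1] is the path to the (hypothetical) metrics JSON written by training.
    with open(argv[1]) as f:
        metrics = json.load(f)
    ok, reasons = passes_validation(metrics)
    for r in reasons:
        print("FAIL:", r)
    return 0 if ok else 1  # non-zero exit fails the build step


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Treating a missing metric as a failure is a deliberate choice: a broken evaluation step should block promotion rather than let an unvalidated model through.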

148

You deployed a model on Vertex AI Endpoints using a custom container. The model serves predictions but the latency is higher than expected. You suspect the container is not making full use of the CPU resources. What should you do to reduce latency?

149

Your organization uses Vertex AI Feature Store to serve features for a real-time fraud detection model. The model is deployed on a Vertex AI endpoint. After a data pipeline update, the model's online predictions became inconsistent. What is the most likely cause?

150

You manage a team that deploys multiple versions of a computer vision model for A/B testing on Vertex AI Endpoints. You need to route a small percentage of traffic to a canary version while the rest goes to the stable version. You also need to gradually increase the canary traffic over time based on performance metrics. Which approach should you take?
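On a Vertex AI endpoint, the traffic split maps each deployed model's ID to an integer percentage, and the percentages must add up to 100. The sketch below shows only the ramp logic for a canary, assuming a hypothetical step schedule and a health flag computed elsewhere from your own metrics. The keys `"canary"` and `"stable"` are placeholders for real deployed model IDs, and no SDK calls are made.

```python
def next_split(current_canary_pct: int, steps: list[int], canary_healthy: bool) -> dict[str, int]:
    """Advance the canary to the next step if healthy; otherwise roll it back to 0%.

    steps: hypothetical ramp schedule, e.g. [5, 10, 25, 50, 100].
    Returns a split whose percentages always sum to 100.
    """
    if not canary_healthy:
        return {"canary": 0, "stable": 100}
    larger = [p for p in steps if p > current_canary_pct]
    nxt = min(larger) if larger else current_canary_pct  # stay put at the last step
    if not 0 <= nxt <= 100:
        raise ValueError("traffic percentages must be between 0 and 100")
    return {"canary": nxt, "stable": 100 - nxt}
```

Rolling back to 0% rather than to the previous step is one possible policy; a gentler design could step back one notch instead.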

151

Your company uses Vertex AI Pipelines to automate the ML lifecycle. The pipeline includes training, evaluation, and deployment steps. You want to ensure that if a pipeline run fails due to a transient error (e.g., resource quota shortage), it automatically retries before marking the run as failed. What is the best way to implement this?
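Retrying transient failures generally means retrying a bounded number of times with exponential backoff and re-raising once attempts run out. Pipeline SDKs can configure this per task (the KFP v2 SDK, for example, has a `set_retry` method on pipeline tasks), so the hand-rolled loop below only illustrates the retry-with-backoff technique itself. `TransientError` is a hypothetical marker class, and the `sleep` parameter exists so the delays can be observed in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying, such as a quota shortage."""


def retry(fn: Callable[[], T], max_attempts: int = 3, base_delay_s: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            # Exponential backoff: base * 2^(attempt-1), plus up to 10% random jitter.
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Only the marked exception type is retried; any other error propagates immediately, so genuine bugs are not masked by repeated attempts.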

152

You are designing a system to serve predictions from a large language model (LLM) with a latency SLO of 500ms. The model does not fit on a single GPU and requires model parallelism. You are considering using Vertex AI Endpoints with a custom container. What additional setup is required to achieve the latency target?

153

Which TWO configurations are required to enable online prediction for a model deployed on Vertex AI Endpoints?

154

Which THREE metrics should be monitored to detect model drift in a production ML system?
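Drift detection typically compares the distribution of a feature (or of the predictions) in a serving window against a baseline such as the training data. The sketch below computes two common distance scores between such distributions: L-infinity distance (the largest per-category difference) and Jensen-Shannon divergence (base 2, so it lies between 0 and 1). It is an illustration of the measures, not any monitoring service's implementation; numerical features would need to be bucketed into categories first, and alert thresholds are up to you.

```python
import math
from collections import Counter


def distribution(values) -> dict:
    """Relative frequency of each distinct value (or bucket label)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def l_infinity(p: dict, q: dict) -> float:
    keys = set(p) | set(q)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def js_divergence(p: dict, q: dict) -> float:
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: dict, b: dict) -> float:
        # KL(a || b) in bits; terms with a[k] == 0 contribute nothing.
        return sum(a[k] * math.log2(a[k] / b[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Both scores are 0 for identical distributions; JS divergence reaches 1 when the two distributions share no values at all.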

155

Which THREE steps are essential for implementing a continuous training pipeline with Vertex AI?

156

Your company runs a real-time recommendation system for a popular e-commerce website using a machine learning model deployed on Vertex AI Endpoints. The model takes user features and product catalog data as input and returns top-10 product recommendations. The system uses a feature store to serve user embeddings and product embeddings. Recently, the recommender team retrained the model with a new algorithm and deployed it as a new version. Since the deployment, the latency for recommendation requests has increased from 100ms to 500ms on average, exceeding the 200ms SLO. The model accuracy is acceptable, and there are no errors. The endpoint uses an n1-standard-8 machine with a single GPU. The new model is larger but still fits on the GPU. You investigate and find that the GPU utilization remains low (<20%), but CPU utilization is high (90%). What should you do to reduce latency while maintaining accuracy?

157

You are a data engineer at a financial services company that uses Vertex AI to train and deploy models for credit risk assessment. The company has strict governance requirements: every model version must be approved by the risk committee before going to production. The approval process can take several days. Currently, the team trains a new model weekly and manually deploys it to a staging endpoint for review, then manually promotes to production after approval. This process is error-prone and slow. You want to automate the pipeline: training should trigger automatically when new data arrives, the model should be automatically deployed to a staging endpoint for review, and after manual approval, it should be promoted to production. Additionally, you need to ensure that if a model in staging performs poorly (e.g., low accuracy), it should not be promoted even if approved. What should you do?

158

A company deploys a scikit-learn model on Vertex AI for online predictions. The model is packaged in a custom container with all dependencies. Users report high latency (over 5 seconds) for predictions. The model size is 2 GB. What is the most likely cause of the high latency?

159

A data scientist trains a TensorFlow model using Vertex AI Training and wants to deploy it for online prediction. Which Vertex AI resource should the data scientist use to create an endpoint for serving predictions?

160

A company has a production model deployed on Vertex AI that shows declining accuracy over time. The model uses features from a BigQuery feature store. The data science team suspects data drift. What is the most efficient way to monitor and detect drift for this model?

161

A team uses Vertex AI Pipelines to automate retraining of a model every month. The pipeline includes data preprocessing, training, and deployment steps. After a recent update, the pipeline fails intermittently with a timeout error during the deployment step. What is the most likely cause?

162

A financial services company deploys a fraud detection model on Vertex AI using a custom prediction container that runs a PyTorch model. The model requires GPU acceleration. The deployment succeeds but predictions return an error: 'CUDA error: out of memory'. What should the team do to resolve this issue?

163

A company uses Vertex AI Feature Store for serving features to both training and prediction. The team notices that predictions made shortly after training use different feature values, causing a training-serving skew. What is the most effective way to prevent this skew?
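One common source of training-serving skew is building training rows with feature values that were not yet known when the label event happened. Feature stores generally address this with point-in-time lookups: for each labeled example, take the latest feature value whose timestamp is at or before the example's timestamp. The sketch below shows that lookup with plain lists, assuming a hypothetical in-memory history of `(timestamp, value)` pairs sorted by timestamp per entity.

```python
from bisect import bisect_right


def value_as_of(history: list[tuple[float, object]], ts: float):
    """Latest value with timestamp <= ts, or None if none exists yet.

    history must be sorted by timestamp ascending.
    """
    times = [t for t, _ in history]
    i = bisect_right(times, ts)  # number of entries with timestamp <= ts
    return history[i - 1][1] if i > 0 else None


def build_training_rows(labels, feature_history):
    """labels: list of (entity_id, label_ts, label).
    feature_history: dict entity_id -> sorted list of (ts, value) (hypothetical layout).
    """
    rows = []
    for entity_id, label_ts, label in labels:
        feat = value_as_of(feature_history.get(entity_id, []), label_ts)
        rows.append((entity_id, feat, label))
    return rows
```

Using `bisect_right` makes a feature written at exactly the label's timestamp count as available; use `bisect_left` instead if such values should be excluded.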

164

A company wants to version its ML models and track lineage from training data to deployed model. Which Google Cloud service should they use?

165

A data science team wants to deploy a model that requires a custom container with a specific NVIDIA CUDA version. They build the image and push it to Artifact Registry. When deploying to Vertex AI, the model fails to load with the error: 'Failed to start container: invalid ELF header'. What is the most likely cause?

166

A company is designing a CI/CD pipeline for their ML models using Cloud Build and Vertex AI. Which TWO practices should they adopt to ensure reliable and reproducible deployments?

167

A team monitors a deployed Vertex AI model and notices an increasing number of prediction errors with status code 413 (Request Entity Too Large). Which TWO actions should they consider to resolve this issue?
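HTTP 413 means the request body exceeded the size the server accepts. One client-side mitigation is to split a large list of instances into several smaller requests. The sketch below groups instances greedily so that each JSON body of the form `{"instances": [...]}` stays within a byte limit; the limit is a parameter here because the actual value should be taken from the service's quota documentation rather than assumed.

```python
import json


def chunk_instances(instances: list, max_bytes: int) -> list[list]:
    """Group instances so each json.dumps({"instances": batch}) is <= max_bytes.

    Raises ValueError if a single instance alone cannot fit.
    """
    envelope = len(json.dumps({"instances": []}).encode())  # bytes of the empty wrapper
    batches, current, size = [], [], envelope
    for inst in instances:
        item = len(json.dumps(inst).encode())
        if envelope + item > max_bytes:
            raise ValueError("a single instance exceeds max_bytes")
        extra = item + (2 if current else 0)  # json.dumps separates list items with ", "
        if size + extra > max_bytes:
            batches.append(current)  # close the full batch and start a new one
            current, size = [inst], envelope + item
        else:
            current.append(inst)
            size += extra
    if current:
        batches.append(current)
    return batches
```

The byte accounting matches `json.dumps` with its default separators; if the client serializes differently, measure the encoded body directly instead.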

168

During a Vertex AI training pipeline, the training job fails with an error: 'Out of memory: Killed process'. The model is a large deep learning model using TensorFlow. Which THREE steps should the team take to resolve this issue?

169

Your company deploys a classification model on Vertex AI for online predictions. The model is an XGBoost model trained on tabular data with 500 features. The endpoint uses a single n1-standard-4 node. After deployment, users report that predictions take 8-10 seconds on average, while the required SLA is under 2 seconds. You have already verified that the model is not large (under 100 MB) and the input data size is small. The endpoint does not scale automatically. Which action should you take to reduce latency to meet the SLA?
A) Change the machine type to n1-highcpu-4 to prioritize compute over memory.
B) Enable autoscaling by setting min replicas to 2 and max replicas to 5.
C) Switch to a custom container that preloads the model into memory.
D) Reduce the number of features by half.

170

A retail company uses Vertex AI Pipelines to automate monthly retraining of a recommendation model. The pipeline consists of three steps: (1) extract data from BigQuery, (2) train a TensorFlow model on Vertex AI Training, (3) upload the model to Vertex AI Model Registry and deploy to an endpoint if performance metrics improve. Recently, the pipeline has been failing at step 2 with the error: 'The job was cancelled by the system because it exceeded the maximum training time of 3600 seconds.' You have confirmed that the training code is correct and the data size has not changed significantly. What should you do to fix this pipeline failure?
A) Reconfigure the pipeline to use a larger machine type for training.
B) Set the training timeout to 7200 seconds in the pipeline configuration.
C) Reduce the training dataset size by sampling fewer rows.
D) Switch from TensorFlow to a simpler model framework.

171

A healthcare company deploys a model for diagnosing medical images on Vertex AI using a custom container with a TensorFlow model. The model uses a mixture of GPUs (NVIDIA T4) and CPUs. After deployment, you notice that prediction latency is highly variable: sometimes under 100ms, sometimes over 10 seconds. Investigation shows that the variability correlates with the number of concurrent requests. The endpoint has a min replicas of 1 and max replicas of 3, with target CPU utilization set to 80%. You also observe that GPU utilization remains low (<20%) even during high load. What is the most likely cause of the latency variability?
A) The model is not fully utilizing GPUs due to inefficient data loading from CPU.
B) The autoscaling metric (CPU utilization) is not appropriate for a GPU-bound workload; the endpoint does not scale based on GPU utilization.
C) The GPU machine type is too small for the model.
D) The container is not configured to use the GPU correctly.
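Utilization-target autoscaling is easiest to reason about with a concrete rule. The sketch below uses the Kubernetes HorizontalPodAutoscaler-style formula, desired = ceil(current × observed / target), clamped to the configured minimum and maximum. This is an assumption for illustration only; Vertex AI's actual autoscaling algorithm may differ (the real HPA, for instance, also applies a tolerance band). It shows why the choice of scaling metric matters: if the chosen metric stays below its target, the rule never adds replicas, whatever the request backlog.

```python
import math


def desired_replicas(current: int, observed_util: float, target_util: float,
                     min_replicas: int, max_replicas: int) -> int:
    """HPA-style rule (assumed for illustration): scale replicas in proportion to
    observed/target utilization, then clamp to [min_replicas, max_replicas]."""
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    raw = math.ceil(current * observed_util / target_util)
    return max(min_replicas, min(max_replicas, raw))
```

For example, one replica at 30% CPU against an 80% target stays at one replica, even if requests are queueing on a saturated accelerator that the rule does not look at.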

172

An e-commerce company uses Vertex AI to serve a real-time personalization model. The model is updated daily via a retraining pipeline that uploads a new version to the same endpoint. Recently, after a model update, the online prediction responses have been returning anomalous results (e.g., recommending irrelevant products). The previous version performed well. The team suspects that the new model is undertrained or has a bug. They have already checked the training code and the pipeline logs, which show no errors. The pipeline deploys the new model version to the endpoint by updating the traffic split to route 100% of traffic to the new version. Which course of action should the team take to quickly mitigate the issue while diagnosing the root cause?
A) Roll back the endpoint to the previous model version by setting traffic split to 0% for the new version.
B) Delete the current endpoint and recreate it with the previous model version.
C) Tweak the training hyperparameters and retrain immediately.
D) Increase the number of replicas on the endpoint to handle load.

173

A data science team has deployed a custom TensorFlow model on Vertex AI Prediction. They notice increasing prediction latency and a growing number of 503 errors during peak traffic hours. The model is served using a single regional endpoint with min replica count of 2 and max replica count of 10. Which TWO actions should the team take to address these issues?

174

An MLOps team manages a pipeline that retrains an XGBoost classifier weekly using BigQuery data. The pipeline is orchestrated with Cloud Composer and deploys the new model to Vertex AI Endpoint if validation metrics (AUC > 0.9) are met. Over the past month, the deployed model's AUC has dropped from 0.95 to 0.88, despite the training pipeline consistently reporting AUC > 0.9. Which THREE steps should the team take to diagnose and fix this issue?

175

Your company has deployed a machine learning model on Vertex AI Endpoint to serve real-time predictions for a mobile application. The model was trained using TensorFlow and the prediction requests include raw images that are preprocessed by the client before sending. Recently, the application developers reported that the predictions are becoming less accurate over time. They suspect the issue is related to changes in the client-side preprocessing code. You need to verify this hypothesis and monitor for future regressions. What should you do?

176

Your team is responsible for operationalizing a series of machine learning models that are trained and deployed using Vertex AI Pipelines. The pipeline consists of several steps including data preprocessing, training with hyperparameter tuning, model evaluation, and deployment to an endpoint. Recently, the pipeline has been failing intermittently at the model evaluation step with an error indicating insufficient memory. The evaluation step uses a custom container with a memory limit of 4 GB. The training step uses 8 GB and completes successfully. You need to resolve the failure without drastically increasing costs. What should you do?

177

You manage a large-scale machine learning system that recommends products to users. The model is a deep neural network trained on TensorFlow and deployed on Vertex AI Endpoint with global load balancing. The model receives over 10,000 requests per second. Recently, the team added a new feature: the user's current geographic location (latitude/longitude). After deploying the updated model, you notice that the average prediction latency has doubled, and the error rate has increased, particularly for requests from regions far from the model's primary training data (North America). You suspect the location feature is causing issues. What should you do to diagnose and mitigate the problem?

178

A startup is using Cloud Build to automate the training and deployment of their machine learning models. The workflow is defined in cloudbuild.yaml and includes steps to: 1) Run a training job on AI Platform Training, 2) Build a custom prediction container, 3) Deploy the container to Cloud Run for serving. The deployment step fails intermittently with the error: 'Cloud Run service already exists and is not owned by the calling user.' You need to fix this so that deployments are reliable. What should you do?

179

Your organization deploys multiple versions of the same model to Vertex AI Endpoint for A/B testing. You have a production model (v1) serving 90% of traffic and a candidate model (v2) serving 10%. After one week, you observe that v2 has a slightly lower AUC but significantly higher business metrics like click-through rate. The product team wants to gradually increase v2's traffic. However, you need to ensure that the overall prediction latency remains under 200 ms. Currently, the endpoint has 10 replicas for v1 and 2 replicas for v2. What is the best approach to roll out v2 while maintaining latency SLO?

180

A financial services company uses a custom container on Vertex AI Prediction to serve a fraud detection model. The container runs a Flask app that loads a large feature engineering library (~2 GB) at startup. The model is updated weekly. For the past two weeks, the new model version has been failing health checks and showing 'Container failed to start' errors in the logs. The previous versions worked fine. You inspect the container image and confirm it is built correctly using Cloud Build. The only change in the latest build is an updated version of the feature engineering library. What is the most likely cause and how should you fix it?

181

Your team has implemented a CI/CD pipeline using Cloud Composer (Apache Airflow) to retrain a model every day. The pipeline reads new data from BigQuery, trains a model using Vertex AI Training, evaluates it, and if the accuracy improves, deploys it to a Vertex AI Endpoint. For the past week, the pipeline has been running successfully but no new model has been deployed because the evaluation accuracy never exceeds the previous model's accuracy. The training data volume has been consistent. You suspect that the model is not learning from the new data. What should you do?

182

Your company runs batch predictions using Vertex AI Batch Prediction on a monthly basis. The predictions are used to generate customer segments for marketing campaigns. This month, the batch prediction job failed with an error: 'The number of rows in the input table does not match the number of rows in the output table.' The input table in BigQuery has 5 million rows, but the output table has only 4.5 million rows. You need to identify and handle the missing predictions. What is the most efficient course of action?
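Finding which predictions are missing is a reconciliation between input keys and output keys. The sketch below assumes each input row carries a unique key column (hypothetically `id`) that is copied into the output. At 5 million rows this comparison would normally run as an anti-join in BigQuery itself; the Python version only shows the logic, including two other discrepancies worth checking alongside missing rows.

```python
from collections import Counter


def reconcile(input_ids, output_ids):
    """Return (missing, unexpected, duplicated) key sets.

    missing:    keys present in the input but absent from the output (rerun these)
    unexpected: keys in the output that never appeared in the input
    duplicated: keys that appear more than once in the output
    """
    in_set = set(input_ids)
    out_counts = Counter(output_ids)
    out_set = set(out_counts)
    missing = in_set - out_set
    unexpected = out_set - in_set
    duplicated = {k for k, c in out_counts.items() if c > 1}
    return missing, unexpected, duplicated
```

The missing keys can then be used to build a smaller input table for a follow-up batch job, instead of rerunning the whole month's input.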

183

Your team manages a multi-model ensemble deployed on Vertex AI Endpoint. The ensemble consists of three models: a neural network (NN), a gradient boosted tree (GBT), and a logistic regression (LR). They are deployed as separate endpoints and traffic is split using a traffic split configuration. Recently, the overall accuracy dropped from 92% to 85%. Monitoring shows that the NN model's latency has increased significantly, causing its requests to exceed their timeouts and fall back to default predictions. The other two models are performing normally. The NN model is the most complex and handles the majority of the traffic. You need to restore accuracy quickly. What should you do first?

184

A company deploys a new machine learning model for real-time predictions using Vertex AI. The model is stored in a Cloud Storage bucket and deployed to an endpoint. To ensure traceability and rollback capability, which practice should be followed?

185

A team notices that the latency for online predictions from a Vertex AI endpoint has increased significantly over the past hour. The model is a large TensorFlow model deployed with automatic scaling (minReplicaCount=2, maxReplicaCount=10). The CPU utilization of the deployed instances is consistently above 85%. What is the most likely cause of the increased latency?

186

A financial services company uses Vertex AI to serve a fraud detection model. The model was trained on historical data that is updated daily. The team wants to automate retraining when data drift is detected. Which approach best operationalizes this requirement with minimal manual intervention?

187

A retail company needs to generate product recommendations for millions of users every few hours. The model is a small scikit-learn model. Which prediction method should be used to minimize infrastructure cost while meeting the latency requirements?

188

A company deploys a TensorFlow model on Vertex AI for online predictions. They want to monitor model performance in production to detect degradation. Which TWO practices should they implement? (Choose 2.)

189

A data science team uses Cloud Build and Vertex AI to implement CI/CD for their machine learning models. Which THREE steps are essential for a production-ready operationalization pipeline? (Choose 3.)

190

Refer to the exhibit. A data scientist deploys a model using this configuration. Users report that after a few hours of inactivity, the first prediction request takes over 30 seconds. What is the most likely cause?

191

A healthcare company uses Vertex AI to deploy a medical image classification model. The model is deployed on a private endpoint with automatic scaling (minReplicaCount=2, maxReplicaCount=10). The model uses a custom container with a GPU for inference. Recently, during peak business hours (9 AM - 5 PM), users report that prediction requests frequently time out after 60 seconds, and the error rate increases. The team checks Cloud Monitoring and observes that CPU utilization averages 40%, GPU utilization averages 30%, and the number of replicas stays at 2. There are no errors in the container logs. The model serves a few hundred requests per second during peak. The team suspects the issue is not resource saturation but something else. What should they do to resolve the problem?


Other PDE exam domains

Designing data processing systems
Building and operationalizing data processing systems
Ensuring solution quality

Frequently asked questions

What does the Operationalizing machine learning models domain cover on the PDE exam?

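This domain of the PDE exam blueprint published by Google Cloud covers taking models into production: deploying and serving models on Vertex AI, scaling endpoints to meet latency targets, monitoring production models for drift and training-serving skew, and automating retraining and deployment with Vertex AI Pipelines, Cloud Build, and Cloud Composer, as the questions on this page reflect. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all PDE domains, with no account required.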

How many Operationalizing machine learning models questions are in the PDE question bank?

The Courseiva PDE question bank contains 191 questions in the Operationalizing machine learning models domain. Click any question to see the full explanation and answer breakdown.

What is the best way to practice Operationalizing machine learning models for PDE?

Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.

Can I practice only Operationalizing machine learning models questions for PDE?

Yes — the session launcher on this page draws questions exclusively from the Operationalizing machine learning models domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.
