PMLE Scaling prototypes into ML models — All Questions With Answers

Question 1 (medium, multiple choice)

A startup has developed a prototype ML model using scikit-learn on a single machine. They now need to scale it to handle larger datasets and deploy it for real-time predictions. The team is small and wants minimal operational overhead. Which Google Cloud service should they use?

Question 2 (hard, multiple choice)

A data science team has trained a TensorFlow model on-premises using a large dataset. When they try to deploy the model to Vertex AI for online predictions, the deployed model fails to start with a ‘MemoryError’. The model artifact is 2 GB, and the machine type is n1-standard-4 (15 GB RAM). What is the most likely cause?

Question 3 (easy, multiple choice)

A company has a prototype ML model that works well on historical data, but when deployed to production, the model performance degrades over time. The data distribution shifts gradually. Which strategy should they implement to maintain model accuracy?

Question 4 (medium, multiple choice)

An ML engineer is scaling a prototype to production using Vertex AI Pipelines. The pipeline includes data validation, preprocessing, training, and deployment steps. They want to ensure that the pipeline can be reproduced and audited. What is the best practice?

Question 5 (medium, multi select)

A team has trained a sentiment analysis model using PyTorch on Vertex AI Training. They now want to deploy it for online predictions with low latency. Which TWO actions should they take? (Choose 2)

Question 6 (hard, multi select)

A company has a prototype ML model that predicts equipment failure. They want to deploy it to production using Vertex AI. The model must be retrained weekly with new data. They also need to monitor for data drift and model performance. Which THREE components should they include in their MLOps pipeline? (Choose 3)

Question 7 (hard, multiple choice)

An ML engineer is trying to upload a TensorFlow model to Vertex AI using the gcloud command shown. The model was trained using TensorFlow 2.11 and saved with model.save('model/'). The engineer sees the error. What is the most likely cause?

Question 8 (medium, multiple choice)

You are an ML engineer at a fintech company. You have a prototype credit risk model built using XGBoost that achieves high accuracy on historical data. The model is trained on a dataset with 500,000 rows and 50 features. The company wants to deploy this model to production to score loan applications in real-time. The production environment must handle a peak load of 100 requests per second with a latency under 200ms. You have decided to use Vertex AI for deployment. After deploying the model as a Vertex AI endpoint with a single n1-standard-4 machine, you notice that latency exceeds 500ms at peak load and some requests time out. You have verified that the model prediction itself (excluding network overhead) takes about 50ms on average. What should you do to meet the latency and throughput requirements?
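
A rough capacity estimate for this scenario can be sketched with Little's law (requests in flight = arrival rate × service time). The 100 requests per second and 50 ms figures come from the question; the helper name `replicas_needed`, the per-replica concurrency, and the target utilization are illustrative assumptions, not Vertex AI settings.

```python
import math


def replicas_needed(peak_rps: float, service_time_s: float,
                    concurrency_per_replica: int = 1,
                    target_utilization: float = 1.0) -> int:
    """Estimate the replica count with Little's law (hypothetical helper).

    in_flight = arrival rate * service time is the average number of
    requests being processed at once; each replica is assumed to absorb
    concurrency_per_replica * target_utilization of them.
    """
    in_flight = peak_rps * service_time_s
    capacity_per_replica = concurrency_per_replica * target_utilization
    return math.ceil(in_flight / capacity_per_replica)


# Figures from the question: 100 requests/second, about 50 ms per prediction.
print(replicas_needed(100, 0.05))
# Assumed: 4 concurrent requests per replica, run at 60% utilization for headroom.
print(replicas_needed(100, 0.05, concurrency_per_replica=4, target_utilization=0.6))
```

Under these assumptions a single replica that processes one request at a time cannot keep up with peak load, which is consistent with the queueing delay described in the scenario.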

Question 9 (medium, multiple choice)

A machine learning team has a prototype using a custom TensorFlow model trained on a small dataset stored in Cloud Storage. They want to scale the prototype to production with minimal code changes while ensuring the model can handle increased traffic and new data. The model currently loads data using tf.data.Dataset from CSV files. Which approach best meets these requirements?

Question 10 (easy, multi select)

Which TWO actions are best practices when scaling a prototype ML model to production in Google Cloud?

Question 11 (hard, multiple choice)

A team deployed a prototype classification model to Vertex AI Prediction. After a week, they notice the metrics shown in the exhibit. What is the most likely cause of the performance degradation and latency increase?

Exhibit

Refer to the exhibit.

```
Model accuracy: 0.92
Training data: 10,000 records
Online prediction latency: 95th percentile = 450ms
QPS: 50

After moving to production:
- New data from users: 100,000 records/day
- Data distribution shift detected (new features emerge)
- Prediction latency increases to 95th percentile = 1200ms
- QPS drops to 30
```

Question 12 (medium, drag order)

Drag and drop the steps to create and deploy a custom ML model on Vertex AI using a container in the correct order.

Question 13 (medium, drag order)

Drag and drop the steps to set up model monitoring for drift detection on Vertex AI in the correct order.

Question 14 (medium, matching)

Match each ML acronym to its definition.

Matches

Area Under the ROC Curve

Mean Squared Error

Tensor Processing Unit

Support Vector Machine

Principal Component Analysis

Question 15 (medium, matching)

Match each ML model interpretability method to its description.

Matches

Game-theoretic approach to explain feature contributions

Local surrogate model to explain individual predictions

Ranking features by their impact on model output

Shows marginal effect of a feature on predictions

Measures decrease in performance when feature is shuffled

Question 16 (easy, multiple choice)

A team has a trained TensorFlow model running locally and wants to deploy it for low-latency online predictions on Google Cloud. Which service should they use?

Question 17 (medium, multiple choice)

An ML team is scaling a prototype to production. The data pipeline currently reads from Cloud Storage and transforms data with a custom Python script. They need to handle higher throughput and add monitoring. Which approach should they take?

Question 18 (hard, multiple choice)

A company has a prototype ML model that achieves 85% accuracy on historical data. In production, accuracy drops to 70% after two weeks due to data drift. They need an automated retraining pipeline with minimal manual oversight. Which solution is most cost-effective?

Question 19 (easy, multiple choice)

A team prototypes a recommendation model using a Jupyter notebook on Vertex AI Workbench. They want to productionize the model with CI/CD. Which approach should they use to package the model for deployment?

Question 20 (medium, multiple choice)

A data scientist trains an XGBoost model on Vertex AI with a custom container. The model performs well on a held-out test set but fails to generalize in production. They suspect data leakage between training and validation. What is the best practice to prevent this?
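
One way leakage of this kind can arise is when rows belonging to the same entity land in both the training and validation sets. The sketch below shows one possible safeguard among several (time-based splits are another): a deterministic split keyed on an entity ID. The helper `assign_split`, the `cust-*` IDs, and the 20% validation fraction are hypothetical.

```python
import hashlib


def assign_split(entity_id: str, validation_fraction: float = 0.2) -> str:
    """Deterministically map an entity to 'train' or 'validation'.

    Hashing the ID (rather than sampling rows at random) keeps every row
    for a given entity in the same split across runs and machines.
    """
    digest = hashlib.sha256(entity_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return "validation" if bucket < validation_fraction else "train"


# Hypothetical rows: (entity_id, feature_value). "cust-1" appears twice.
rows = [("cust-1", 10.0), ("cust-2", 12.5), ("cust-1", 11.0), ("cust-3", 9.0)]

splits_seen = {}
for entity_id, _ in rows:
    splits_seen.setdefault(entity_id, set()).add(assign_split(entity_id))

# Every entity ends up in exactly one split.
print(all(len(s) == 1 for s in splits_seen.values()))
```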

Question 21 (hard, multiple choice)

A company deploys a model to Vertex AI Prediction with autoscaling enabled. During a flash sale, traffic spikes 10x, but the endpoint fails to scale fast enough, causing high latency. What is the most likely cause and solution?

Question 22 (easy, multiple choice)

A team just moved a model from prototype to production using Vertex AI. They notice prediction errors for certain inputs that were not present in training data. What should they do to detect such issues automatically?

Question 23 (medium, multiple choice)

An ML engineer needs to run batch predictions on tens of petabytes of data using a trained model. The data is stored in Cloud Storage. Which service should they choose?

Question 24 (hard, multiple choice)

A team uses Vertex AI Pipelines to automate training and deployment. They need to ensure that only models that pass a set of quality checks (e.g., accuracy > 0.9, latency < 100ms) are deployed to production. How should they implement this?
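
The quality checks named in the question (accuracy > 0.9, latency < 100ms) can be sketched as a plain predicate. This is a minimal illustration, not a Vertex AI Pipelines component: `EvalMetrics` and `passes_quality_gate` are hypothetical names, and in a real pipeline the boolean would presumably drive a conditional deployment step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalMetrics:
    """Metrics produced by a (hypothetical) evaluation step."""
    accuracy: float
    latency_ms: float


def passes_quality_gate(metrics: EvalMetrics,
                        min_accuracy: float = 0.9,
                        max_latency_ms: float = 100.0) -> bool:
    """Return True only if every threshold from the question is met."""
    return metrics.accuracy > min_accuracy and metrics.latency_ms < max_latency_ms


candidate = EvalMetrics(accuracy=0.93, latency_ms=80.0)
if passes_quality_gate(candidate):
    print("deploy")   # stand-in for the deployment step
else:
    print("blocked")  # stand-in for stopping the pipeline run
```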

Question 25 (medium, multi select)

Which TWO practices are important when scaling a prototype ML model to production on Google Cloud? (Choose two.)

Question 26 (hard, multi select)

Which TWO services are commonly used together to implement an end-to-end ML pipeline that automatically retrains and deploys models on Vertex AI? (Choose two.)

Question 27 (easy, multi select)

Which THREE factors should be considered when choosing a compute option for serving a deep learning model in production on Google Cloud? (Choose three.)

Question 28 (easy, multiple choice)

A data scientist has trained a scikit-learn model locally and wants to deploy it to Vertex AI for online predictions with low latency. The model is a small RandomForestClassifier (100 MB). What is the recommended way to deploy this model?

Question 29 (medium, multiple choice)

A team deploys a PyTorch model on Vertex AI for online predictions. They notice that after deployment, the latency increases over time, especially during peak hours. The model is served using a custom container. What is the most likely cause?

Question 30 (hard, multiple choice)

A company has a large-scale ML system that uses Vertex AI Pipelines to retrain models weekly. The pipeline includes a custom training job and a batch prediction step. After moving to production, they observe that batch prediction jobs often fail with 'Quota exceeded' errors. The project has sufficient CPU quota. What is the most likely cause?

Question 31 (easy, multiple choice)

An ML engineer needs to monitor a deployed model for data drift. They want to compare the distribution of incoming predictions against a baseline distribution. Which Vertex AI service should they use?
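
Comparing an incoming distribution against a baseline can be illustrated with the Jensen-Shannon divergence, one commonly used distance for this purpose. Whether a given monitoring service uses it for a particular feature type should be checked against that service's documentation. The histograms and the 0.1 alert threshold below are made-up values.

```python
import math


def normalize(counts):
    """Turn raw histogram counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with no overlap.
    """
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical histograms over the same four buckets of one feature.
baseline = normalize([40, 30, 20, 10])
serving = normalize([10, 20, 30, 40])

ALERT_THRESHOLD = 0.1  # assumed value, chosen only for illustration
score = js_divergence(baseline, serving)
print(f"JS divergence = {score:.3f}, drift alert = {score > ALERT_THRESHOLD}")
```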

Question 32 (medium, multiple choice)

A team uses Vertex AI Feature Store to serve features for online predictions. They notice that the online serving latency is high for certain features. The features are stored in a BigQuery source with high cardinality. What is the best practice to reduce latency?

Question 33 (hard, multiple choice)

A large e-commerce company deploys a recommendation model on Vertex AI with autoscaling enabled. During Black Friday, traffic spikes rapidly. The autoscaler adds new instances, but new instances take several minutes to become ready (cold start). As a result, many requests time out. What should they do to mitigate this issue?

Question 34 (easy, multiple choice)

A machine learning engineer is exporting a trained model from Vertex AI Training to the Model Registry. Which artifact should they upload as the model artifact?

Question 35 (medium, multiple choice)

A company has a TensorFlow model that uses custom operations compiled as .so files. They want to deploy it on Vertex AI for online predictions. The model runs correctly when loaded locally. However, on Vertex AI, the prediction fails with a 'Op type not registered' error. What is the most likely reason?

Question 36 (hard, multiple choice)

An organization runs a batch prediction job on Vertex AI for a large dataset (10 TB). The job is configured to use a cluster of 100 n1-standard-16 machines. Midway through, the job fails with 'Out of memory' errors. What is the most effective mitigation strategy?

Question 37 (easy, multi select)

An ML team is deploying a model to Vertex AI for the first time. Which THREE are best practices for scaling from prototype to production?

Question 38 (medium, multi select)

A company has a TensorFlow model that requires GPU for inference. They are deploying on Vertex AI. Which TWO configurations are necessary to ensure GPU is used?

Question 39 (hard, multi select)

A team is troubleshooting a Vertex AI Pipelines run that keeps failing at the model evaluation step. The pipeline includes steps: data preprocessing, training, evaluation, and deployment. Which THREE actions should they take to diagnose the issue?

Question 40 (easy, multiple choice)

An ML engineer runs this command to upload a model. The model artifact in Cloud Storage is a directory containing model.pkl and a custom preprocessing script. What will happen when they later deploy this model to an endpoint and send a prediction request?

Question 41 (medium, multiple choice)

A team has deployed a model with autoscaling configured as shown. They notice that during off-peak hours, the endpoint consistently runs 3 instances instead of scaling down to 1. What is the most likely cause?

Exhibit

Refer to the exhibit.
{
  "name": "projects/my-project/locations/us-central1/endpoints/123456",
  "displayName": "my_endpoint",
  "deployedModels": [
    {
      "id": "123",
      "model": "projects/my-project/locations/us-central1/models/456",
      "displayName": "model_v1",
      "automaticResources": {
        "minReplicaCount": 1,
        "maxReplicaCount": 10
      },
      "dedicatedResources": null,
      "enableAccessLogging": true
    }
  ]
}
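
As an aid to reading the exhibit, the replica bounds can be pulled out of the endpoint description with the standard `json` module. This is only a sketch: the JSON is copied from the exhibit, and `replica_bounds` is a hypothetical helper, not part of any Google Cloud SDK.

```python
import json

# Endpoint description copied from the exhibit.
endpoint_json = """
{
  "name": "projects/my-project/locations/us-central1/endpoints/123456",
  "displayName": "my_endpoint",
  "deployedModels": [
    {
      "id": "123",
      "model": "projects/my-project/locations/us-central1/models/456",
      "displayName": "model_v1",
      "automaticResources": {
        "minReplicaCount": 1,
        "maxReplicaCount": 10
      },
      "dedicatedResources": null,
      "enableAccessLogging": true
    }
  ]
}
"""


def replica_bounds(endpoint: dict) -> dict:
    """Map each deployed model's display name to (min, max) replica counts."""
    bounds = {}
    for deployed in endpoint["deployedModels"]:
        # Use whichever resources block is populated; the other one is null.
        resources = (deployed.get("dedicatedResources")
                     or deployed.get("automaticResources")
                     or {})
        bounds[deployed["displayName"]] = (
            resources.get("minReplicaCount"),
            resources.get("maxReplicaCount"),
        )
    return bounds


print(replica_bounds(json.loads(endpoint_json)))
```

The configured minimum here is 1, so the three instances observed off-peak are not explained by the minimum replica setting alone.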

Question 42 (hard, multiple choice)

A team trains a distributed TensorFlow model using the config above. After training, they deploy the model for online predictions. The model returns poor quality predictions. They suspect that the model was not trained correctly due to a configuration error. What is the most likely mistake?

Question 43 (easy, multiple choice)

A team has developed a prototype of a recommendation model using a small dataset on a single VM. They need to scale to a larger dataset for production training. They plan to use Vertex AI training with a custom container. What is the best practice for handling the increased data volume?

Question 44 (easy, multiple choice)

An ML team is moving from a prototype Jupyter notebook to a production training pipeline. They want to ensure reproducibility. Which approach should they take?

Question 45 (medium, multiple choice)

A data scientist trained a model on a single GPU but needs to train on multiple GPUs for a larger dataset. They observe that training time does not decrease linearly with additional GPUs. Which common issue is most likely?
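
Why speedup falls short of linear can be illustrated with a toy cost model in which each training step pays a fixed synchronization cost once more than one GPU is involved. The model and all numbers below are assumptions for illustration; real overheads depend on the interconnect, the model size, and the input pipeline.

```python
def step_time(compute_s: float, n_gpus: int, sync_overhead_s: float) -> float:
    """Toy model: compute is split evenly, sync cost is paid per step."""
    if n_gpus == 1:
        return compute_s  # no gradient synchronization on a single device
    return compute_s / n_gpus + sync_overhead_s


def speedup(compute_s: float, n_gpus: int, sync_overhead_s: float) -> float:
    """Speedup relative to the single-GPU step time."""
    return compute_s / step_time(compute_s, n_gpus, sync_overhead_s)


# Assumed: 1.0 s of compute per step, 50 ms of synchronization per step.
for n in (1, 2, 4, 8):
    s = speedup(1.0, n, 0.05)
    print(f"{n} GPUs: speedup {s:.2f}x, efficiency {s / n:.0%}")
```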

Question 46 (medium, multiple choice)

A company uses Vertex AI for training. They have a large dataset stored in Cloud Storage and need to train a custom model using TensorFlow. The training job is failing with an out-of-memory error. What is the best first step?

Question 47 (hard, multiple choice)

A team is scaling their prototype inference model to handle high-throughput requests with low latency. They use a custom container on Vertex AI Prediction. They notice that latency spikes occur under heavy load. What is the most effective strategy?

Question 48 (hard, multiple choice)

A machine learning engineer is training a large-scale text classification model using a distributed strategy on TPUs. The training loss decreases normally but the validation loss starts increasing after a few epochs while training loss continues to decrease. The engineer suspects overfitting. Which technique is most appropriate to address this while scaling training?

Question 49 (easy, multi select)

An ML team is converting a prototype model to a production pipeline using Vertex AI. They want to ensure model versioning and lineage. Which two practices should they adopt? (Select TWO)

Question 50 (medium, multi select)

A data scientist needs to scale a prototype deep learning model to train on a massive dataset using multiple GPUs. Which three strategies are essential for efficient distributed training? (Select THREE)

Question 51 (easy, multiple choice)

A company has developed a prototype fraud detection model using a small sample of transactions. The prototype runs on a single VM and uses a Random Forest classifier. They want to scale to the full dataset of 50 million transactions. The data is stored in BigQuery. The team wants to use Vertex AI for training. After moving the code to a custom training container and using Vertex AI Training with a single n1-standard-4 machine, the training job fails with an error: "Process terminated with exit code 1". The logs show: "java.lang.OutOfMemoryError: Java heap space". The model uses a scikit-learn RandomForest. Which course of action is most appropriate?

Question 52 (medium, multiple choice)

A team has a prototype image classification model trained on a small dataset using TensorFlow Keras on a single GPU. They need to train on a larger dataset (1 million images) using a distributed strategy on Vertex AI with 8 GPUs. They implement a MirroredStrategy for data parallelism. During the first few epochs, the training speed does not improve significantly compared to a single GPU, and GPU utilization is low. The data is stored as JPEG files in Cloud Storage, and the input pipeline uses tf.data with map to decode images. What is the most likely cause?

Question 53 (hard, multiple choice)

A machine learning engineer is scaling a prototype natural language processing model that uses a transformer encoder. The prototype was trained on a small corpus on a single GPU. For production, they need to train on a much larger corpus using TPUs on Vertex AI. They convert the TensorFlow code to work with TPUStrategy. The training starts but after a few steps, the loss becomes NaN and training diverges. The learning rate scheduler uses a warm-up and then linear decay. The initial learning rate is 1e-4. The batch size per TPU core is 32, with 8 cores total (batch size 256). What is the most likely cause?
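
The batch-size arithmetic and the shape of a warm-up plus linear-decay schedule can be sketched in plain Python. The per-core batch size (32), core count (8), and learning rate (1e-4) come from the question; treating 1e-4 as the peak rate, and the `warmup_steps` and `total_steps` values, are assumptions made only for illustration.

```python
def global_batch_size(per_core_batch: int, num_cores: int) -> int:
    """Global batch size when each core processes its own shard of a batch."""
    return per_core_batch * num_cores


def learning_rate(step: int, peak_lr: float = 1e-4,
                  warmup_steps: int = 1000, total_steps: int = 10000) -> float:
    """Linear warm-up to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


print(global_batch_size(32, 8))
for step in (0, 500, 1000, 5500, 10000):
    print(step, learning_rate(step))
```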

Question 54 (hard, multiple choice)

A team has successfully trained a deep learning model on Vertex AI using a custom container and distributed training with TensorFlow. They want to serve this model for online predictions with low latency. They deploy the model to Vertex AI Endpoint with a single n1-standard-4 machine. During load testing, they observe that the median latency is 200ms, but the 99th percentile latency spikes to 2 seconds. The model is a complex neural network that takes variable-length text as input. Which approach will best reduce tail latency while maintaining throughput?
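
The gap between median and tail latency described here can be made concrete with a percentile calculation over load-test samples. The samples below are synthetic, constructed so that a small share of slow requests reproduces the 200 ms median and 2 second 99th percentile from the question; `latency_percentile` is a hypothetical helper built on the standard `statistics` module.

```python
import statistics


def latency_percentile(samples_ms, pct: int) -> float:
    """Return the pct-th percentile (1..99) of a list of latency samples."""
    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[pct - 1]


# Synthetic load test: 95 typical requests and 5 slow ones, for example
# unusually long text inputs.
samples = [200.0] * 95 + [2000.0] * 5

print("p50:", latency_percentile(samples, 50))
print("p99:", latency_percentile(samples, 99))
```

The median is unaffected by the slow requests, which is why tail percentiles are worth tracking separately when inputs vary in length.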

Question 55 (medium, multi select)

A data science team has trained a custom model using Vertex AI and wants to deploy it for online predictions with low latency. Which TWO actions should they take to optimize performance?

Question 56 (hard, multiple choice)

Refer to the exhibit. A Machine Learning Engineer attempts to deploy a model to a Vertex AI Endpoint for online predictions but receives an error. What is the most likely cause of this error?

Exhibit

gcloud ai endpoints deploy-model \
    --endpoint=projects/my-project/locations/us-central1/endpoints/456 \
    --model=projects/my-project/locations/us-central1/models/789 \
    --display-name=my-deployment \
    --machine-type=n1-standard-4 \
    --min-replica-count=0 \
    --max-replica-count=10

Error: (gcloud.beta.ai.endpoints.deploy-model) INVALID_ARGUMENT: min_replica_count must be at least 1 for online prediction.
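
The constraint stated in the error message can be checked locally before a deployment is attempted. The sketch below is a hypothetical pre-flight helper, not part of gcloud or any Google Cloud SDK; its first rule mirrors the message in the exhibit, and the second rule (max not below min) is an added assumption.

```python
from typing import List


def validate_replica_counts(min_replica_count: int,
                            max_replica_count: int) -> List[str]:
    """Return a list of problems with the requested replica bounds."""
    problems = []
    if min_replica_count < 1:
        # Mirrors the INVALID_ARGUMENT message shown in the exhibit.
        problems.append(
            "min_replica_count must be at least 1 for online prediction")
    if max_replica_count < min_replica_count:
        # Assumed sanity rule, not taken from the exhibit.
        problems.append("max_replica_count must be >= min_replica_count")
    return problems


# Values from the exhibit: --min-replica-count=0 --max-replica-count=10
print(validate_replica_counts(0, 10))
print(validate_replica_counts(1, 10))
```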

Question 57 (easy, multiple choice)

You are a Machine Learning Engineer at a financial services company. You have trained a large language model (LLM) using a custom container on Vertex AI Training. The model is used for sentiment analysis on financial news articles. You have deployed the model to a Vertex AI Endpoint for online prediction. However, during peak trading hours, users report high latency (> 5 seconds) and occasional timeout errors. The model is deployed on n1-highmem-8 machines with 1 replica. You monitor the endpoint and see that CPU utilization is high (> 90%) and memory is near capacity. The queries are relatively small text inputs. Which course of action should you take to reduce latency?
