How many Scaling prototypes into ML models questions are on the PMLE exam?

The Scaling prototypes into ML models domain is one of the weighted domains on the PMLE exam. The Courseiva question bank has 57 practice questions for this domain.

Free PMLE Scaling prototypes into ML models Practice Questions (2026)

Q: How can I practice Scaling prototypes into ML models questions for PMLE?

Click any of the 57 questions listed on this page to see the full question and explanation, or use the session launcher to start a focused practice session of 10, 20, 30 or 50 questions drawn only from the Scaling prototypes into ML models domain.

All PMLE Scaling prototypes into ML models questions (57)

A startup has developed a prototype ML model using scikit-learn on a single machine. They now need to scale it to handle larger datasets and deploy it for real-time predictions. The team is small and wants minimal operational overhead. Which Google Cloud service should they use?

A data science team has trained a TensorFlow model on-premises using a large dataset. When they try to deploy the model to Vertex AI for online predictions, the deployed model fails to start with a ‘MemoryError’. The model artifact is 2 GB, and the machine type is n1-standard-4 (15 GB RAM). What is the most likely cause?

A company has a prototype ML model that works well on historical data, but when deployed to production, the model performance degrades over time. The data distribution shifts gradually. Which strategy should they implement to maintain model accuracy?

An ML engineer is scaling a prototype to production using Vertex AI Pipelines. The pipeline includes data validation, preprocessing, training, and deployment steps. They want to ensure that the pipeline can be reproduced and audited. What is the best practice?

A team has trained a sentiment analysis model using PyTorch on Vertex AI Training. They now want to deploy it for online predictions with low latency. Which TWO actions should they take? (Choose 2)

A company has a prototype ML model that predicts equipment failure. They want to deploy it to production using Vertex AI. The model must be retrained weekly with new data. They also need to monitor for data drift and model performance. Which THREE components should they include in their MLOps pipeline? (Choose 3)

An ML engineer is trying to upload a TensorFlow model to Vertex AI using the gcloud command shown. The model was trained using TensorFlow 2.11 and saved with model.save('model/'). The engineer sees the error. What is the most likely cause?

You are an ML engineer at a fintech company. You have a prototype credit risk model built using XGBoost that achieves high accuracy on historical data. The model is trained on a dataset with 500,000 rows and 50 features. The company wants to deploy this model to production to score loan applications in real-time. The production environment must handle a peak load of 100 requests per second with a latency under 200ms. You have decided to use Vertex AI for deployment. After deploying the model as a Vertex AI endpoint with a single n1-standard-4 machine, you notice that latency exceeds 500ms at peak load and some requests time out. You have verified that the model prediction itself (excluding network overhead) takes about 50ms on average. What should you do to meet the latency and throughput requirements?

A machine learning team has a prototype using a custom TensorFlow model trained on a small dataset stored in Cloud Storage. They want to scale the prototype to production with minimal code changes while ensuring the model can handle increased traffic and new data. The model currently loads data using tf.data.Dataset from CSV files. Which approach best meets these requirements?

Which TWO actions are best practices when scaling a prototype ML model to production in Google Cloud?

A team deployed a prototype classification model to Vertex AI Prediction. After a week, they notice the metrics shown in the exhibit. What is the most likely cause of the performance degradation and latency increase?

Drag and drop the steps to create and deploy a custom ML model on Vertex AI using a container in the correct order.

Drag and drop the steps to set up model monitoring for drift detection on Vertex AI in the correct order.

Match each ML acronym to its definition.

Match each ML model interpretability method to its description.

A team has a trained TensorFlow model running locally and wants to deploy it for low-latency online predictions on Google Cloud. Which service should they use?

An ML team is scaling a prototype to production. The data pipeline currently reads from Cloud Storage and transforms data with a custom Python script. They need to handle higher throughput and add monitoring. Which approach should they take?

A company has a prototype ML model that achieves 85% accuracy on historical data. In production, accuracy drops to 70% after two weeks due to data drift. They need an automated retraining pipeline with minimal manual oversight. Which solution is most cost-effective?

A team prototypes a recommendation model using a Jupyter notebook on Vertex AI Workbench. They want to productionize the model with CI/CD. Which approach should they use to package the model for deployment?

A data scientist trains an XGBoost model on Vertex AI with a custom container. The model performs well on a held-out test set but fails to generalize in production. They suspect data leakage between training and validation. What is the best practice to prevent this?

A company deploys a model to Vertex AI Prediction with autoscaling enabled. During a flash sale, traffic spikes 10x, but the endpoint fails to scale fast enough, causing high latency. What is the most likely cause and solution?

A team just moved a model from prototype to production using Vertex AI. They notice prediction errors for certain inputs that were not present in training data. What should they do to detect such issues automatically?

An ML engineer needs to run batch predictions on tens of petabytes of data using a trained model. The data is stored in Cloud Storage. Which service should they choose?

A team uses Vertex AI Pipelines to automate training and deployment. They need to ensure that only models that pass a set of quality checks (e.g., accuracy > 0.9, latency < 100ms) are deployed to production. How should they implement this?

Which TWO practices are important when scaling a prototype ML model to production on Google Cloud? (Choose two.)

Which TWO services are commonly used together to implement an end-to-end ML pipeline that automatically retrains and deploys models on Vertex AI? (Choose two.)

Which THREE factors should be considered when choosing a compute option for serving a deep learning model in production on Google Cloud? (Choose three.)

A data scientist has trained a scikit-learn model locally and wants to deploy it to Vertex AI for online predictions with low latency. The model is a small RandomForestClassifier (100 MB). What is the recommended way to deploy this model?

A team deploys a PyTorch model on Vertex AI for online predictions. They notice that after deployment, the latency increases over time, especially during peak hours. The model is served using a custom container. What is the most likely cause?

A company has a large-scale ML system that uses Vertex AI Pipelines to retrain models weekly. The pipeline includes a custom training job and a batch prediction step. After moving to production, they observe that batch prediction jobs often fail with 'Quota exceeded' errors. The project has sufficient CPU quota. What is the most likely cause?

An ML engineer needs to monitor a deployed model for data drift. They want to compare the distribution of incoming predictions against a baseline distribution. Which Vertex AI service should they use?

A team uses Vertex AI Feature Store to serve features for online predictions. They notice that the online serving latency is high for certain features. The features are stored in a BigQuery source with high cardinality. What is the best practice to reduce latency?

A large e-commerce company deploys a recommendation model on Vertex AI with autoscaling enabled. During Black Friday, traffic spikes rapidly. The autoscaler adds new instances, but new instances take several minutes to become ready (cold start). As a result, many requests time out. What should they do to mitigate this issue?

A machine learning engineer is exporting a trained model from Vertex AI Training to the Model Registry. Which artifact should they upload as the model artifact?

A company has a TensorFlow model that uses custom operations compiled as .so files. They want to deploy it on Vertex AI for online predictions. The model runs correctly when loaded locally. However, on Vertex AI, the prediction fails with an 'Op type not registered' error. What is the most likely reason?

An organization runs a batch prediction job on Vertex AI for a large dataset (10 TB). The job is configured to use a cluster of 100 n1-standard-16 machines. Midway through, the job fails with 'Out of memory' errors. What is the most effective mitigation strategy?

An ML team is deploying a model to Vertex AI for the first time. Which THREE are best practices for scaling from prototype to production?

A company has a TensorFlow model that requires a GPU for inference. They are deploying it on Vertex AI. Which TWO configurations are necessary to ensure the GPU is used?

A team is troubleshooting a Vertex AI Pipelines run that keeps failing at the model evaluation step. The pipeline includes steps: data preprocessing, training, evaluation, and deployment. Which THREE actions should they take to diagnose the issue?

An ML engineer runs this command to upload a model. The model artifact in Cloud Storage is a directory containing model.pkl and a custom preprocessing script. What will happen when he later deploys this model to an endpoint and sends a prediction request?

A team has deployed a model with autoscaling configured as shown. They notice that during off-peak hours, the endpoint consistently runs 3 instances instead of scaling down to 1. What is the most likely cause?

A team trains a distributed TensorFlow model using the config above. After training, they deploy the model for online predictions. The model returns poor quality predictions. They suspect that the model was not trained correctly due to a configuration error. What is the most likely mistake?

A team has developed a prototype of a recommendation model using a small dataset on a single VM. They need to scale to a larger dataset for production training. They plan to use Vertex AI training with a custom container. What is the best practice for handling the increased data volume?

An ML team is moving from a prototype Jupyter notebook to a production training pipeline. They want to ensure reproducibility. Which approach should they take?

A data scientist trained a model on a single GPU but needs to train on multiple GPUs for a larger dataset. They observe that training time does not decrease linearly with additional GPUs. Which common issue is most likely?

A company uses Vertex AI for training. They have a large dataset stored in Cloud Storage and need to train a custom model using TensorFlow. The training job is failing with an out-of-memory error. What is the best first step?

A team is scaling their prototype inference model to handle high-throughput requests with low latency. They use a custom container on Vertex AI Prediction. They notice that latency spikes occur under heavy load. What is the most effective strategy?

A machine learning engineer is training a large-scale text classification model using a distributed strategy on TPUs. The training loss decreases normally, but after a few epochs the validation loss starts increasing while the training loss continues to decrease. The engineer suspects overfitting. Which technique is most appropriate to address this while scaling training?

An ML team is converting a prototype model to a production pipeline using Vertex AI. They want to ensure model versioning and lineage. Which two practices should they adopt? (Select TWO)

A data scientist needs to scale a prototype deep learning model to train on a massive dataset using multiple GPUs. Which three strategies are essential for efficient distributed training? (Select THREE)

A company has developed a prototype fraud detection model using a small sample of transactions. The prototype runs on a single VM and uses a Random Forest classifier. They want to scale to the full dataset of 50 million transactions. The data is stored in BigQuery. The team wants to use Vertex AI for training. After moving the code to a custom training container and using Vertex AI Training with a single n1-standard-4 machine, the training job fails with an error: "Process terminated with exit code 1". The logs show: "java.lang.OutOfMemoryError: Java heap space". The model uses a scikit-learn RandomForest. Which course of action is most appropriate?

A team has a prototype image classification model trained on a small dataset using TensorFlow Keras on a single GPU. They need to train on a larger dataset (1 million images) using a distributed strategy on Vertex AI with 8 GPUs. They implement a MirroredStrategy for data parallelism. During the first few epochs, the training speed does not improve significantly compared to a single GPU, and GPU utilization is low. The data is stored as JPEG files in Cloud Storage, and the input pipeline uses tf.data with map to decode images. What is the most likely cause?

A machine learning engineer is scaling a prototype natural language processing model that uses a transformer encoder. The prototype was trained on a small corpus on a single GPU. For production, they need to train on a much larger corpus using TPUs on Vertex AI. They convert the TensorFlow code to work with TPUStrategy. The training starts but after a few steps, the loss becomes NaN and training diverges. The learning rate scheduler uses a warm-up and then linear decay. The initial learning rate is 1e-4. The batch size per TPU core is 32, with 8 cores total (batch size 256). What is the most likely cause?

A team has successfully trained a deep learning model on Vertex AI using a custom container and distributed training with TensorFlow. They want to serve this model for online predictions with low latency. They deploy the model to Vertex AI Endpoint with a single n1-standard-4 machine. During load testing, they observe that the median latency is 200ms, but the 99th percentile latency spikes to 2 seconds. The model is a complex neural network that takes variable-length text as input. Which approach will best reduce tail latency while maintaining throughput?

A data science team has trained a custom model using Vertex AI and wants to deploy it for online predictions with low latency. Which TWO actions should they take to optimize performance?

Refer to the exhibit. A Machine Learning Engineer attempts to deploy a model to a Vertex AI Endpoint for online predictions but receives an error. What is the most likely cause of this error?

You are a Machine Learning Engineer at a financial services company. You have trained a large language model (LLM) using a custom container on Vertex AI Training. The model is used for sentiment analysis on financial news articles. You have deployed the model to a Vertex AI Endpoint for online prediction. However, during peak trading hours, users report high latency (> 5 seconds) and occasional timeout errors. The model is deployed on n1-highmem-8 machines with 1 replica. You monitor the endpoint and see that CPU utilization is high (> 90%) and memory is near capacity. The queries are relatively small text inputs. Which course of action should you take to reduce latency?

Other PMLE exam domains

Automating and orchestrating ML pipelines
Collaborating within and across teams to manage data and models
Architecting low-code ML solutions
Collaborating to manage data and models
Serving and scaling models
Monitoring ML solutions
Solving business challenges with ML

Frequently asked questions

What does the Scaling prototypes into ML models domain cover on the PMLE exam?

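The Scaling prototypes into ML models domain corresponds to this area of the PMLE exam blueprint published by Google Cloud. Judging from the questions on this page, it covers moving a prototype to production on Vertex AI. That includes distributed training on GPUs and TPUs, choosing machine types and accelerators, building reproducible pipelines, online and batch serving with autoscaling, and monitoring deployed models for drift. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all PMLE domains, with no account required.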

How many Scaling prototypes into ML models questions are in the PMLE question bank?

The Courseiva PMLE question bank contains 57 questions in the Scaling prototypes into ML models domain. Click any question to see the full explanation and answer breakdown.

What is the best way to practice Scaling prototypes into ML models for PMLE?

Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.

Can I practice only Scaling prototypes into ML models questions for PMLE?

Yes — the session launcher on this page draws questions exclusively from the Scaling prototypes into ML models domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.
