How many Scaling Prototypes into ML Models questions are on the PMLE exam?

The Scaling Prototypes into ML Models domain is one of the weighted domains on the PMLE exam. The Courseiva question bank has 99 practice questions for this domain.

Free PMLE Scaling Prototypes into ML Models Practice Questions (2026)

Q: What does the Scaling Prototypes into ML Models domain cover on the PMLE exam?

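The Scaling Prototypes into ML Models domain covers building and training models at scale, as described in the PMLE exam blueprint published by Google Cloud. The practice questions on this page span distributed training on GPUs and TPUs, data and model parallelism, hyperparameter tuning with Vertex AI Vizier, feature pipelines with Dataflow and tf.Transform, transfer learning and fine-tuning, cost control with spot VMs, and model compression for edge deployment.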

Q: How can I practice Scaling Prototypes into ML Models questions for PMLE?

Click any of the 99 questions listed on this page to see the full question and explanation, or use the session launcher to start a focused practice session of 10, 20, 30 or 50 questions drawn only from the Scaling Prototypes into ML Models domain.


All PMLE Scaling Prototypes into ML Models questions (99)


Click any question to see the full explanation and answer options, or start a focused practice session above.

You have a TensorFlow training script that runs on a single machine. To speed up training on Vertex AI with 8 GPUs on a single machine, which strategy should you use?

A data science team is building a feature engineering pipeline that processes large-scale data from BigQuery daily. They need to compute aggregate features and store the results in Vertex AI Feature Store for both online serving and offline training. Which Google Cloud service is best suited for this batch computation?

You are fine-tuning a large language model (LLM) from Hugging Face Transformers using Vertex AI Training. The model has 7 billion parameters and does not fit into the memory of a single GPU. You need to train across multiple GPUs, splitting the model layers across devices. Which distributed training approach should you use?

A company is using Vertex AI Vizier for hyperparameter tuning of a model with 5 integer hyperparameters, each with a range of 10-100. They have a budget of 50 trials and want to maximize the chance of finding the best configuration. Which Vizier algorithm should they use?

You want to use a pre-trained model from TensorFlow Hub for image classification, but you need to adapt it to classify your own custom categories with a small dataset. Which Vertex AI approach is most appropriate?

Your Vertex AI custom training job is failing with an out-of-memory error on a single GPU. You need to reduce memory usage without changing the model architecture. Which approach should you try first?

You are deploying a deep learning model on edge devices with limited computational resources. The model must run inference in <10 ms and the model size must be under 50 MB. Currently, your trained model is 200 MB and runs in 50 ms. Which combination of model compression techniques should you apply?

You are running a Vertex AI custom training job with a pre-built TensorFlow container. You want to use TPU v3 pods for faster training. Which configuration is required?

You need to perform a large-scale feature computation on streaming data from Pub/Sub, transforming raw events into features, and writing results to Vertex AI Feature Store for online serving. Which Google Cloud architecture is most appropriate?

You want to use Vertex AI Model Garden to quickly deploy a pre-built foundation model for text summarization. Which action is required?

Your PyTorch training script uses DistributedDataParallel (DDP) across 4 nodes, each with 4 GPUs (16 GPUs total). You submit a Vertex AI custom training job. How should you configure the worker pool spec?

You are fine-tuning a pre-trained BERT model from Hugging Face on a custom text classification dataset using Vertex AI Training. You want to speed up training by using mixed precision. What should you do?

You are designing a distributed training job for a very large neural network that does not fit on a single machine. You need to split the model across multiple devices. Which TWO techniques can you use?

You are fine-tuning a large language model using Vertex AI Training with spot VMs to reduce cost. Your training job keeps getting preempted, causing delays. Which THREE strategies can help mitigate the impact of preemption?

You are building a machine learning pipeline on Google Cloud. You need to perform feature engineering on large datasets stored in BigQuery and store the resulting features in Vertex AI Feature Store for both online and offline use. Which TWO Google Cloud services should you use?

A data scientist wants to train a TensorFlow model on Vertex AI using a pre-built container. Which of the following pre-built containers is NOT available for custom training in Vertex AI?

You need to run a distributed training job on Vertex AI using TensorFlow with MirroredStrategy on a single machine with 4 GPUs. Which training configuration should you use?

You have a very large language model that does not fit on a single GPU. You need to train it efficiently across multiple GPUs on a single machine. Which approach should you use?

You want to reduce training costs by using preemptible VMs on Vertex AI for a fault-tolerant distributed training job that uses checkpointing. Which machine type should you choose in the worker pool configuration?

You are performing hyperparameter tuning on Vertex AI with Vizier. You want to maximize the accuracy of your model, and you have a budget of 50 trials. Which algorithm should you choose to best explore the search space?

You need to preprocess a large dataset (terabytes) for training a TensorFlow model. The preprocessing includes scaling and bucketizing features, and the same transformations must be applied during serving. Which tool should you use?

You are fine-tuning a pre-trained BERT model from Hugging Face for a sentiment analysis task using Vertex AI training. The dataset has 100k examples. To avoid catastrophic forgetting, which layer freezing strategy should you apply?

Which Vertex AI service allows you to discover, fine-tune, and deploy foundation models with a few clicks, including models like Llama and Gemma?

You want to deploy a trained scikit-learn model to Vertex AI for online predictions. The model file is 2 GB. Which option should you use?

You are designing a distributed training job on Vertex AI for a PyTorch model using DistributedDataParallel (DDP). You have 4 nodes, each with 4 GPUs. What is the total number of workers that should be configured in PyTorch's equivalent of TF_CONFIG?

You have an edge device with limited compute resources. You need to deploy a deep learning model for real-time inference. Which model compression technique should you apply to reduce the model size and latency with minimal accuracy loss?

You want to use Vertex AI Vizier for hyperparameter tuning. You have 2 categorical parameters and 3 continuous parameters. Which algorithm is best suited for this mixed parameter space?

You are using tf.Transform to preprocess data at scale. Which TWO services are required to run tf.Transform on Google Cloud? (Choose 2)

You need to reduce the cost of training a large model on Vertex AI while maintaining fault tolerance. Which THREE actions should you take? (Choose 3)

You are fine-tuning a Gemma model using Vertex AI Model Garden. You want to combine the fine-tuned model with a custom output layer for a unique task. Which TWO components are required to deploy the combined model? (Choose 2)

A data scientist needs to train a large PyTorch model on a custom dataset using Vertex AI. The training script expects data from Cloud Storage and uses GPU acceleration. Which option correctly configures a custom training job with a pre-built container for PyTorch and attaches a single NVIDIA V100 GPU?

A machine learning engineer is preparing to train a Transformer-based model using TensorFlow on a single TPU v3-8 pod slice. The training script uses tf.distribute.TPUStrategy. Which environment variable must be set in Vertex AI to enable TPU training with the appropriate topology?

A team wants to perform hyperparameter tuning on a Vertex AI custom training job with 100 trials. They require an algorithm that efficiently explores the search space by learning from previous trials. Which algorithm should they select in the study configuration?

A data engineering team needs to compute rolling window features (7-day average, 30-day sum) from a high-volume stream of e-commerce events stored in BigQuery. They must output the features to Vertex AI Feature Store for online serving. Which approach is MOST cost-effective and scalable?

A machine learning engineer wants to use Vertex AI Vizier to tune three hyperparameters: learning rate (log scale), number of layers (integer), and optimizer (categorical). They have 50 parallel trials available. Which parameter specification types should they define?

A team is fine-tuning a large language model (LLaMA 2) using Vertex AI with a custom container on a multi-node GPU cluster. They need to implement model parallelism to fit the model across multiple GPUs because it does not fit into a single GPU memory. Which distributed training strategy should they use?

An engineer is using TensorFlow Transform (tf.Transform) to preprocess training data. They want to ensure that the same preprocessing logic is applied during inference without code duplication. Which approach should they take?

A company wants to bring their own Docker container to Vertex AI for training a model with a custom framework. They need to ensure the container is compatible with the Vertex AI training service. What is the minimum requirement for the container?

A machine learning team is deploying a PyTorch model on Vertex AI Prediction for real-time inference. The model was trained with preprocessing that includes tokenization and normalization. They want to embed the preprocessing logic in the model to reduce prediction latency and avoid additional service calls. Which approach should they take?

A data scientist wants to quickly experiment with a pre-trained Vision Transformer model from Hugging Face and fine-tune it on a custom dataset using Vertex AI. They want to use a managed environment with minimal setup. Which Vertex AI service should they use?

A team is training a large TensorFlow model that requires more memory than a single GPU provides. They have access to multiple GPUs on a single machine. Which distributed training strategy should they use to split the model layers across GPUs?

A company is deploying a deep learning model on edge devices with limited storage and computational resources. They need to reduce the model size by 80% while maintaining acceptable accuracy. Which two techniques should they combine?

A company wants to train a custom machine learning model on Vertex AI using a pre-built container for scikit-learn. They want to use spot VMs to reduce costs. However, the training job fails intermittently due to preemption. Which TWO actions should they take to ensure the training job completes successfully?

A data scientist is training a very large neural network using Vertex AI with multiple GPUs across multiple nodes. The model does not fit on a single GPU, so they need to use both data parallelism and model parallelism (pipeline parallelism). Which THREE components or configurations are required to set up distributed training with Vertex AI?

A company is fine-tuning a large language model (Gemma 7B) using Vertex AI Model Garden. They want to reduce the model's memory footprint for deployment on edge devices. Which THREE model compression techniques should they consider?

A data scientist has a TensorFlow 2.x model trained on a single GPU. They want to scale training to multiple GPUs on a single Vertex AI machine without code changes. Which strategy should they use?

You are fine-tuning a BERT model from Hugging Face Transformers on Vertex AI. You want to minimise cost for a short experiment. Which compute configuration should you use?

An ML engineer is using Vertex AI Vizier to tune hyperparameters for a PyTorch model. They want to maximise the chance of finding the global optimum within a fixed trial budget of 50 trials. Which algorithm should they select?

Your team is deploying a large language model (LLM) on Vertex AI for online prediction. The model exceeds the maximum request size for Vertex AI Prediction. Which approach should you take to serve this model?

You are using TensorFlow Transform (tf.Transform) to preprocess data for a model that will be deployed on Vertex AI. What is the primary benefit of using tf.Transform over Dataflow alone?

An ML engineer wants to use Vertex AI Model Garden to deploy a pre-trained foundation model for text summarisation. What is the quickest way to achieve this?

Your team is training a very large transformer model that does not fit on a single GPU. They are using Vertex AI custom training with PyTorch. Which distributed training approach should they use?

You are performing post-training quantisation of a trained TensorFlow model to INT8 for deployment on edge devices. Which technique should you use to minimise accuracy loss?

An ML team is building a feature pipeline with Dataflow that reads from BigQuery, computes features, and writes to Vertex AI Feature Store. They need to ensure that features are available for both training and serving with low latency. Which Feature Store option should they use?

You need to run a custom training job on Vertex AI using a pre-built container for scikit-learn. Which container image should you specify?

You are fine-tuning a pre-trained model using transfer learning. The new dataset is small and very similar to the original training data. To avoid overfitting, which layer freezing strategy should you adopt?

You are using Vertex AI hyperparameter tuning with a custom container. The training job reports the objective metric but Vizier is not converging. Which configuration change could improve convergence?

You are designing a distributed training job for a PyTorch model on Vertex AI using multiple machines with GPUs. Which TWO configurations are required to enable data parallelism with PyTorch DDP? (Choose 2.)

Your team is deploying a large model on edge devices and needs to reduce its size by 80% while maintaining reasonable accuracy. Which THREE techniques should they consider? (Choose 3.)

You are using Vertex AI to train a model with a custom container. You need to pass command-line arguments for hyperparameters. Which TWO methods can you use? (Choose 2.)

A data scientist wants to train a PyTorch model on Vertex AI using a pre-built container for GPU training. She needs to use 4 NVIDIA A100 GPUs on a single machine. Which machine configuration should she select?

An ML engineer is using Vertex AI Vizier to tune hyperparameters for a custom training job. The training job takes 2 hours per trial. To speed up the process, the engineer wants to run 10 trials in parallel. What is the correct way to configure parallel trial execution?

A team is building a feature pipeline for an ML model. They need to compute aggregate features over a sliding time window from streaming data. Which Google Cloud service is most appropriate for this task?

An ML team is fine-tuning a large language model using a custom container on Vertex AI. They want to reduce costs by using preemptible (spot) VMs for training. The training job is long-running and uses checkpointing. Which statement is correct regarding spot VM usage?

A machine learning engineer is training a TensorFlow model on Vertex AI using distributed training with the MultiWorkerMirroredStrategy. The training job uses 4 workers with 4 GPUs each. The engineer notices that the training is not scaling linearly. What is the most likely cause?

An organization wants to deploy a pre-trained BERT model for sentiment analysis on Vertex AI. They want to fine-tune it on their domain-specific data. Which feature in Vertex AI allows them to find and fine-tune a suitable foundation model with minimal effort?

A machine learning engineer is deploying a TensorFlow model on an edge device with limited memory and compute. The model needs to perform inference with low latency. The engineer has a trained float32 model. Which model compression technique should be applied first to reduce the model size and improve inference speed without significant accuracy loss?

A data scientist wants to use tf.Transform for preprocessing a large dataset stored in BigQuery before training a TensorFlow model. The preprocessing should be consistent during training and serving. What is the correct way to use tf.Transform in this scenario?

A team is training a large image classification model using transfer learning from a pre-trained ResNet50. The model will be deployed on mobile devices. They want to fine-tune only the last few layers while keeping the earlier layers frozen. Which approach should they use?

An engineer is training a model on Vertex AI using a custom container. The training job fails with an error indicating that the container exited with a non-zero status. The engineer wants to debug the issue. What is the best way to access the logs?

A research team is training a very large Transformer model that does not fit into the memory of a single GPU. They have access to multiple GPUs on a single machine and want to split the model layers across GPUs. Which distributed training strategy should they use?

An ML team wants to use Vertex AI Hyperparameter Tuning to tune a custom training job. They have a budget of 50 trials and want to use an algorithm that balances exploration and exploitation. Which algorithm should they choose?

A company is deploying a TensorFlow model on Vertex AI Prediction. The model is memory-intensive and requires GPU acceleration. The team wants to minimize latency and cost. Which TWO configurations should they select? (Select 2)

An ML engineer is using Vertex AI for distributed training of a PyTorch model across multiple nodes. The training job must use TPUs for high throughput. The engineer sets up the job configuration. Which THREE components are required for the training to work correctly? (Select 3)

A machine learning team is building a feature engineering pipeline using Dataflow. They need to compute features from streaming data and store them in Vertex AI Feature Store for online serving. The features must be updated within 5 seconds of the event. Which TWO services should they combine? (Select 2)

A team is scaling a prototype ML model to production on Vertex AI. The model was developed using scikit-learn and requires custom preprocessing. They want to minimize operational overhead and ensure consistency between training and serving. Which approach should they use?

A data scientist is fine-tuning a large language model from Hugging Face using Vertex AI Training with GPUs. The model has 7 billion parameters and does not fit on a single GPU. They need to split the model across multiple GPUs and train with data parallelism. Which strategy should they use?

A company wants to use Vertex AI Vizier to tune hyperparameters for a PyTorch model. They have a limited budget of 50 training jobs. The objective metric is validation accuracy, and they want to find the best configuration efficiently. Which algorithm should they choose?

A data engineer wants to compute feature aggregates over a large dataset stored in BigQuery and write the results to Vertex AI Feature Store. The pipeline must handle both batch and streaming data. Which Google Cloud service should they use?

An ML team is using Vertex AI to train a deep learning model on a large dataset. To reduce costs, they want to use preemptible VMs for training jobs. However, training must complete within a bounded time. Which strategy should they use?

A developer wants to quickly deploy a pre-trained foundation model for text generation without writing any code. Which Vertex AI feature should they use?

A company has a TensorFlow model for image classification that must run on edge devices with limited memory. They need to reduce the model size without significant accuracy loss. Which technique should they use?

An ML engineer is using Vertex AI distributed training for a TensorFlow model that uses the MirroredStrategy. They notice that the training throughput drops significantly when moving from a single GPU to multiple GPUs on the same machine. What is the most likely cause?

A data scientist wants to use a pre-trained ResNet model from Keras Applications and fine-tune it on a small custom dataset. Which approach should they take to avoid overfitting?

A team is using TensorFlow Transform (tf.Transform) to create preprocessing functions that will be used both in training and serving. They want to ensure consistency. Which artifact should they save after analyzing the training data?

An organization wants to use Vertex AI Model Garden to fine-tune a foundation model for a custom classification task. They have a labeled dataset stored in BigQuery. Which steps should they take?

An ML engineer is training a very large PyTorch model on Vertex AI using a TPU v3 pod. The training is slower than expected, and the TPU utilization is low. What is the most likely cause?

An ML team is optimizing an inference model for deployment on edge devices. They need to reduce the model size and improve latency while maintaining accuracy as much as possible. Which two techniques should they use? (Choose TWO.)

A company wants to use Vertex AI for hyperparameter tuning. Which three components are required to configure a hyperparameter tuning job? (Choose THREE.)

An engineer is designing a distributed training job on Vertex AI for a TensorFlow model that uses the MultiWorkerMirroredStrategy. They need to ensure proper communication between workers. Which two environment variables must be set correctly for each worker? (Choose TWO.)

A machine learning team is training a large transformer model on Vertex AI. They need to reduce training time by utilizing multiple GPUs across nodes, but the model is too large to fit into a single GPU memory. Which distributed training strategy should they use?

You are deploying a pre-trained BERT model for inference on edge devices. The model must be under 500 MB and inference latency under 50 ms. Which approach should you take?

A data science team is building a real-time feature engineering pipeline for ML model training and serving. They need to compute features from streaming data, store them for low-latency serving, and ensure consistency between training and serving. Which TWO Google Cloud services should they use?

You are fine-tuning a large language model (LLM) from Vertex AI Model Garden using a custom dataset. You need to minimize training cost while maintaining reasonable throughput. Which THREE strategies should you combine?

A company wants to use Vertex AI Model Garden to deploy a pre-trained image classification model and later fine-tune it on their own data. Which TWO statements are true about Vertex AI Model Garden?

You are setting up a hyperparameter tuning job on Vertex AI for a large neural network. The objective is to minimize validation loss. You want to explore the hyperparameter space efficiently with a limited budget of 100 trials. Which THREE settings should you configure in the study?

A team is training a custom TensorFlow model on Vertex AI using a pre-built container. They need to use a TPU pod slice (v3-32). What THREE actions are required to set up the training job correctly?

A company is deploying a computer vision model on edge devices using TensorFlow Lite. They want to reduce model size without significant accuracy loss. Which TWO model compression techniques are most suitable?

You are using tf.Transform to preprocess data for a TensorFlow model. You want to ensure that the same transformations applied during training are also applied during serving. Which THREE components are necessary to achieve this?
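Several of the questions above describe the same cluster shape: multiple machines with 4 GPUs each, trained with PyTorch DistributedDataParallel. As a rough aid for reasoning about those scenarios, the sketch below works out the process count and rank numbering, assuming the common convention of one training process per GPU. The Cluster class and its method names are hypothetical helpers for illustration only; they are not part of PyTorch or Vertex AI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical description of a multi-node GPU cluster (illustration only)."""
    nodes: int
    gpus_per_node: int

    @property
    def world_size(self) -> int:
        # Assumption: one training process per GPU.
        return self.nodes * self.gpus_per_node

    def global_rank(self, node_rank: int, local_rank: int) -> int:
        # Processes are numbered node by node, then GPU by GPU within a node.
        if not 0 <= node_rank < self.nodes:
            raise ValueError("node_rank out of range")
        if not 0 <= local_rank < self.gpus_per_node:
            raise ValueError("local_rank out of range")
        return node_rank * self.gpus_per_node + local_rank


cluster = Cluster(nodes=4, gpus_per_node=4)
print(cluster.world_size)          # total processes across the cluster
print(cluster.global_rank(3, 3))   # rank of the last GPU on the last node
```

Under that one-process-per-GPU assumption, a 4-node, 4-GPU-per-node job has 16 processes with global ranks 0 through 15. Whether a given question counts machines or processes as "workers" depends on its wording, so read the full explanation for each.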


Other PMLE exam domains

Automating and Orchestrating ML Pipelines

Collaborating Within and Across Teams to Manage Data and Models

Serving and Scaling Models

Monitoring ML Solutions

Architecting Low-Code ML Solutions

Collaborating to manage data and models

Solving business challenges with ML

Frequently asked questions

What does the Scaling Prototypes into ML Models domain cover on the PMLE exam?

The Scaling Prototypes into ML Models domain covers the key concepts tested in this area of the PMLE exam blueprint published by Google Cloud. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all PMLE domains — no account required.

How many Scaling Prototypes into ML Models questions are in the PMLE question bank?

The Courseiva PMLE question bank contains 99 questions in the Scaling Prototypes into ML Models domain. Click any question to see the full explanation and answer breakdown.

What is the best way to practice Scaling Prototypes into ML Models for PMLE?

Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.
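The progression rule above can be written down as a small helper. This is only a sketch: the function name next_session_size is hypothetical, and reading "consistently" as "the last three sessions" is an assumption, since the advice does not define it.

```python
def next_session_size(recent_scores, threshold=0.80, window=3):
    """Suggest the next session length from past scores.

    recent_scores: fractions correct per session, oldest first.
    window: number of most recent sessions that must all beat the
            threshold (assumed value; not specified in the advice).
    """
    if len(recent_scores) < window:
        return 10  # not enough history yet: keep using baseline sessions
    if all(score > threshold for score in recent_scores[-window:]):
        return 20  # consistently above threshold: move to a longer session
    return 10


print(next_session_size([0.90, 0.85, 0.82]))
print(next_session_size([0.90, 0.70, 0.85]))
```

The helper returns 20 as the lower end of the suggested 20 to 30 question range; a score of exactly 80% does not count as above the threshold.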

Can I practice only Scaling Prototypes into ML Models questions for PMLE?

Yes — the session launcher on this page draws questions exclusively from the Scaling Prototypes into ML Models domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.
