How should I use these Serving and Scaling Models practice questions?

Read each scenario carefully and choose your answer before revealing the explanation. Then check why your choice was right or wrong. Repeat until the reasoning feels automatic.

Can I practise just Serving and Scaling Models questions in a focused session?

Yes — use the session launcher on this page to start a 10-, 20-, 30- or 50-question session drawn entirely from the Serving and Scaling Models domain.

PMLE · topic practice

Serving and Scaling Models practice questions

Practise Google Professional Machine Learning Engineer (PMLE) questions on Serving and Scaling Models: original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations, not to memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security

20 questions · Domain: Serving and Scaling Models

What the exam tests

What to know about Serving and Scaling Models

Serving and Scaling Models questions test whether you can apply the concept in context, not just recognise a definition.

▸How the topic appears in realistic exam-style scenarios.
▸Which detail in the question changes the correct answer.
▸How to eliminate plausible but wrong options.
▸How to connect the question back to the wider exam objective.

Watch out for

Common Serving and Scaling Models exam traps

▸Answering from memory before reading the full scenario.
▸Missing a constraint such as cost, availability, security, scope or command context.
▸Choosing a broad answer when the question asks for the most specific fix.
▸Ignoring why the wrong options are tempting.

Practice set

Serving and Scaling Models questions

20 questions · select your answer, then reveal the explanation

Question 1 · easy · multiple choice

A data scientist wants to deploy a trained TensorFlow model to Vertex AI for online predictions. They need to serve predictions with low latency and want to leverage GPU acceleration. Which machine type should they select when creating the Vertex AI endpoint?

Trap 1: n1-standard-4

n1-standard-4 on its own has no accelerator attached, so inference runs on CPU only.

Trap 2: e2-standard-4

E2-series machines do not support attached GPUs.

Trap 3: n1-highmem-8

n1-highmem-8 on its own has no accelerator attached; extra memory does not provide GPU acceleration.

A
n1-standard-4 with 1 NVIDIA Tesla T4
Why correct: Attaching a T4 GPU to an n1-standard machine gives the deployed model GPU acceleration.
B
n1-standard-4
Why wrong: With no accelerator attached, n1-standard-4 serves predictions on CPU only.
C
e2-standard-4
Why wrong: E2-series machines do not support attached GPUs.
D
n1-highmem-8
Why wrong: With no accelerator attached, n1-highmem-8 serves on CPU only; extra memory is not GPU acceleration.
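
The reasoning behind the options can be sketched in code. The block below is a minimal illustration, not an official validator: `build_deploy_kwargs`, `uses_gpu` and the `GPU_ATTACHABLE_FAMILIES` set are hypothetical helpers introduced here, and the set is an illustrative subset taken from this question rather than a full list of supported machine families. The resulting dictionary uses the `machine_type`, `accelerator_type` and `accelerator_count` parameter names accepted by `Model.deploy` in the Vertex AI Python SDK.

```python
from typing import Optional

# Assumption: illustrative subset based on the question above (N1 accepts
# attached GPUs, E2 does not). Not an authoritative list.
GPU_ATTACHABLE_FAMILIES = {"n1"}


def build_deploy_kwargs(
    machine_type: str,
    accelerator_type: Optional[str] = None,
    accelerator_count: int = 0,
) -> dict:
    """Build keyword arguments for a deployment, rejecting invalid GPU combos."""
    family = machine_type.split("-")[0]
    if accelerator_type is not None:
        if accelerator_count < 1:
            raise ValueError("accelerator_count must be >= 1 when a GPU is requested")
        if family not in GPU_ATTACHABLE_FAMILIES:
            raise ValueError(f"{machine_type} does not accept an attached GPU")
    kwargs = {"machine_type": machine_type}
    if accelerator_type is not None:
        kwargs["accelerator_type"] = accelerator_type
        kwargs["accelerator_count"] = accelerator_count
    return kwargs


def uses_gpu(kwargs: dict) -> bool:
    """A deployment is GPU-accelerated only if an accelerator is attached."""
    return kwargs.get("accelerator_count", 0) > 0


# Option A: the machine type plus an attached T4.
option_a = build_deploy_kwargs("n1-standard-4", "NVIDIA_TESLA_T4", 1)
# Option B: the same machine type with nothing attached, so CPU only.
option_b = build_deploy_kwargs("n1-standard-4")

# With credentials and an uploaded model, the kwargs would be passed as:
#   model.deploy(endpoint=endpoint, **option_a)
```

The point of the sketch is that the machine type alone never provides a GPU: the accelerator has to be requested explicitly alongside a machine family that accepts one.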

Serving and Scaling Models questions

A data scientist wants to deploy a trained TensorFlow model to Vertex AI for online predictions. They need to serve predictions with low latency and want to leverage GPU acceleration. Which machine type should they select when creating the Vertex AI endpoint?

You are deploying a new version of a model to a Vertex AI endpoint that already has a champion model serving 100% of traffic. You want to gradually shift traffic to the new version while monitoring for errors. Which approach should you use?

A company is using Vertex AI Prediction with a custom container that performs preprocessing before inference. The preprocessing step is CPU-intensive and the inference step uses a GPU. They want to minimize prediction latency while optimizing cost. Which architecture should they use?

You need to serve a large embedding model for similarity search with low latency. The model was trained to generate 256-dimensional embeddings. You plan to use Vertex AI Vector Search. Which index type should you choose to balance accuracy and performance for a dataset with 10 million vectors?

A machine learning engineer needs to run batch predictions on 50 TB of data stored in BigQuery using a Vertex AI model. The model is a custom container. What is the most efficient way to set up the batch prediction job?

You have a Vertex AI endpoint with min_replica_count=2 and max_replica_count=10. You notice that during a traffic spike, the endpoint does not scale up quickly enough, causing increased latency. What should you do to improve autoscaling responsiveness?

You are deploying a PyTorch model on Vertex AI using a custom container with NVIDIA Triton Inference Server. The model is a large transformer that requires GPU. You want to optimize GPU utilization and reduce memory footprint. Which technique should you apply?

A company wants to cache predictions for identical requests to reduce latency and cost. They use Vertex AI Prediction with a custom container. Which GCP service should they use to implement prediction caching?

You have a Vertex AI endpoint that serves a model for real-time predictions. You want to update the model to a new version with zero downtime. Which approach should you take?

You are using Vertex AI Vector Search with an approximate nearest neighbor index. You need to update the index with new data every hour. The updates must be available for queries immediately. Which update method should you use?

An ML team wants to deploy multiple models (e.g., a recommender and a classifier) behind a single Vertex AI endpoint. The models have different resource requirements: the recommender needs GPU, the classifier needs high memory. How should they configure the endpoint?

You need to run a batch prediction job on Vertex AI using a model that requires custom preprocessing using a Python script. The preprocessing must be applied before inference. Which approach should you use?

A company is deploying a model on Vertex AI for online predictions with strict latency SLOs. The model requires GPU acceleration. Which TWO configurations should they consider to meet the SLOs while optimizing cost?

An organization wants to deploy a model on edge devices (e.g., Android phones) for offline inference. They trained a model using TensorFlow. Which THREE steps should they take to prepare and deploy the model?

You deployed a model to a Vertex AI endpoint with minReplicas=0 and maxReplicas=5. After sending prediction requests, you notice the endpoint takes about 30 seconds to respond initially, but subsequent requests are fast. What is the most likely cause?

You have a champion model serving 100% traffic on a Vertex AI endpoint. You want to deploy a challenger model and gradually shift 10% of traffic to it for A/B testing. What is the correct approach?

You need to run batch predictions on 10 TB of text data stored in BigQuery using a custom container model hosted in Vertex AI. What is the most cost-effective and simple approach?

Your team is deploying a large recommendation model on Vertex AI endpoints using GPUs. You need to minimise latency while optimising cost. The model serves many similar requests from the same users within short time windows. Which additional service would best reduce latency and cost?

You want to deploy a TensorFlow model to a Vertex AI endpoint and enable online predictions. The model requires GPU for inference. Which machine type should you select when deploying the model?
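
Several of the questions above involve shifting a share of traffic from a champion model to a challenger on the same endpoint. As a rough sketch of the arithmetic only, the block below builds a traffic-split mapping in the shape the Vertex AI Python SDK accepts for `traffic_split`, where the key `"0"` stands for the model being deployed and the other keys are existing deployed-model IDs. The helper name `challenger_split` and the ID `"1234567890"` are hypothetical, and this is one possible approach rather than the answer key for any specific question.

```python
def challenger_split(champion_id: str, challenger_percent: int) -> dict:
    """Return a traffic split giving `challenger_percent` to the new model.

    The key "0" refers to the model being deployed in the same call;
    `champion_id` is the ID of the already-deployed champion.
    """
    if not 0 <= challenger_percent <= 100:
        raise ValueError("challenger_percent must be between 0 and 100")
    return {"0": challenger_percent, champion_id: 100 - challenger_percent}


# Hypothetical deployed-model ID for the current champion.
CHAMPION_ID = "1234567890"

# Start the challenger at 10% and leave 90% on the champion.
initial_split = challenger_split(CHAMPION_ID, 10)

# A gradual rollout is a sequence of such splits, each applied only
# after monitoring shows the previous step is healthy.
rollout = [challenger_split(CHAMPION_ID, pct) for pct in (10, 25, 50, 100)]
```

Because the percentages always sum to 100, the champion keeps serving the remainder at every step, which is what lets the shift happen without downtime.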
