PMLE · topic practice

Scaling prototypes into ML models practice questions

Practise Google Professional Machine Learning Engineer (PMLE) questions on scaling prototypes into ML models: original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations, not to memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security
20 questions · Domain: Scaling prototypes into ML models

What the exam tests

What to know about Scaling prototypes into ML models

Scaling prototypes into ML models questions test whether you can apply the concept in context, not just recognise a definition.

  • How the topic appears in realistic exam-style scenarios.
  • Which detail in the question changes the correct answer.
  • How to eliminate plausible but wrong options.
  • How to connect the question back to the wider exam objective.

Watch out for

Common Scaling prototypes into ML models exam traps

  • Answering from memory before reading the full scenario.
  • Missing a constraint such as cost, availability, security, scope or command context.
  • Choosing a broad answer when the question asks for the most specific fix.
  • Ignoring why the wrong options are tempting.

Practice set

Scaling prototypes into ML models questions

20 questions · select your answer, then reveal the explanation

A startup has developed a prototype ML model using scikit-learn on a single machine. They now need to scale it to handle larger datasets and deploy it for real-time predictions. The team is small and wants minimal operational overhead. Which Google Cloud service should they use?

A data science team has trained a TensorFlow model on-premises using a large dataset. When they try to deploy the model to Vertex AI for online predictions, the deployed model fails to start with a ‘MemoryError’. The model artifact is 2 GB, and the machine type is n1-standard-4 (15 GB RAM). What is the most likely cause?

A company has a prototype ML model that works well on historical data, but when deployed to production, the model performance degrades over time. The data distribution shifts gradually. Which strategy should they implement to maintain model accuracy?
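
To make the idea of a gradual distribution shift concrete, the sketch below computes a Population Stability Index (PSI) between a training sample and a production sample of one categorical feature. It uses only the Python standard library. The function name `psi`, the `eps` floor, and the toy data are assumptions made for illustration; this is not how any particular Google Cloud service computes drift.

```python
import math
from collections import Counter


def psi(expected: list, actual: list, eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of a categorical feature."""
    categories = set(expected) | set(actual)
    e_counts, a_counts = Counter(expected), Counter(actual)
    total = 0.0
    for cat in categories:
        # eps keeps the logarithm finite when a category is absent from one sample.
        e = max(e_counts[cat] / len(expected), eps)
        a = max(a_counts[cat] / len(actual), eps)
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical samples: the share of category "a" moves from 50% to 80%.
training = ["a"] * 50 + ["b"] * 50
production = ["a"] * 80 + ["b"] * 20

print(psi(training, training))    # identical distributions give 0.0
print(psi(training, production))  # a larger value signals a larger shift
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on the feature and the business context.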

An ML engineer is scaling a prototype to production using Vertex AI Pipelines. The pipeline includes data validation, preprocessing, training, and deployment steps. They want to ensure that the pipeline can be reproduced and audited. What is the best practice?
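
One way to think about reproducibility and auditability is to record, for every run, a deterministic fingerprint of everything that influenced it. The sketch below uses only the Python standard library. The helper `run_fingerprint`, its field names, and the example bucket paths are hypothetical and are not part of the Vertex AI Pipelines API.

```python
import hashlib
import json


def run_fingerprint(pipeline_version: str, data_uri: str, params: dict) -> str:
    """Return a stable SHA-256 hex digest describing one pipeline run's inputs."""
    record = {
        "pipeline_version": pipeline_version,
        "data_uri": data_uri,
        "params": params,
    }
    # sort_keys makes the JSON text independent of dict insertion order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical runs: a and b differ only in key order, c uses a newer data snapshot.
a = run_fingerprint("v1.2.0", "gs://example-bucket/data/2024-01-01", {"lr": 0.01, "epochs": 5})
b = run_fingerprint("v1.2.0", "gs://example-bucket/data/2024-01-01", {"epochs": 5, "lr": 0.01})
c = run_fingerprint("v1.2.0", "gs://example-bucket/data/2024-01-08", {"lr": 0.01, "epochs": 5})

print(a == b)  # True: same inputs produce the same fingerprint
print(a == c)  # False: a different data snapshot changes the fingerprint
```

Storing such a digest next to each trained model is one simple way to tell later whether two runs used the same code version, data, and parameters.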

A team has trained a sentiment analysis model using PyTorch on Vertex AI Training. They now want to deploy it for online predictions with low latency. Which TWO actions should they take? (Choose 2)

A company has a prototype ML model that predicts equipment failure. They want to deploy it to production using Vertex AI. The model must be retrained weekly with new data. They also need to monitor for data drift and model performance. Which THREE components should they include in their MLOps pipeline? (Choose 3)

An ML engineer is trying to upload a TensorFlow model to Vertex AI using the gcloud command shown. The model was trained using TensorFlow 2.11 and saved with model.save('model/'). The engineer sees the error. What is the most likely cause?

Exhibit

region=us-central1
display-name=my_model
container-image-uri=us-docker.pkg.dev/cloud-aiplatform/prediction/tf2-cpu.2-11:latest
artifact-uri=gs://my-bucket/model
container-ports=8501

Refer to the exhibit.

```
Deploying model...
```

You are an ML engineer at a fintech company. You have a prototype credit risk model built using XGBoost that achieves high accuracy on historical data. The model is trained on a dataset with 500,000 rows and 50 features. The company wants to deploy this model to production to score loan applications in real-time. The production environment must handle a peak load of 100 requests per second with a latency under 200ms. You have decided to use Vertex AI for deployment. After deploying the model as a Vertex AI endpoint with a single n1-standard-4 machine, you notice that latency exceeds 500ms at peak load and some requests time out. You have verified that the model prediction itself (excluding network overhead) takes about 50ms on average. What should you do to meet the latency and throughput requirements?
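
The numbers in this scenario lend themselves to a back-of-the-envelope capacity estimate. The sketch below applies Little's law (average requests in flight = arrival rate × time in system) and then estimates a replica count. The function `replicas_needed`, the assumption of one request at a time per vCPU (an n1-standard-4 has 4 vCPUs), and the 60% utilization target are illustrative assumptions, not measured properties of a Vertex AI endpoint.

```python
import math


def replicas_needed(peak_rps, service_ms, workers_per_replica, target_utilization):
    """Estimate the replica count that keeps peak load under a utilization target."""
    # A fully busy worker finishes 1000 / service_ms requests per second.
    per_replica_rps = workers_per_replica * target_utilization * 1000 / service_ms
    return math.ceil(peak_rps / per_replica_rps)


# Little's law: 100 requests/s x 50 ms of model time = 5 requests in flight on average.
in_flight = 100 * 50 / 1000

# Assumed: 4 workers per replica (one per vCPU) and a 60% utilization target.
replicas = replicas_needed(peak_rps=100, service_ms=50,
                           workers_per_replica=4, target_utilization=0.6)

print(in_flight)  # 5.0
print(replicas)   # 3
```

The 50 ms figure covers model time only. Queueing delay grows sharply as a server approaches full utilization, which is why the estimate leaves headroom rather than planning for 100% busy workers.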

A machine learning team has a prototype using a custom TensorFlow model trained on a small dataset stored in Cloud Storage. They want to scale the prototype to production with minimal code changes while ensuring the model can handle increased traffic and new data. The model currently loads data using tf.data.Dataset from CSV files. Which approach best meets these requirements?

Which TWO actions are best practices when scaling a prototype ML model to production in Google Cloud?

A team deployed a prototype classification model to Vertex AI Prediction. After a week, they notice the metrics shown in the exhibit. What is the most likely cause of the performance degradation and latency increase?

Exhibit

Refer to the exhibit.

```
Model accuracy: 0.92
Training data: 10,000 records
Online prediction latency: 95th percentile = 450ms
QPS: 50

After moving to production:
- New data from users: 100,000 records/day
- Data distribution shift detected (new features emerge)
- Prediction latency increases to 95th percentile = 1200ms
- QPS drops to 30
```

Drag and drop the steps to create and deploy a custom ML model on Vertex AI using a container in the correct order.

Drag and drop the steps to set up model monitoring for drift detection on Vertex AI in the correct order.

Match each ML acronym to its definition.

Area Under the ROC Curve

Mean Squared Error

Tensor Processing Unit

Support Vector Machine

Principal Component Analysis
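
Two of the metrics defined above, Mean Squared Error and Area Under the ROC Curve, are short enough to compute by hand. The sketch below is a plain-Python illustration with hypothetical function names and toy data. It computes AUC through its pairwise interpretation (the probability that a random positive example scores higher than a random negative one, with ties counted as half), which is fine for small inputs but slower than the rank-based methods libraries typically use.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


mse = mean_squared_error([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

print(mse)  # 0.375
print(auc)  # 0.75
```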

Match each ML model interpretability method to its description.

Game-theoretic approach to explain feature contributions

Local surrogate model to explain individual predictions

Ranking features by their impact on model output

Shows marginal effect of a feature on predictions

Measures decrease in performance when feature is shuffled
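
The last description above, measuring the decrease in performance when a feature is shuffled, can be sketched in a few lines of plain Python. The functions `accuracy` and `permutation_importance`, the toy data, and the stand-in model are all hypothetical; a real project would more likely rely on a library implementation.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features, n_repeats=5, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by its shuffled value.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: the label equals feature 0, and the model ignores feature 1.
rows = [(i % 2, i % 3) for i in range(20)]
labels = [r[0] for r in rows]
model = lambda r: r[0]  # hypothetical stand-in for a trained classifier

scores = permutation_importance(model, rows, labels, n_features=2)
print(scores)  # feature 0 shows a positive drop, feature 1 shows 0.0
```

Because the stand-in model never reads feature 1, shuffling it cannot change any prediction, so its importance is exactly zero.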

A team has a trained TensorFlow model running locally and wants to deploy it for low-latency online predictions on Google Cloud. Which service should they use?

Question 17 · medium · multiple choice

An ML team is scaling a prototype to production. The data pipeline currently reads from Cloud Storage and transforms data with a custom Python script. They need to handle higher throughput and add monitoring. Which approach should they take?

A company has a prototype ML model that achieves 85% accuracy on historical data. In production, accuracy drops to 70% after two weeks due to data drift. They need an automated retraining pipeline with minimal manual oversight. Which solution is most cost-effective?
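
The core of an automated retraining loop is a rule that decides when retraining is due. The sketch below shows one such rule, a rolling-accuracy threshold, in plain Python. The class `RetrainTrigger`, the window size, and the 0.8 threshold are assumptions for illustration; the sketch does not say which Google Cloud services should host the monitoring or the retraining job.

```python
from collections import deque


class RetrainTrigger:
    """Signal retraining when accuracy over the last `window` outcomes drops too low."""

    def __init__(self, window: int, min_accuracy: float):
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, correct: bool) -> bool:
        """Store one labelled outcome and return True if retraining is due."""
        self.outcomes.append(correct)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        return sum(self.outcomes) / len(self.outcomes) < self.min_accuracy


trigger = RetrainTrigger(window=10, min_accuracy=0.8)
healthy = [trigger.record(True) for _ in range(10)]    # 10 correct predictions
degraded = [trigger.record(False) for _ in range(3)]   # then 3 wrong ones

print(any(healthy))  # False
print(degraded)      # [False, False, True]: fires once accuracy falls to 0.7
```

A rule like this needs ground-truth labels, which often arrive late in production. When labels are delayed, a drift statistic on the input features is a common proxy.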

A team prototypes a recommendation model using a Jupyter notebook on Vertex AI Workbench. They want to productionize the model with CI/CD. Which approach should they use to package the model for deployment?

A data scientist trains an XGBoost model on Vertex AI with a custom container. The model performs well on a held-out test set but fails to generalize in production. They suspect data leakage between training and validation. What is the best practice to prevent this?
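
Two frequent sources of leakage are splitting time-ordered data at random and fitting preprocessing statistics on the full dataset before splitting. The sketch below shows the opposite pattern in plain Python: split by time first, then fit the scaler on the training rows only. The function names, the `(timestamp, value)` row layout, and the toy data are assumptions for illustration.

```python
import statistics


def split_by_time(rows, cutoff):
    """Rows are (timestamp, value) pairs; rows before the cutoff form the training set."""
    train = [r for r in rows if r[0] < cutoff]
    valid = [r for r in rows if r[0] >= cutoff]
    return train, valid


def fit_scaler(values):
    """Compute scaling statistics from the training values only."""
    return statistics.mean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Apply previously fitted statistics without refitting."""
    return [(v - mean) / std for v in values]


rows = [(t, float(t)) for t in range(10)]  # hypothetical time-ordered data
train, valid = split_by_time(rows, cutoff=8)

# Fit on training rows, then reuse the same statistics for validation rows.
mean, std = fit_scaler([v for _, v in train])
valid_scaled = transform([v for _, v in valid], mean, std)

print(mean)                      # 3.5, computed from timestamps 0..7 only
print(len(train), len(valid))    # 8 2
```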

Focused Scaling prototypes into ML models sessions

Start a Scaling prototypes into ML models only practice session

Every question in these sessions is drawn from the Scaling prototypes into ML models domain — nothing else.

Related practice questions

Related PMLE topic practice pages

Move into related areas when this topic feels solid.

Frequently asked questions

What does the PMLE exam test about Scaling prototypes into ML models?
Scaling prototypes into ML models questions test whether you can apply the concept in context, not just recognise a definition.

How should I use these practice questions?
Select your answer before revealing the explanation. Then read why each option is right or wrong; this active recall approach builds retention far faster than re-reading notes.

Can I practise just Scaling prototypes into ML models questions in a focused session?
Yes. The session launcher on this page draws every question from the Scaling prototypes into ML models domain. Use a 10-question session first to gauge your baseline, then move to 20 or 30 once the weak spots are clear.

Where can I practise other PMLE topics?
Use the topic links above to move to related areas, or go back to the PMLE question bank to see all topics.

Are these real exam questions or dumps?
These are original practice questions written to test the same concepts the PMLE exam covers. They are not copied from any real exam or dump site.