Scenario-based practice

Troubleshooting Scenario Questions

Practise Google Professional Machine Learning Engineer troubleshooting scenarios: original exam-style questions covering every exam domain, with detailed explanations, wrong-answer analysis, and common exam traps.

Scenario questions: 12
Exam code: PMLE
Vendor: Google Cloud

Scenario guide

How to approach troubleshooting scenario questions

These questions describe a symptom in a machine learning system, such as a failed deployment, rising prediction latency, or a skew alert, and ask you to identify the root cause or the correct fix. They appear across all certification exams and reward systematic thinking over memorisation. The best candidates follow a consistent troubleshooting framework even under time pressure.

Quick answer

Troubleshooting scenario questions test whether you can apply the concept in context, not just recognise a definition.

How the topic appears in realistic exam-style scenarios.

Which detail in the question changes the correct answer.

How to eliminate plausible but wrong options.

How to connect the question back to the wider exam objective.

Related practice questions

Related PMLE topic practice pages

Scenario questions usually connect to one or more exam topics. Use these links to review the underlying concepts behind the scenario.

Practice set

Practice scenarios

Question 1 · easy · multiple choice

You have an online prediction model that is showing increasing prediction latency. You have already verified that the request rate and input data size are unchanged. Which of the following should you investigate next?

Question 2 · hard · multiple choice

A data science team has trained a TensorFlow model on-premises using a large dataset. When they try to deploy the model to Vertex AI for online predictions, the deployed model fails to start with a ‘MemoryError’. The model artifact is 2 GB, and the machine type is n1-standard-4 (15 GB RAM). What is the most likely cause?

Question 3 · hard · multi select

A company uses Vertex AI Model Monitoring to detect training-serving skew. They have a categorical feature 'product_category' with high cardinality. The monitoring job alerts for skew, but the data scientists believe the model performance is still acceptable. Which THREE actions should the team take to investigate and resolve the alert?
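To make the skew alert in this scenario concrete, here is a minimal sketch of a distance statistic for a categorical feature. It assumes the monitoring job compares category shares using an L-infinity distance (the largest absolute difference in share for any one category). The sample values, the `0.05` threshold, and all function names are hypothetical and are not taken from the question.

```python
from collections import Counter


def category_distribution(values):
    """Return each category's share of the total as a dict."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def per_category_shift(train_values, serving_values):
    """Absolute difference in share for every category seen in either set."""
    train = category_distribution(train_values)
    serving = category_distribution(serving_values)
    categories = set(train) | set(serving)
    return {c: abs(train.get(c, 0.0) - serving.get(c, 0.0)) for c in categories}


def l_infinity_distance(train_values, serving_values):
    """Largest single-category shift between the two samples."""
    return max(per_category_shift(train_values, serving_values).values())


# Hypothetical samples of the 'product_category' feature.
training = ["books"] * 50 + ["toys"] * 30 + ["garden"] * 20
serving = ["books"] * 40 + ["toys"] * 30 + ["garden"] * 20 + ["pets"] * 10

shifts = per_category_shift(training, serving)
distance = l_infinity_distance(training, serving)
threshold = 0.05  # hypothetical alert threshold

print(f"distance={distance:.2f}, alert={distance > threshold}")
print("unseen in training:", sorted(set(serving) - set(training)))
```

Looking at the per-category shifts, not only the single distance value, shows whether an alert is driven by a category that never appeared in training or by a small movement in a common one. That distinction matters when a feature has many categories.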

Question 4 · hard · multiple choice

A Vertex AI pipeline is triggered from Cloud Build using the configuration shown in the exhibit. The pipeline fails with an error: 'Unable to submit build: The source code is not available.' What is the most likely cause?

Exhibit

Refer to the exhibit.

```yaml
# cloudbuild.yaml
timeout: 600s
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'us-central1-docker.pkg.dev/my-project/my-repo/my-image:latest', '.']
# The 'name' line of the push step is missing from the source.
- args: ['push', 'us-central1-docker.pkg.dev/my-project/my-repo/my-image:latest']
- name: 'gcr.io/cloud-builders/gcloud'
  # The argument list below is truncated in the source and is reproduced as found.
  args: ['builds'config=./pipeline.yaml']
```

Question 5 · easy · multiple choice

Refer to the exhibit. The team notices that the pipeline fails to read data from the specified Cloud Storage path. What is the most likely issue?

Exhibit

pipeline:
  execution_config:
    runner: DataflowRunner
    project: my-project
    region: us-central1
  components:
    - component_type: CsvExampleGen
      component_name: example_gen
      arguments:
        input_basedir: gs://my-bucket/data/

Question 6 · medium · multiple choice

A data scientist trains an XGBoost model on Vertex AI with a custom container. The model performs well on a held-out test set but fails to generalize in production. They suspect data leakage between training and validation. What is the best practice to prevent this?
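One common guard against the kind of leakage described here is to split the data first and fit any preprocessing statistics on the training split only. The sketch below illustrates that idea with the standard library. The dataset, the 80/20 split, and the function names are hypothetical, and this is not presented as the question's answer key.

```python
import random
import statistics


def split_rows(rows, validation_fraction=0.2, seed=42):
    """Shuffle once, then cut, so no row lands in both splits."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(train_values):
    """Compute scaling statistics from the training split only."""
    return statistics.mean(train_values), statistics.pstdev(train_values)


def apply_scaler(values, mean, std):
    """Apply already-fitted statistics; nothing is re-estimated here."""
    return [(v - mean) / std for v in values]


rows = [float(i) for i in range(100)]  # hypothetical single-feature dataset
train, validation = split_rows(rows)

mean, std = fit_scaler(train)  # validation rows never influence these values
train_scaled = apply_scaler(train, mean, std)
validation_scaled = apply_scaler(validation, mean, std)

overlap = set(train) & set(validation)
print(f"train={len(train)}, validation={len(validation)}, overlap={len(overlap)}")
```

If rows are correlated, for example several rows per customer or rows ordered in time, a random shuffle like this one is not enough, and a group-based or time-based split would be the safer assumption.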

Question 7 · hard · multiple choice

A company deploys a model to Vertex AI Prediction with autoscaling enabled. During a flash sale, traffic spikes 10x, but the endpoint fails to scale fast enough, causing high latency. What is the most likely cause and solution?
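The arithmetic behind a scenario like this can be sketched with a proportional scaling rule. The rule below is an assumption, modelled on the formula used by Kubernetes-style horizontal autoscalers. The question does not state how the endpoint computes its replica count, and every number here is hypothetical.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Proportional scaling rule, clamped to the configured replica range."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical steady state: 2 replicas running at the 60% utilization target.
before = desired_replicas(2, 60, 60, min_replicas=2, max_replicas=5)

# A 10x spike pushes the same 2 replicas to an effective 600% utilization.
during = desired_replicas(2, 600, 60, min_replicas=2, max_replicas=5)

# Same spike with a much higher ceiling, to show what the cap is hiding.
uncapped = desired_replicas(2, 600, 60, min_replicas=2, max_replicas=100)

print(f"before={before}, during={during}, uncapped={uncapped}")
```

Under these assumptions the spike calls for 20 replicas but the ceiling allows only 5, and even permitted replicas take time to start. Both a low minimum and a low maximum are worth checking in scenarios of this kind.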

Question 8 · hard · multi select

A team is troubleshooting a Vertex AI Pipelines run that keeps failing at the model evaluation step. The pipeline includes steps: data preprocessing, training, evaluation, and deployment. Which THREE actions should they take to diagnose the issue?

Question 9 · hard · multiple choice

You are troubleshooting a Vertex AI endpoint for a customer. The exhibit shows the endpoint configuration. The customer reports that Model A is experiencing high latency during peaks. Model B runs fine. What is the most likely cause?

Exhibit

Refer to the exhibit.

{
  "name": "projects/my-project/locations/us-central1/endpoints/1234",
  "displayName": "my-endpoint",
  "dedicatedEndpointEnabled": false,
  "deployedModels": [
    {
      "id": "model-a-1",
      "displayName": "model-a",
      "model": "projects/my-project/locations/us-central1/models/456",
      "dedicatedResources": {
        "minReplicaCount": 1,
        "maxReplicaCount": 5,
        "machineSpec": {
          "machineType": "n1-standard-4",
          "acceleratorType": "NVIDIA_TESLA_T4",
          "acceleratorCount": 1
        }
      }
    },
    {
      "id": "model-b-1",
      "displayName": "model-b",
      "model": "projects/my-project/locations/us-central1/models/789",
      "dedicatedResources": {
        "minReplicaCount": 1,
        "maxReplicaCount": 5,
        "machineSpec": {
          "machineType": "n1-standard-8",
          "acceleratorType": "NVIDIA_TESLA_T4",
          "acceleratorCount": 2
        }
      }
    }
  ],
  "trafficSplit": {
    "model-a-1": 50,
    "model-b-1": 50
  }
}
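
When an exhibit like this one is long, it can help to reduce it to the few numbers that differ between the deployed models. The sketch below does that for a trimmed copy of the exhibit. The helper names and the "traffic per vCPU" ratio are illustrative choices, not a Vertex AI metric, and the vCPU count is read from the trailing number of the `n1-standard-N` machine type.

```python
import json

# Trimmed copy of the exhibit: only the fields used below are kept.
ENDPOINT_JSON = """
{
  "deployedModels": [
    {"id": "model-a-1", "displayName": "model-a",
     "dedicatedResources": {"minReplicaCount": 1, "maxReplicaCount": 5,
       "machineSpec": {"machineType": "n1-standard-4", "acceleratorCount": 1}}},
    {"id": "model-b-1", "displayName": "model-b",
     "dedicatedResources": {"minReplicaCount": 1, "maxReplicaCount": 5,
       "machineSpec": {"machineType": "n1-standard-8", "acceleratorCount": 2}}}
  ],
  "trafficSplit": {"model-a-1": 50, "model-b-1": 50}
}
"""


def vcpus(machine_type):
    """For n1-standard-N machine types, N is the vCPU count."""
    return int(machine_type.rsplit("-", 1)[1])


def capacity_report(endpoint):
    """Summarize peak capacity and traffic share per deployed model."""
    report = {}
    for model in endpoint["deployedModels"]:
        resources = model["dedicatedResources"]
        spec = resources["machineSpec"]
        max_vcpus = resources["maxReplicaCount"] * vcpus(spec["machineType"])
        traffic = endpoint["trafficSplit"][model["id"]]
        report[model["displayName"]] = {
            "traffic_percent": traffic,
            "max_vcpus": max_vcpus,
            "max_gpus": resources["maxReplicaCount"] * spec["acceleratorCount"],
            "traffic_per_vcpu": traffic / max_vcpus,
        }
    return report


report = capacity_report(json.loads(ENDPOINT_JSON))
for name, row in report.items():
    print(name, row)
```

Both models receive the same share of traffic and have the same replica limits, so the per-replica machine specification is the only difference this summary surfaces.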

Question 10 · hard · multiple choice

A data engineer is troubleshooting a Vertex AI Endpoint that serves a large BERT model. After deployment, many prediction requests fail with 'Out of Memory' errors. The machine type is n1-standard-8 (30 GB memory) with no accelerator. Which action will most likely resolve the issue?

Question 11 · medium · multiple choice

A model deployed on Vertex AI Prediction is returning high latency for real-time requests. The model is a small TensorFlow model. Which troubleshooting step should the team take first?
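For a small model, one way to frame a first step is to separate the time spent inside the model from the end-to-end time the client sees. The sketch below summarizes two sets of latency samples and computes how much of the total is overhead. All measurements are hypothetical, and how they would be collected (client timers, server logs, or metrics) is left as an assumption.

```python
import statistics


def percentiles(samples_ms):
    """Return p50, p95 and p99 from a list of latency samples."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Hypothetical measurements in milliseconds for the same 10 requests.
end_to_end_ms = [120, 125, 118, 400, 122, 119, 121, 380, 123, 120]
model_only_ms = [15, 16, 14, 18, 15, 15, 16, 17, 15, 14]

# Everything that is not model execution: network, serialization, queuing.
overhead_ms = [total - model for total, model in zip(end_to_end_ms, model_only_ms)]

total_summary = percentiles(end_to_end_ms)
model_summary = percentiles(model_only_ms)
overhead_share = sum(overhead_ms) / sum(end_to_end_ms)

print("end to end:", total_summary)
print("model only:", model_summary)
print(f"overhead share of total time: {overhead_share:.0%}")
```

If most of the time sits outside the model, as in these made-up numbers, changing the model or its accelerator is unlikely to help, and the request path deserves attention first.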

Question 12 · easy · multiple choice

A machine learning model deployed on Vertex AI is returning erroneous predictions. The team needs to investigate the root cause by examining the prediction request and response details. Which Google Cloud tool is best suited for this?

These PMLE practice questions are part of Courseiva's free Google Cloud certification practice question bank. Courseiva provides original exam-style PMLE questions with detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics.