Scenario-based practice

Troubleshooting Scenario Questions

Practise with original exam-style scenarios for the Google Professional Cloud DevOps Engineer exam. The questions cover every exam domain and come with detailed explanations, wrong-answer analysis, and common exam traps.

Scenario questions: 15
Exam code: PCDOE
Vendor: Google Cloud

Scenario guide

How to approach troubleshooting scenario questions

These questions describe a symptom (for example a failing build, a spike in errors, or high latency) and ask you to identify the root cause or the correct fix. They reward systematic thinking over memorisation. The best candidates follow a consistent troubleshooting framework even under time pressure.

Quick answer

Troubleshooting scenario questions test whether you can apply a concept in context, not just recognise a definition.

How the topic appears in realistic exam-style scenarios.

Which detail in the question changes the correct answer.

How to eliminate plausible but wrong options.

How to connect the question back to the wider exam objective.

Related practice questions

Related PCDOE topic practice pages

Scenario questions usually connect to one or more exam topics. Use these links to review the underlying concepts behind the scenario.

Practice set

Practice scenarios

Question 1 (medium, multiple choice)

An organization uses Cloud Build with a private pool to build container images that require access to on-premises Artifactory. After moving to a new VPC, builds fail with 'Connection refused' when fetching dependencies. What is the best step to troubleshoot?

Question 2 (medium, multiple choice)

Your company runs a multi-region application on Google Kubernetes Engine. You have implemented Cloud Monitoring dashboards to track cluster resource utilization and application SLIs. After a recent upgrade, you notice that the dashboard shows a sudden drop in CPU utilization for all nodes in one zone, but the application is still serving traffic normally. You suspect a monitoring issue. What should you investigate first?

Question 3 (hard, multiple choice)

A DevOps team is troubleshooting a Cloud Build pipeline that fails intermittently when building a container image. The build step uses a custom build step that runs a vulnerability scan. The error log shows: 'Step #1: Error: failed to scan image: context deadline exceeded'. The build configuration includes 'timeout: 600s'. Which is the most likely cause and solution?

Question 4 (easy, multiple choice)

A team uses Cloud SQL for PostgreSQL. They receive an alert that the database's CPU utilization is above 95% for the past 30 minutes. Queries are taking longer than usual. They want to investigate without causing further impact. What should they do first?

Question 5 (hard, multiple choice)

A company uses Spinnaker for continuous delivery across multiple GKE clusters. After a recent infrastructure change, the 'Canary' deployment strategy fails during the 'disable' phase of the old version. The error log shows: 'Unable to disable server group: Not authorized to perform compute.instanceGroups.update.' What is the most likely root cause?

Question 6 (hard, multi select)

An incident is declared for a production service running on GKE. The on-call engineer suspects a recent code change may have introduced a memory leak. Which THREE actions should the engineer take to investigate and mitigate?

Question 7 (multi select)

A team uses Google Kubernetes Engine (GKE) with cluster telemetry enabled. During an incident, they notice that a deployment's pods are repeatedly crashing with Exit Code 137. The team wants to investigate the root cause. Which TWO Google Cloud services should they use together to correlate resource usage and logs?
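
As background for this scenario: by the usual shell convention, an exit code above 128 means the process was terminated by a signal, so 137 decodes to 128 + 9 (SIGKILL). On Kubernetes this often, though not always, accompanies an out-of-memory kill. The following is a minimal Python sketch of that arithmetic. `decode_exit_code` and the `SIGNAL_NAMES` table are illustrative names introduced here, not part of any Google Cloud or Kubernetes API.

```python
# Hypothetical helper: decode a container exit code into the signal that ended it.
# Assumes the common convention that exit codes above 128 mean 128 + signal number.
SIGNAL_NAMES = {
    1: "SIGHUP",
    2: "SIGINT",
    6: "SIGABRT",
    9: "SIGKILL",
    11: "SIGSEGV",
    15: "SIGTERM",
}


def decode_exit_code(code):
    """Return (signal_number, signal_name), or None if no signal is implied."""
    if code <= 128:
        return None
    signum = code - 128
    return signum, SIGNAL_NAMES.get(signum, "unknown")


print(decode_exit_code(137))  # (9, 'SIGKILL')
```

Knowing that the process was killed rather than exiting on its own narrows the investigation to what sent the signal, which is where resource metrics and logs need to be read side by side.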

Question 8 (easy, multiple choice)

An engineer receives an alert that a service's error rate has exceeded the threshold. To investigate, which log-based metric should the engineer query in Cloud Logging to identify the root cause?

Question 9 (medium, multiple choice)

You are a DevOps engineer for a SaaS company that provides a REST API. The API is deployed on Google Cloud Run. You have configured Cloud Monitoring alerts for 5xx errors. Recently, you received an alert that the error rate exceeded 5% for 5 minutes. You investigated and found that the errors were HTTP 503 (Service Unavailable) from a specific endpoint. The endpoint calls an internal Cloud SQL database. The database CPU utilization was at 90% during that period. You suspect the database is the bottleneck. Which action should you take to reduce the error rate without over-provisioning?
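
The alert in this scenario combines a threshold (5%) with a duration (5 minutes). The sketch below shows one way to reason about such a condition: it fires only when the error ratio stays above the threshold for every sample in an unbroken run. It is a simplified illustration, assuming one sample per minute, and is not how Cloud Monitoring evaluates alerting policies internally. The function name and parameters are hypothetical.

```python
def breaches_threshold(samples, threshold=0.05, duration=5):
    """Return True if the error ratio exceeds `threshold` for `duration`
    consecutive samples.

    samples: list of (total_requests, error_requests), oldest first,
             assumed to be one sample per minute.
    """
    streak = 0
    for total, errors in samples:
        ratio = errors / total if total else 0.0
        # Extend the run while above the threshold, otherwise start over.
        streak = streak + 1 if ratio > threshold else 0
        if streak >= duration:
            return True
    return False


sustained = [(1000, 20), (1000, 80), (1000, 90), (1000, 70), (1000, 60), (1000, 75)]
interrupted = [(1000, 80)] * 4 + [(1000, 10)] + [(1000, 80)] * 4

print(breaches_threshold(sustained))    # True
print(breaches_threshold(interrupted))  # False
```

The second case never fires because a single healthy minute resets the run, which is why a duration window filters out short spikes.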

Question 10 (hard, multiple choice)

You are troubleshooting a performance issue with a Compute Engine instance that is part of a managed instance group serving a web application. Users report intermittent high latency. You run the command shown in the exhibit. Based on the output, what is the most likely cause of the performance issue?

Exhibit

Refer to the exhibit.

```
$ gcloud compute instances describe instance-1 --zone=us-central1-a
...
networkInterfaces:
- accessConfigs:
  - name: external-nat
    natIP: 34.123.45.67
    type: ONE_TO_ONE_NAT
  name: nic0
  network: https://www.googleapis.com/compute/v1/projects/my-project/global/networks/default
  subnetwork: https://www.googleapis.com/compute/v1/projects/my-project/regions/us-central1/subnetworks/default
...
disks:
- autoDelete: true
  boot: true
  deviceName: instance-1
  diskSizeGb: '100'
  interface: SCSI
  source: https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-a/disks/instance-1
  type: PERSISTENT
...
serviceAccounts:
- email: 123456789-compute@developer.gserviceaccount.com
  scopes:
  - https://www.googleapis.com/auth/devstorage.read_only
  - https://www.googleapis.com/auth/logging.write
  - https://www.googleapis.com/auth/monitoring.write
  - https://www.googleapis.com/auth/servicecontrol
  - https://www.googleapis.com/auth/service.management.readonly
  - https://www.googleapis.com/auth/trace.append
```

Question 11 (medium, multi select)

A team is troubleshooting a slow response time on an App Engine standard environment application. The application uses Cloud SQL as its database. Which TWO actions should the team take to identify the bottleneck?

Question 12 (medium, drag order)

Arrange the steps to troubleshoot a high latency issue on a Google Cloud HTTP(S) Load Balancer.

Question 13 (easy, multiple choice)

Refer to the exhibit. A budget alert has fired for project dev-123 indicating that the cost has exceeded the budget of $1000. What should the team do next to investigate the cost overrun?

Exhibit

```
{
  "budgetName": "projects/my-project/budgets/my-budget",
  "costAmount": 1500.00,
  "budgetAmount": 1000.00,
  "alertThresholdExceeded": 1.0,
  "budgetFilter": {
    "projects": ["projects/dev-123"],
    "creditTypesTreatment": "INCLUDE_ALL_CREDITS"
  }
}
```
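
To read the exhibit quickly, it can help to work out the size of the overrun before deciding where to look next. The sketch below parses the exhibit's payload as shown and computes the absolute and relative overrun. `summarize_overrun` is a hypothetical helper written for this example. The field names are taken from the exhibit and are not checked against any real notification schema.

```python
import json

# Payload copied from the exhibit above.
payload = json.loads("""
{
  "budgetName": "projects/my-project/budgets/my-budget",
  "costAmount": 1500.00,
  "budgetAmount": 1000.00,
  "alertThresholdExceeded": 1.0,
  "budgetFilter": {
    "projects": ["projects/dev-123"],
    "creditTypesTreatment": "INCLUDE_ALL_CREDITS"
  }
}
""")


def summarize_overrun(p):
    """Return the overrun amount, the cost-to-budget ratio, and the scoped projects."""
    return {
        "overrun": p["costAmount"] - p["budgetAmount"],
        "ratio": p["costAmount"] / p["budgetAmount"],
        "projects": p["budgetFilter"]["projects"],
    }


summary = summarize_overrun(payload)
print(summary)
# {'overrun': 500.0, 'ratio': 1.5, 'projects': ['projects/dev-123']}
```

The cost is 1.5 times the budget and the filter scopes the budget to a single project, so the investigation can focus on what that one project spent.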

Question 14 (hard, multiple choice)

A DevOps engineer is troubleshooting a Cloud Build failure. The build log shows the error: 'Permission denied for resource projects/my-project/locations/us-central1/repositories/my-repo'. The Cloud Build service account (PROJECT_NUMBER@cloudbuild.gserviceaccount.com) is used. What is the most likely missing role?

Question 15 (medium, multiple choice)

A DevOps engineer is troubleshooting a production incident where users are getting 502 errors from a Google Cloud HTTP(S) Load Balancer. The backend service is a GKE deployment. Initial checks show the backend pods are healthy and responding. What is the most likely cause?

These PCDOE practice questions are part of Courseiva's free Google Cloud certification practice question bank. Courseiva provides original exam-style PCDOE questions with detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics.