PCDOE · topic practice

Implementing service monitoring strategies practice questions

Practise Google Professional Cloud DevOps Engineer (PCDOE) questions on implementing service monitoring strategies: original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations, not to memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security
20 questions · Domain: Implementing service monitoring strategies

What the exam tests

What to know about Implementing service monitoring strategies

Implementing service monitoring strategies questions test whether you can apply the concept in context, not just recognise a definition.

  • How the topic appears in realistic exam-style scenarios.
  • Which detail in the question changes the correct answer.
  • How to eliminate plausible but wrong options.
  • How to connect the question back to the wider exam objective.

Watch out for

Common Implementing service monitoring strategies exam traps

  • Answering from memory before reading the full scenario.
  • Missing a constraint such as cost, availability, security, scope or command context.
  • Choosing a broad answer when the question asks for the most specific fix.
  • Ignoring why the wrong options are tempting.

Practice set

Implementing service monitoring strategies questions

20 questions · select your answer, then reveal the explanation

A team is monitoring a production service on Google Kubernetes Engine (GKE) and notices that a deployment is occasionally returning HTTP 503 errors. The team has set up a ServiceMonitor in Prometheus to scrape metrics from the pods. What is the most likely cause of the intermittent 503 errors?

Question 2 · medium · multiple choice

A cloud operations team is implementing monitoring for a microservices application deployed on Compute Engine. They want to create a custom dashboard in Cloud Monitoring that shows the 99th percentile latency of a specific service over the last hour. Which combination of Cloud Monitoring features should they use?
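As a rough illustration of what a 99th percentile latency figure summarises, the sketch below computes a nearest-rank percentile over raw latency samples. The function name `percentile_nearest_rank` and the sample values are hypothetical. Cloud Monitoring derives percentiles from distribution-valued metrics on the service side, so this is only a model of the idea, not of how the product computes it.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Return the nearest-rank percentile (0 < pct <= 100) of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [100] * 98 + [900, 1500]
p50 = percentile_nearest_rank(latencies_ms, 50)  # 100
p99 = percentile_nearest_rank(latencies_ms, 99)  # 900
```

The median hides the two slow requests entirely, while the p99 surfaces them, which is why tail percentiles are the usual choice for latency dashboards.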

Question 3 · hard · multiple choice

An e-commerce platform is using Cloud Load Balancing with a backend service that has a custom health check. The health check is failing intermittently, causing traffic to be routed away from healthy instances. The team has enabled Cloud Logging and wants to diagnose the issue. Which log view should they examine to see the health check probe results?

A DevOps engineer is setting up alerting policies for a critical API service. They want to receive an alert if the error rate exceeds 5% for at least 5 minutes, but only during business hours (9 AM to 5 PM). Which approach should they use?
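The condition in this scenario combines a threshold, a sustained duration, and a time-of-day restriction. The sketch below models that logic in plain Python so the three parts are easy to tell apart. It does not use any Cloud Monitoring API; the names `should_alert` and `in_business_hours` are hypothetical, and it assumes one sample per minute, a single time zone, and no weekday handling.

```python
from datetime import datetime, timedelta

BUSINESS_START_HOUR = 9          # 9 AM, from the scenario
BUSINESS_END_HOUR = 17           # 5 PM, from the scenario
THRESHOLD = 0.05                 # 5% error rate
DURATION = timedelta(minutes=5)  # breach must last at least this long

def in_business_hours(ts):
    return BUSINESS_START_HOUR <= ts.hour < BUSINESS_END_HOUR

def should_alert(samples):
    """samples: list of (timestamp, error_rate) pairs in time order.

    Returns True if, at some sample inside business hours, the error rate
    has been above THRESHOLD continuously for at least DURATION.
    """
    breach_start = None
    for ts, rate in samples:
        if rate > THRESHOLD:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= DURATION and in_business_hours(ts):
                return True
        else:
            breach_start = None  # the breach was interrupted, start over
    return False

def minute_series(start, rates):
    """Helper: pair each rate with a timestamp one minute apart."""
    return [(start + timedelta(minutes=i), r) for i, r in enumerate(rates)]

daytime = datetime(2024, 3, 15, 10, 0)
evening = datetime(2024, 3, 15, 20, 0)
sustained = should_alert(minute_series(daytime, [0.08] * 6))
brief = should_alert(minute_series(daytime, [0.08, 0.08, 0.08, 0.01, 0.08]))
after_hours = should_alert(minute_series(evening, [0.08] * 6))
```

Only the sustained daytime breach triggers in this model; a short spike or an out-of-hours breach does not.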

A company is running a stateful workload on Compute Engine and has configured a TCP health check on port 8080. The health check is failing, but the application is running and responding on port 8080 when tested manually from within the instance. What is the most likely cause of the health check failure?

Which TWO of the following are best practices for implementing service monitoring in Google Cloud? (Choose 2)

Which THREE of the following are valid approaches to monitor a custom application metric in Cloud Monitoring? (Choose 3)

A DevOps engineer runs the command in the exhibit and gets the output shown. What does this output indicate?

Exhibit

Refer to the exhibit.

```
gcloud logging read "resource.type=gce_instance" --limit 5 --format="json"

{
  "insertId": "abc123",
  "jsonPayload": {},
  "resource": {
    "type": "gce_instance",
    "labels": {
      "instance_id": "1234567890"
    }
  },
  "severity": "ERROR",
  "timestamp": "2024-03-15T10:30:00Z"
}
```

A team has deployed a Prometheus server on GKE using the configuration shown in the exhibit. They expect Prometheus to scrape metrics from pods with the label 'app: my-app' and the annotation 'prometheus.io/scrape: true' on port 8080. However, no metrics are being collected. What is the most likely cause?

Exhibit

Refer to the exhibit.

```
# prometheus.yml
scrape_configs:
  - job_name: 'my-app'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      regex: my-app
      action: keep
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      regex: "true"
      action: keep
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: (.+)
      replacement: $1:8080
```
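To reason about the last rule in the exhibit, it can help to trace what a `replace` action produces. The sketch below is a simplified Python model of a single Prometheus relabel step. It assumes the documented behaviour that source label values are joined with `;`, that the regex must match the whole joined string, and that `$1` in the replacement expands to the first capture group. The function `apply_replace` and the sample pod labels are hypothetical, and the model ignores every other relabel action.

```python
import re

def apply_replace(labels, source_labels, regex, replacement,
                  target_label, separator=";"):
    """Simplified model of a relabel rule with action: replace."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)  # the regex is anchored at both ends
    if match is None:
        return dict(labels)              # no match: labels stay unchanged
    # Expand $1, $2, ... with the corresponding capture groups.
    expanded = re.sub(r"\$(\d+)",
                      lambda ref: match.group(int(ref.group(1))) or "",
                      replacement)
    result = dict(labels)
    result[target_label] = expanded
    return result

PORT_ANNOTATION = "__meta_kubernetes_pod_annotation_prometheus_io_port"

# Hypothetical discovered pod whose port annotation is set to "8080".
annotated = apply_replace(
    {"__address__": "10.8.0.12:8080", PORT_ANNOTATION: "8080"},
    source_labels=[PORT_ANNOTATION],
    regex="(.+)",
    replacement="$1:8080",
    target_label="__address__",
)

# Hypothetical pod without the annotation: "(.+)" cannot match "".
unannotated = apply_replace(
    {"__address__": "10.8.0.12:8080"},
    source_labels=[PORT_ANNOTATION],
    regex="(.+)",
    replacement="$1:8080",
    target_label="__address__",
)
```

Under these assumptions the annotated pod ends up with `__address__` set to `8080:8080`, while the unannotated pod keeps its discovered address. Comparing that result with the `host:port` form a scrape target needs is a useful check when weighing the answer options.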

You are monitoring a microservices application deployed on Google Kubernetes Engine (GKE) that uses Cloud Monitoring for observability. You notice that the error rate for a critical service has increased, but the CPU and memory usage remain normal. The service uses gRPC and logs are structured. Which Cloud Monitoring tool should you use first to diagnose the root cause of the increased error rate?

A company uses Cloud Monitoring to track latency for a multi-region web application. The SLO is 99.9% of requests under 500ms over a 30-day rolling window. The error budget has been rapidly depleting over the last week. The operations team wants to understand the impact of recent deployments. Which approach should they use to correlate deployment changes with latency spikes?
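The error budget arithmetic behind this scenario can be sketched as follows. The SLO target and window come from the question; the request counts are made up for illustration, and the function names are hypothetical. The sketch assumes a request-based SLO, where the budget is the fraction of requests allowed to miss the 500 ms target.

```python
SLO_TARGET = 0.999   # 99.9% of requests under 500 ms, from the scenario
WINDOW_DAYS = 30     # rolling window, from the scenario

def error_budget(total_requests, slo_target=SLO_TARGET):
    """Number of requests allowed to miss the latency target in the window."""
    return total_requests * (1 - slo_target)

def burn_rate(bad_requests, total_requests, slo_target=SLO_TARGET):
    """Budget consumption speed; 1.0 means the budget lasts exactly the window."""
    return (bad_requests / total_requests) / (1 - slo_target)

def days_until_exhausted(rate, window_days=WINDOW_DAYS):
    """Days a full budget lasts if the given burn rate is sustained."""
    return window_days / rate

# Hypothetical traffic: 10 million requests per 30-day window.
budget = error_budget(10_000_000)                # about 10,000 slow requests

# Hypothetical last week: 2 million requests, 8,000 of them over 500 ms.
weekly_rate = burn_rate(8_000, 2_000_000)        # about 4.0
days_left = days_until_exhausted(weekly_rate)    # about 7.5 days
```

A burn rate well above 1 over a short period is what "rapidly depleting" means in practice, and plotting it next to deployment timestamps is one way to line up releases with latency regressions.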

You are setting up alerting for a batch processing job that runs daily on Compute Engine. The job must complete within 2 hours. Which metric and alert condition should you use to ensure you are notified if the job is still running after 90 minutes?

Which TWO metrics should be included in a comprehensive monitoring strategy for a production Kubernetes workload to detect performance degradation and capacity issues?

Your organization runs a critical e-commerce platform on Google Kubernetes Engine (GKE). The platform uses Cloud Service Mesh (Anthos Service Mesh) for traffic management and Cloud Monitoring for observability. Recently, after a new release, you observe that the p99 latency of the checkout service has increased from 200ms to 2s. The service's CPU and memory metrics appear normal, and there are no error logs. The release included a change to the Istio VirtualService configuration that added a retry policy: 3 retries with a 500ms timeout per retry. You suspect that the retries are contributing to the latency increase. You want to use Cloud Monitoring to confirm this hypothesis. Which approach should you take?
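The hypothesis in this scenario can be checked with simple arithmetic before looking at any metrics. The sketch below computes an upper bound on request latency when every attempt times out. It assumes the per-try timeout applies to the initial attempt as well as to each retry, and that no backoff is added between attempts unless specified; the function name `worst_case_latency_ms` is hypothetical.

```python
def worst_case_latency_ms(retries, per_try_timeout_ms, backoff_ms=0):
    """Upper bound on request latency when every attempt times out.

    Assumes the per-try timeout also covers the initial attempt and that a
    fixed backoff (0 by default) is waited before each retry.
    """
    attempts = 1 + retries
    return attempts * per_try_timeout_ms + retries * backoff_ms

# Values from the scenario: 3 retries with a 500 ms timeout per try.
bound_ms = worst_case_latency_ms(retries=3, per_try_timeout_ms=500)  # 2000
```

Under these assumptions the bound is 2000 ms, which lines up with the observed p99 of 2s and makes the retry policy a plausible suspect worth confirming with request-level metrics.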

You are a DevOps engineer for a SaaS company that provides a REST API. The API is deployed on Google Cloud Run. You have configured Cloud Monitoring alerts for 5xx errors. Recently, you received an alert that the error rate exceeded 5% for 5 minutes. You investigated and found that the errors were HTTP 503 (Service Unavailable) from a specific endpoint. The endpoint calls an internal Cloud SQL database. The database CPU utilization was at 90% during that period. You suspect the database is the bottleneck. Which action should you take to reduce the error rate without over-provisioning?

A company uses Cloud Run for a critical service and needs to set up alerting for 5xx errors. They want to receive a notification within 1 minute of the error rate exceeding 1% for any 1-minute window. Which alerting approach should they use?

Which TWO are best practices for implementing service monitoring strategies in Google Cloud?

A team has set up the alerting policies shown in the exhibit. They receive an alert for High Memory but not for High CPU. What is the most likely reason?

Exhibit

Refer to the exhibit.

```
{
  "alertPolicies": [
    {
      "displayName": "High CPU Alert",
      "combiner": "OR",
      "conditions": [
        {
          "displayName": "CPU usage > 80%",
          "conditionThreshold": {
            "filter": "metric.type=\"compute.googleapis.com/instance/cpu/utilization\" resource.type=\"gce_instance\"",
            "comparison": "COMPARISON_GT",
            "thresholdValue": 0.8,
            "duration": "300s",
            "trigger": {
              "count": 1
            }
          }
        }
      ]
    },
    {
      "displayName": "High Memory Alert",
      "conditions": [
        {
          "displayName": "Memory usage > 90%",
          "conditionThreshold": {
            "filter": "metric.type=\"agent.googleapis.com/memory/percent_used\" resource.type=\"gce_instance\"",
            "comparison": "COMPARISON_GT",
            "thresholdValue": 0.9,
            "duration": "60s",
            "trigger": {
              "count": 1
            }
          }
        }
      ]
    }
  ]
}
```
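When comparing the two policies, the fields that differ are the metric, the threshold, and the duration. The sketch below is a simplified model of a greater-than threshold condition with a duration, written to make those differences easy to experiment with. It is not the real Cloud Monitoring evaluator, and the function name `condition_met` is hypothetical. It also assumes that `compute.googleapis.com/instance/cpu/utilization` is reported as a fraction between 0 and 1 while `agent.googleapis.com/memory/percent_used` is reported on a 0 to 100 scale; check the metric documentation before relying on that.

```python
def condition_met(points, threshold, duration_s, interval_s=60):
    """True if the most recent points all exceed threshold for duration_s.

    points: metric values sampled every interval_s seconds, oldest first.
    Simplified model of COMPARISON_GT with a duration window.
    """
    needed = max(1, duration_s // interval_s)
    if len(points) < needed:
        return False
    return all(value > threshold for value in points[-needed:])

# Hypothetical samples for one instance under moderate load.
cpu_fraction = [0.45, 0.50, 0.48, 0.52, 0.47]    # assumed 0 to 1 scale
memory_percent = [41.0, 42.5, 43.0, 42.0, 44.0]  # assumed 0 to 100 scale

# Thresholds and durations taken from the exhibit.
cpu_fires = condition_met(cpu_fraction, threshold=0.8, duration_s=300)
memory_fires = condition_met(memory_percent, threshold=0.9, duration_s=60)
```

With these assumed scales, the CPU condition stays quiet at moderate load while the memory condition fires, because a threshold of 0.9 on a 0 to 100 metric is exceeded at almost any usage level.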

Order the steps to configure VPC Network Peering between two projects.

Match each Google Cloud tool to its function in incident management.

Drag a concept onto its matching description — or click a concept then click the description.

Matches

  • End-to-end incident lifecycle tool
  • Third-party alerting and on-call scheduling
  • Asynchronous messaging for event-driven alerts
  • Serverless automation for incident response
  • Containerized event-driven applications

Focused Implementing service monitoring strategies sessions

Start an Implementing service monitoring strategies-only practice session

Every question in these sessions is drawn from the Implementing service monitoring strategies domain — nothing else.

Related practice questions

Related PCDOE topic practice pages

Move into related areas when this topic feels solid.

Frequently asked questions

What does the PCDOE exam test about Implementing service monitoring strategies?
Implementing service monitoring strategies questions test whether you can apply the concept in context, not just recognise a definition.
How should I use these practice questions?
Select your answer before revealing the explanation. Then read why each option is right or wrong — this active recall approach builds retention far faster than re-reading notes.
Can I practise just Implementing service monitoring strategies questions in a focused session?
Yes — the session launcher on this page draws every question from the Implementing service monitoring strategies domain. Use a 10-question session first to gauge your baseline, then move to 20 or 30 once the weak spots are clear.
Where can I practise other PCDOE topics?
Use the topic links above to move to related areas, or go back to the PCDOE question bank to see all topics.
Are these real exam questions or dumps?
These are original practice questions written to test the same concepts the PCDOE exam covers. They are not copied from any real exam or dump site.