PCDOE · topic practice

Implementing service monitoring strategies practice questions

Q: How should I use these Implementing service monitoring strategies practice questions?

Read each scenario carefully and choose your answer before revealing the explanation. Then check why your choice was right or wrong. Repeat until the reasoning feels automatic.

Q: Can I practise just Implementing service monitoring strategies questions in a focused session?

Yes — use the session launcher on this page to start a 10-, 20-, 30- or 50-question session drawn entirely from the Implementing service monitoring strategies domain.

Practise original exam-style scenarios for the Google Professional Cloud DevOps Engineer domain "Implementing service monitoring strategies", with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations — not memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security

20 questions · Domain: Implementing service monitoring strategies

What the exam tests

What to know about Implementing service monitoring strategies

Implementing service monitoring strategies questions test whether you can apply the concept in context, not just recognise a definition. Focus on:

▸How the topic appears in realistic exam-style scenarios.
▸Which detail in the question changes the correct answer.
▸How to eliminate plausible but wrong options.
▸How to connect the question back to the wider exam objective.

Watch out for

Common Implementing service monitoring strategies exam traps

▸Answering from memory before reading the full scenario.
▸Missing a constraint such as cost, availability, security, scope or command context.
▸Choosing a broad answer when the question asks for the most specific fix.
▸Ignoring why the wrong options are tempting.

Practice set

Implementing service monitoring strategies questions

20 questions · select your answer, then reveal the explanation

Question 1 · easy · multiple choice

A team is monitoring a production service on Google Kubernetes Engine (GKE) and notices that a deployment is occasionally returning HTTP 503 errors. The team has set up a ServiceMonitor in Prometheus to scrape metrics from the pods. What is the most likely cause of the intermittent 503 errors?

A
The pods are crashing and restarting frequently.
Why wrong: Restarts can also produce 503s, but they are a less likely cause of intermittent errors than readiness probe failures.
B
The Prometheus scrape interval is too long, causing missed metrics.
Why wrong: Prometheus scraping does not affect pod availability.
C
The readiness probes are failing, causing the pods to be removed from the service endpoints.
Readiness probe failures remove pods from service endpoints, causing 503s if all replicas fail.
D
The container resource limits are set too low, causing out-of-memory errors.
Why wrong: OOM errors cause restarts, not directly 503s.

Question 2 · medium · multiple choice

A cloud operations team is implementing monitoring for a microservices application deployed on Compute Engine. They want to create a custom dashboard in Cloud Monitoring that shows the 99th percentile latency of a specific service over the last hour. Which combination of Cloud Monitoring features should they use?

A
Use a gauge metric with the max alignment function in a Metrics Explorer chart.
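Why wrong: A max aligner returns the largest value in each alignment period, not a percentile. Percentile aligners require a distribution-valued metric.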
B
Use a distribution metric with the 99th percentile alignment function in a Metrics Explorer chart.
Distribution metrics support percentile alignments like 99th percentile.
C
Use an uptime check metric and configure the latency percentile in the chart.
Why wrong: Uptime checks measure availability, not service latency.
D
Create a logs-based metric from application logs and use the count alignment.
Why wrong: A count aligner on a counter logs-based metric only counts matching entries. Latency percentiles would require a distribution-valued metric.
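
Percentile aligners work on distribution values. A distribution stores bucket counts rather than raw samples, so a 99th percentile is an estimate interpolated inside one bucket. The sketch below makes three assumptions: explicit bucket upper bounds, buckets starting at 0, and linear interpolation within a bucket. Cloud Monitoring's own interpolation may differ in detail, and the bucket counts are invented for illustration.

```python
from typing import Sequence


def estimate_percentile(bounds: Sequence[float], counts: Sequence[int], q: float) -> float:
    """Estimate the q-th percentile (0 < q <= 100) from histogram buckets.

    bounds[i] is the upper edge of bucket i (bucket 0 starts at 0) and
    counts[i] is how many samples fell into it. Samples are assumed to be
    spread evenly inside each bucket.
    """
    if len(bounds) != len(counts):
        raise ValueError("bounds and counts must have the same length")
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples")
    rank = q / 100 * total  # number of samples at or below the percentile
    seen = 0
    lower = 0.0
    for upper, count in zip(bounds, counts):
        if count and seen + count >= rank:
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - seen) / count
            return lower + fraction * (upper - lower)
        seen += count
        lower = upper
    return bounds[-1]


# Latency buckets in ms: (0,100], (100,250], (250,500], (500,1000]
p99 = estimate_percentile([100, 250, 500, 1000], [900, 80, 15, 5], 99)
print(round(p99, 1))  # about 416.7
```

The 990th of 1,000 samples lies two thirds of the way through the (250, 500] bucket, so the estimate is about 416.7 ms. The true p99 could be anywhere in that bucket, which is why bucket layout matters for percentile accuracy.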

Question 3 · hard · multiple choice

An e-commerce platform is using Cloud Load Balancing with a backend service that has a custom health check. The health check is failing intermittently, causing traffic to be routed away from healthy instances. The team has enabled Cloud Logging and wants to diagnose the issue. Which log view should they examine to see the health check probe results?

A
VPC flow logs
Why wrong: VPC flow logs show packet-level traffic, not health check results.
B
Cloud Audit Logs (Admin Activity)
Why wrong: Audit logs track administrative actions, not health checks.
C
Instance serial port output logs
Why wrong: Serial port logs show OS-level boot and console messages.
D
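Health check logs (log name: 'compute.googleapis.com/healthchecks')
Health check logging is enabled on the health check itself. It records each backend's health state transitions, which shows when and where probes started failing.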

Question 4 · medium · multiple choice

A DevOps engineer is setting up alerting policies for a critical API service. They want to receive an alert if the error rate exceeds 5% for at least 5 minutes, but only during business hours (9 AM to 5 PM). Which approach should they use?

A
Create a log-based metric for errors and use a condition with a threshold, then set the alert policy to only run during business hours using the 'condition' schedule.
Why wrong: Alert policies do not have a built-in schedule for conditions.
B
Create an alerting policy with a condition that triggers when the error rate is above 5% for 5 minutes, and configure the notification channel to only send notifications during business hours using a webhook receiver that checks time.
The policy evaluates the condition continuously. A webhook notification channel passes each incident to a receiver that only forwards it between 9 AM and 5 PM.
C
Create two separate alert policies, one for business hours and one for off-hours, each with different thresholds.
Why wrong: The off-hours policy still sends notifications outside business hours. Different thresholds change when a condition fires, not when notifications are delivered.
D
Use Cloud Scheduler to enable and disable the alerting policy at the start and end of business hours.
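Why wrong: Cloud Scheduler has no built-in action for alert policies. Toggling one would mean scripting Monitoring API calls, and a disabled policy stops evaluating its condition entirely.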
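
In option B, the webhook receiver only has to decide whether an incident falls inside business hours before forwarding it. Below is a minimal sketch of that decision. It assumes the receiver has already parsed the alert time from the webhook payload (not shown) and that the team's time zone can be treated as a fixed UTC offset. `should_forward` and its parameters are hypothetical names.

```python
from datetime import datetime, time, timedelta, timezone

# Business hours from the question: 9 AM (inclusive) to 5 PM (exclusive).
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def should_forward(alert_time: datetime, utc_offset_hours: float = 0) -> bool:
    """Return True if an alert raised at alert_time falls inside business hours.

    alert_time must be timezone-aware. utc_offset_hours is a hypothetical
    fixed offset for the team's local time zone.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_time = alert_time.astimezone(local_zone).time()
    return BUSINESS_START <= local_time < BUSINESS_END


# An incident at 10:30 UTC is inside business hours for a UTC team...
print(should_forward(datetime(2024, 5, 6, 10, 30, tzinfo=timezone.utc)))  # True
# ...but it is 05:30 for a team at UTC-5, so the receiver would drop it.
print(should_forward(datetime(2024, 5, 6, 10, 30, tzinfo=timezone.utc), -5))  # False
```

A production receiver would more likely use `zoneinfo` so that daylight-saving changes are handled. The fixed offset keeps the sketch dependency-free.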

Question 5 · hard · multiple choice

A company is running a stateful workload on Compute Engine and has configured a TCP health check on port 8080. The health check is failing, but the application is running and responding on port 8080 when tested manually from within the instance. What is the most likely cause of the health check failure?

A
The health check is configured to use port 80 instead of port 8080.
Why wrong: The question says port 8080 is configured.
B
The firewall rules are not allowing traffic from the health check probe IP ranges.
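Health check probes come from Google-owned IP ranges, and an ingress firewall rule must allow them on port 8080. A manual test from inside the instance never crosses the VPC firewall, which is why that test succeeds.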
C
The instance's DNS resolution is failing, causing the health check to use the wrong IP.
Why wrong: Health checks use the internal IP, not DNS.
D
The health check response timeout is set too low (e.g., 1 second).
Why wrong: The default timeout is 5 seconds, which is usually enough for an application that responds immediately to local tests.
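
For most load balancer types, Google Cloud health check probes originate from 35.191.0.0/16 and 130.211.0.0/22. Some products use additional ranges, so check the documentation for yours. The sketch below uses Python's `ipaddress` module for two checks: whether a source address (for example, one seen in firewall logs) belongs to those ranges, and whether a rule's source ranges cover them. The function names are hypothetical.

```python
import ipaddress

# Documented probe source ranges for most Google Cloud load balancer types.
HEALTH_CHECK_RANGES = [
    ipaddress.ip_network("35.191.0.0/16"),
    ipaddress.ip_network("130.211.0.0/22"),
]


def is_health_check_source(address: str) -> bool:
    """Return True if address falls inside a health check probe range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in HEALTH_CHECK_RANGES)


def rule_allows_probes(rule_source_ranges: list[str]) -> bool:
    """Return True if every probe range is contained in some rule source range."""
    sources = [ipaddress.ip_network(r) for r in rule_source_ranges]
    return all(
        any(probe.subnet_of(source) for source in sources)
        for probe in HEALTH_CHECK_RANGES
    )


print(is_health_check_source("35.191.10.4"))       # True
print(rule_allows_probes(["35.191.0.0/16"]))       # False: 130.211.0.0/22 is missing
```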

Question 6 · medium · multi-select

Which TWO of the following are best practices for implementing service monitoring in Google Cloud? (Choose 2)

A
Set static alert thresholds without considering historical baselines.
Why wrong: Thresholds should be based on baselines, not static values.
B
Use Cloud Monitoring uptime checks to verify that services are reachable from external locations.
Uptime checks verify external accessibility.
C
Use the USE method (Utilization, Saturation, Errors) for service-level monitoring.
Why wrong: USE is for resource monitoring, not services.
D
Define service level indicators (SLIs) using the RED method (Rate, Errors, Duration).
RED metrics are a best practice for service monitoring.
E
Alert on cause-based metrics (e.g., CPU utilization) rather than symptom-based metrics (e.g., latency).
Why wrong: Alert on symptoms, not causes.
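
The RED method reduces a service to three signals: request rate, error rate, and request duration. The sketch below computes them for one window of request records. It assumes a hypothetical record shape (HTTP status and latency in milliseconds), counts 5xx responses as errors, and uses a nearest-rank p99. In Google Cloud these values would normally come from Cloud Monitoring metrics rather than hand-written code.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code
    latency_ms: float  # time taken to serve the request


def red_summary(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Rate, Errors, and Duration (p99) for one window of requests."""
    if not requests:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p99_ms": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank p99: ceil(0.99 * n), computed with integers to avoid float rounding.
    rank = (99 * len(latencies) + 99) // 100
    return {
        "rate_per_s": len(requests) / window_seconds,
        "error_ratio": errors / len(requests),
        "p99_ms": latencies[rank - 1],
    }


# 100 requests in 10 s: 2 server errors, latencies from 1 ms to 100 ms.
sample = [Request(500 if i < 2 else 200, float(i + 1)) for i in range(100)]
summary = red_summary(sample, 10.0)
print(summary)
```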

Question 7 · hard · multi-select

Which THREE of the following are valid approaches to monitor a custom application metric in Cloud Monitoring? (Choose 3)

A
Install the Stackdriver Monitoring agent on a Windows VM and configure custom metric collection in the agent configuration file.
Why wrong: The legacy Stackdriver agent does not support custom metrics on Windows.
B
Use the Cloud Monitoring API to write time series data directly.
The API allows writing custom metrics.
C
Create a logs-based metric from application logs that contain the metric value.
Logs-based metrics can extract values from log entries.
D
Use the built-in JMX plugin in the Cloud Monitoring agent to collect Java application metrics.
Why wrong: Cloud Monitoring agent does not have a JMX plugin.
E
Use the OpenTelemetry Collector with the Google Cloud Monitoring exporter.
OpenTelemetry is a supported way to send metrics.
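
Option B corresponds to the Cloud Monitoring API method `projects.timeSeries.create`. The sketch below only builds the JSON request body for one gauge point, so it runs without credentials. The project ID, the metric name `custom.googleapis.com/checkout/queue_depth`, and its label are hypothetical. Sending the body would take an authenticated POST to `https://monitoring.googleapis.com/v3/projects/PROJECT_ID/timeSeries`, or the equivalent call in a client library.

```python
import json
from datetime import datetime, timezone


def build_time_series_body(project_id: str, metric_type: str, value: float,
                           labels: dict[str, str], when: datetime) -> dict:
    """Build a timeSeries.create request body holding a single GAUGE point."""
    end_time = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "timeSeries": [
            {
                "metric": {"type": metric_type, "labels": labels},
                "resource": {"type": "global", "labels": {"project_id": project_id}},
                "metricKind": "GAUGE",
                "valueType": "DOUBLE",
                # A gauge point only needs the end of its interval.
                "points": [
                    {"interval": {"endTime": end_time}, "value": {"doubleValue": value}}
                ],
            }
        ]
    }


body = build_time_series_body(
    "my-project",                                  # hypothetical project ID
    "custom.googleapis.com/checkout/queue_depth",  # hypothetical custom metric
    12.0,
    {"queue": "orders"},                           # hypothetical metric label
    datetime(2024, 5, 6, 10, 30, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```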

Question 8 · easy · multiple choice

A DevOps engineer runs the command above and gets the output shown. What does this output indicate?

A
The instance's disk is full, causing write errors.
Why wrong: No disk-related message is present.
B
An application running on the instance encountered a connection timeout to a backend service.
The log message explicitly states 'Connection timeout to backend service'.
C
The instance failed to authenticate with the metadata server.
Why wrong: The log shows a connection timeout, not an authentication failure.
D
A health check probe failed to reach the instance.
Why wrong: Health check failures are recorded in health check logs, not in the instance's own log output.

Question 9 · medium · multiple choice

A team has deployed a Prometheus server on GKE using the configuration above. They expect Prometheus to scrape metrics from pods with the label 'app: my-app' and the annotation 'prometheus.io/scrape: true' on port 8080. However, no metrics are being collected. What is the most likely cause?

Exhibit

Refer to the exhibit.

```
# prometheus.yml
scrape_configs:
  - job_name: 'my-app'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      regex: my-app
      action: keep
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      regex: "true"
      action: keep
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: (.+)
      replacement: $1:8080
```

A
The kubernetes_sd_configs role is set to 'pod' but should be 'endpoints'.
Why wrong: Role 'pod' is correct for scraping pods directly.
B
Prometheus needs to be configured to listen on port 9090 for scraping.
Why wrong: Prometheus listens on 9090 for its UI, not for scraping; scraping uses the target port.
C
The keep action for the label 'my-app' is filtering out all pods.
Why wrong: If the label is correct, pods are kept.
D
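The relabel_config for the port builds the wrong target address: it replaces __address__ with the annotation value plus ':8080' instead of combining the pod IP with the annotated port.
The annotation holds only the port number, so a pod annotated with port 8080 is scraped at '8080:8080', which is not a reachable host. The rule should take the host from __address__ and the port from the annotation.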
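
Tracing the relabel rule by hand exposes the bug. For a `replace` action, Prometheus does three things: it joins the source label values with `;`, matches the fully anchored regex, and expands `$1`-style references into the target label. The sketch below emulates only that `replace` action in Python; it is not Prometheus code. It compares the exhibit's rule with the address rule used in Prometheus's example Kubernetes configuration. The pod IP is a made-up value.

```python
import re


def relabel_replace(labels: dict[str, str], source_labels: list[str], regex: str,
                    replacement: str, target_label: str) -> dict[str, str]:
    """Emulate a Prometheus relabel_config with action: replace."""
    value = ";".join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # Prometheus anchors the regex at both ends
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Convert Prometheus-style $1 references into Python's \g<1> syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


PORT_ANNOTATION = "__meta_kubernetes_pod_annotation_prometheus_io_port"
pod = {"__address__": "10.8.0.12", PORT_ANNOTATION: "8080"}  # hypothetical pod

# Rule from the exhibit: the address becomes the annotation value plus ':8080'.
broken = relabel_replace(pod, [PORT_ANNOTATION], r"(.+)", "$1:8080", "__address__")

# Corrected rule: host from __address__, port from the annotation.
fixed = relabel_replace(pod, ["__address__", PORT_ANNOTATION],
                        r"([^:]+)(?::\d+)?;(\d+)", "$1:$2", "__address__")

print(broken["__address__"])  # 8080:8080
print(fixed["__address__"])   # 10.8.0.12:8080
```

The corrected regex also strips any port already present in `__address__`, so the annotation always wins.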

Question 10 · medium · multiple choice

You are monitoring a microservices application deployed on Google Kubernetes Engine (GKE) that uses Cloud Monitoring for observability. You notice that the error rate for a critical service has increased, but the CPU and memory usage remain normal. The service uses gRPC and logs are structured. Which Cloud Monitoring tool should you use first to diagnose the root cause of the increased error rate?

A
Logs Explorer to filter logs by error status codes
Logs Explorer allows you to examine structured logs, including gRPC status codes, to find error patterns.
B
Service Monitoring to create a custom dashboard
Why wrong: Service Monitoring focuses on SLOs and service-level dashboards, not log-level root cause analysis.
C
Error Reporting to automatically group error occurrences
Why wrong: Error Reporting summarizes errors but may not provide the granularity of gRPC status codes.
D
Metrics Explorer to view error rate and latency charts
Why wrong: Metrics Explorer shows aggregated metrics, not individual log entries needed for detailed error analysis.
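
With structured logs, Logs Explorer can filter on individual `jsonPayload` fields rather than on free text. The sketch below performs the same grouping offline on exported log entries (one JSON object per line). The field names `grpc_status` and `method` under `jsonPayload` are hypothetical and depend on how the service writes its logs.

```python
import json
from collections import Counter


def count_grpc_errors(lines: list[str]) -> Counter:
    """Count non-OK gRPC status codes per (method, status) in exported log entries."""
    counts: Counter = Counter()
    for line in lines:
        entry = json.loads(line)
        payload = entry.get("jsonPayload", {})
        status = payload.get("grpc_status", "OK")  # hypothetical field name
        if status != "OK":
            counts[(payload.get("method", "unknown"), status)] += 1
    return counts


exported = [
    json.dumps({"jsonPayload": {"method": "Pay", "grpc_status": "UNAVAILABLE"}}),
    json.dumps({"jsonPayload": {"method": "Pay", "grpc_status": "OK"}}),
    json.dumps({"jsonPayload": {"method": "Pay", "grpc_status": "UNAVAILABLE"}}),
    json.dumps({"jsonPayload": {"method": "Ship", "grpc_status": "DEADLINE_EXCEEDED"}}),
]
print(count_grpc_errors(exported).most_common())
```

Grouping by method and status quickly shows whether the new errors come from one RPC and one failure mode, which is the pattern that aggregated error-rate charts hide.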

Question 11 · hard · multiple choice

A company uses Cloud Monitoring to track latency for a multi-region web application. The SLO is 99.9% of requests under 500ms over a 30-day rolling window. The error budget has been rapidly depleting over the last week. The operations team wants to understand the impact of recent deployments. Which approach should they use to correlate deployment changes with latency spikes?

A
Use Cloud Logging to search for deployment logs and manually compare with latency metrics
Why wrong: Manual comparison is time-consuming and error-prone; a dashboard provides a unified view.
B
Use Cloud Trace to analyze latency distributions for each deployment version
Why wrong: Cloud Trace is for distributed tracing, not for correlating deployments with latency trends over time.
C
Create a custom dashboard in Cloud Monitoring that includes latency charts and use annotation markers to indicate deployment times
Annotation markers allow you to overlay deployment events on time-series charts, making it easy to correlate changes with latency spikes.
D
Configure Error Reporting to alert on latency threshold breaches
Why wrong: Error Reporting handles application errors, not latency metrics.
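
The error budget in this scenario follows directly from the SLO. A 99.9% target over 30 days leaves 0.1% of requests as budget, roughly equivalent to 43.2 minutes of complete failure. The sketch below shows the budget and burn-rate arithmetic, where burn rate is the observed bad fraction divided by the allowed bad fraction. The 0.5% bad fraction in the usage lines is an invented example.

```python
def error_budget_minutes(slo: float, window_days: int) -> float:
    """Minutes of complete failure the SLO tolerates over the window."""
    return (1 - slo) * window_days * 24 * 60


def burn_rate(bad_fraction: float, slo: float) -> float:
    """How fast the budget is spent; 1.0 uses it up exactly at the end of the window."""
    return bad_fraction / (1 - slo)


def days_until_exhausted(slo: float, window_days: int, bad_fraction: float) -> float:
    """Days for a full budget to run out if the current burn rate continues."""
    return window_days / burn_rate(bad_fraction, slo)


print(round(error_budget_minutes(0.999, 30), 1))          # 43.2
print(round(burn_rate(0.005, 0.999), 2))                  # 5.0
print(round(days_until_exhausted(0.999, 30, 0.005), 1))   # 6.0
```

A burn rate of 5 consumes the whole 30-day budget in about 6 days, which matches the "rapidly depleting over the last week" pattern. Overlaying deployment times on the latency chart then shows which release started the burn.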

Question 12 · easy · multiple choice

You are setting up alerting for a batch processing job that runs daily on Compute Engine. The job must complete within 2 hours. Which metric and alert condition should you use to ensure you are notified if the job is still running after 90 minutes?

A
Alert on CPU utilization greater than 80% for the instance running the job
Why wrong: CPU utilization is only a proxy. It can stay high after the job finishes (false positives) or drop while the job is still running (missed alerts).
B
Create a custom metric that emits 1 when the job starts and 0 when it finishes; alert if the metric is 1 for more than 90 minutes
This directly measures job duration and triggers an alert if it exceeds 90 minutes.
C
Use a heartbeat metric that reports every 5 minutes; alert if no heartbeat for 90 minutes
Why wrong: No heartbeat for 90 minutes indicates a failure, not that the job is still running.
D
Set up a log-based metric that counts job completion log entries; alert if the count is zero after 90 minutes
Why wrong: This is indirect; you would need to ensure the log entry is written, and the condition is more complex.
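
The condition in option B can be checked against samples of the hypothetical "job running" metric: alert once the metric has read 1 continuously for more than 90 minutes. The sketch assumes one sample per minute, given as (minute, value) pairs. In Cloud Monitoring this roughly corresponds to a threshold condition (value above 0.5) held for 90 minutes.

```python
def running_too_long(samples: list[tuple[int, int]], limit_minutes: int = 90) -> bool:
    """Return True if the job-running flag stays at 1 for more than limit_minutes.

    samples are (minute, value) pairs in time order: value is 1 while the
    job runs and 0 once it has finished.
    """
    run_start = None
    for minute, value in samples:
        if value == 1:
            if run_start is None:
                run_start = minute  # a new run has started
            if minute - run_start > limit_minutes:
                return True
        else:
            run_start = None  # job finished; reset the streak
    return False


finished = [(m, 1) for m in range(80)] + [(80, 0)]  # done after 80 minutes
stuck = [(m, 1) for m in range(120)]                # still running at 2 hours
print(running_too_long(finished), running_too_long(stuck))  # False True
```

Because the metric reports the job's state directly, the alert fires at the 90-minute mark, leaving 30 minutes before the 2-hour deadline.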

Question 13 · medium · multi-select

Which TWO metrics should be included in a comprehensive monitoring strategy for a production Kubernetes workload to detect performance degradation and capacity issues?

A
Disk read IOPS per pod
Why wrong: Disk IOPS is important for stateful workloads but not a general performance indicator for all workloads.
B
Container CPU utilization
High CPU utilization can indicate capacity pressure and performance issues.
C
Number of nodes in the cluster
Why wrong: Node count is an infrastructure metric; it doesn't directly measure workload performance.
D
Network bytes received per second
Why wrong: Network throughput is not a direct measure of performance degradation; it may vary with traffic.
E
Request latency percentiles (e.g., p99)
Latency percentiles directly reflect user experience and performance degradation.

Question 14 · hard · multiple choice

Your organization runs a critical e-commerce platform on Google Kubernetes Engine (GKE). The platform uses Cloud Service Mesh (Anthos Service Mesh) for traffic management and Cloud Monitoring for observability. Recently, after a new release, you observe that the p99 latency of the checkout service has increased from 200ms to 2s. The service's CPU and memory metrics appear normal, and there are no error logs. The release included a change to the Istio VirtualService configuration that added a retry policy: 3 retries with a 500ms timeout per retry. You suspect that the retries are contributing to the latency increase. You want to use Cloud Monitoring to confirm this hypothesis. Which approach should you take?

A
Use Cloud Trace to analyze distributed traces for the checkout service and look for retry spans
Why wrong: Trace analysis can confirm retries but is more time-consuming than using metrics.
B
Check the 'Services' dashboard in Cloud Monitoring, which shows a pre-built latency chart for all services
Why wrong: The default dashboard may not include retry metrics, so it won't confirm the hypothesis.
C
Use Metrics Explorer to query the istio.io/service/server/request_count metric, filtered by response_code_class and destination_service, and include the istio.io/service/server/request_retries metric to see retry counts alongside latency
This directly shows the correlation between retries and latency.
D
Use Logs Explorer to search for logs containing 'retry' in the checkout service namespace
Why wrong: Retry logs may not be generated by default; they require explicit Istio logging configuration.
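
The numbers in the scenario already fit the retry hypothesis. Envoy, which Istio configures, applies the per-try timeout to the initial attempt as well as to each retry. Three retries at 500 ms therefore allow four attempts of up to 500 ms each. The sketch below works through that worst-case arithmetic and ignores the short back-off between retries.

```python
def worst_case_latency_ms(retries: int, per_try_timeout_ms: int) -> int:
    """Upper bound on request latency when every attempt times out.

    The initial attempt plus each retry can run for the full per-try timeout;
    the back-off delay between retries is ignored here.
    """
    attempts = 1 + retries
    return attempts * per_try_timeout_ms


print(worst_case_latency_ms(3, 500))  # 2000, matching the observed 2 s p99
```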

Question 15 · medium · multiple choice

You are a DevOps engineer for a SaaS company that provides a REST API. The API is deployed on Google Cloud Run. You have configured Cloud Monitoring alerts for 5xx errors. Recently, you received an alert that the error rate exceeded 5% for 5 minutes. You investigated and found that the errors were HTTP 503 (Service Unavailable) from a specific endpoint. The endpoint calls an internal Cloud SQL database. The database CPU utilization was at 90% during that period. You suspect the database is the bottleneck. Which action should you take to reduce the error rate without over-provisioning?

A
Implement connection pooling and retry logic with exponential backoff in the API service
This reduces the number of simultaneous connections to the database and handles transient failures gracefully.
B
Increase the max instances per revision in Cloud Run to handle more concurrent requests
Why wrong: Increasing Cloud Run instances could increase load on the already stressed database, worsening the issue.
C
Reduce the min instances of Cloud Run to decrease load on the database
Why wrong: Reducing instances may cause cold starts and does not address the root cause of database overload.
D
Add a Cloud SQL read replica and route read queries to it
Why wrong: The endpoint causing 503 errors likely involves writes; read replicas won't reduce write load.
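
Below is a minimal sketch of the retry half of option A: a failing call is retried with exponentially growing, capped, jittered delays. Several details are assumptions for illustration: the base delay, the cap, the attempt count, and the use of `ConnectionError` as the retryable error (real database drivers raise their own exception types). Connection pooling itself would come from the database driver or a pooling library and is not shown. The sleep function is injectable so the behaviour can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on ConnectionError with capped exponential backoff.

    Waits use "full jitter": a random delay between 0 and
    min(max_delay, base_delay * 2 ** (retry_number - 1)) seconds, so many
    clients do not retry against the database at the same moment.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except ConnectionError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: let the caller see the failure
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))


# Usage: a hypothetical query that fails twice before succeeding.
calls = {"count": 0}


def flaky_query() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("database busy")
    return "row"


waits: list[float] = []
result = retry_with_backoff(flaky_query, sleep=waits.append)  # record waits instead of sleeping
print(result, len(waits))
```

Jitter matters here because many Cloud Run instances hitting an overloaded database will otherwise retry in lockstep and recreate the spike.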

Question 16 · medium · multiple choice

A company uses Cloud Run for a critical service and needs to set up alerting for 5xx errors. They want to receive a notification within 1 minute of the error rate exceeding 1% for any 1-minute window. Which alerting approach should they use?

A
Set up a log-based metric for 5xx responses and create an alert on the metric.
Why wrong: Log-based metrics have ingestion latency, often exceeding 1 minute, and may not meet the sub-minute alert requirement.
B
Create a Cloud Logging sink to a Pub/Sub topic and trigger a Cloud Function that sends notifications.
Why wrong: This approach relies on Cloud Logging's export latency, which can be several minutes, and does not provide a rate-based condition.
C
Use Cloud Monitoring's log-based alerting to trigger on every 5xx log entry.
Why wrong: Alerting on every log entry can cause alert fatigue and does not allow for a rate condition like exceeding 1%.
D
Create a Cloud Monitoring alerting policy using the 'Request count' metric with a condition that compares the ratio of 5xx responses to total requests over a 1-minute window.
A ratio condition on the request count metric, split by response code class, directly expresses "5xx responses above 1% of requests in a 1-minute window". It also avoids the extra export and ingestion steps of the log-based options.
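
The ratio condition in option D compares two filtered request counts over the same window. The sketch below performs that comparison for one 1-minute window. It assumes per-class request counts keyed like Cloud Run's `response_code_class` label values ('2xx', '5xx', and so on), and the counts themselves are invented.

```python
def error_ratio(counts_by_class: dict[str, int]) -> float:
    """Fraction of requests in the window that returned a 5xx response."""
    total = sum(counts_by_class.values())
    return counts_by_class.get("5xx", 0) / total if total else 0.0


def breaches(counts_by_class: dict[str, int], threshold: float = 0.01) -> bool:
    """True if the 5xx ratio for the window is above the threshold (1% by default)."""
    return error_ratio(counts_by_class) > threshold


window = {"2xx": 1970, "4xx": 10, "5xx": 20}  # 20 of 2,000 requests = exactly 1%
print(breaches(window))  # False: exactly 1% does not exceed 1%
```

Counting 4xx responses in the denominator but not the numerator keeps client mistakes from triggering a server-side alert.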

Question 17 · easy · multi-select

Which TWO are best practices for implementing service monitoring strategies in Google Cloud?

A
Monitor the four golden signals (latency, traffic, errors, saturation) for every service.
The four golden signals provide a high-level overview of service health.
B
Rely solely on synthetic monitoring to measure user experience.
Why wrong: Synthetic monitoring does not capture real user behavior and should be combined with real user monitoring.
C
Define Service Level Objectives (SLOs) and use them to drive alerting.
SLOs help focus on what matters and reduce alert fatigue.
D
Use multiple monitoring tools to cover all aspects of the system.
Why wrong: Consolidating monitoring into a single tool reduces complexity and improves consistency.
E
Manually analyze logs and metrics to identify issues.
Why wrong: Manual analysis is inefficient; automation is key in cloud monitoring.

Question 18 · hard · multiple choice

A team has set up the alerting policies shown in the exhibit. They receive an alert for High CPU but not for High Memory. What is the most likely reason?

Exhibit

Refer to the exhibit.

```
{
  "alertPolicies": [
    {
      "displayName": "High CPU Alert",
      "combiner": "OR",
      "conditions": [
        {
          "displayName": "CPU usage > 80%",
          "conditionThreshold": {
            "filter": "metric.type=\"compute.googleapis.com/instance/cpu/utilization\" resource.type=\"gce_instance\"",
            "comparison": "COMPARISON_GT",
            "thresholdValue": 0.8,
            "duration": "300s",
            "trigger": {
              "count": 1
            }
          }
        }
      ]
    },
    {
      "displayName": "High Memory Alert",
      "conditions": [
        {
          "displayName": "Memory usage > 90%",
          "conditionThreshold": {
            "filter": "metric.type=\"agent.googleapis.com/memory/percent_used\" resource.type=\"gce_instance\"",
            "comparison": "COMPARISON_GT",
            "thresholdValue": 0.9,
            "duration": "60s",
            "trigger": {
              "count": 1
            }
          }
        }
      ]
    }
  ]
}
```

Trap 1: The CPU alert's duration of 300 seconds prevents it from firing…

A longer duration delays the alert, but if CPU is consistently high, it will fire eventually.

Trap 2: The memory alert has a higher threshold value, making it easier to…

A higher threshold value makes a condition harder to trigger, not easier.

Trap 3: The CPU metric is not available because the instance does not have…

The metric compute.googleapis.com/instance/cpu/utilization is available without the agent.


A
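The memory condition's thresholdValue of 0.9 is compared against agent.googleapis.com/memory/percent_used, which reports percentages from 0 to 100, so it fires once memory use passes about 0.9%. The CPU condition uses compute.googleapis.com/instance/cpu/utilization, a 0 to 1 ratio, so it only fires after CPU stays above 80% for 5 minutes.
Thresholds must be expressed in the metric's own unit. To alert at 90% memory use, the threshold should be 90, not 0.9.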
B
The CPU alert's duration of 300 seconds prevents it from firing before the memory alert.
Why wrong: A longer duration delays the alert, but if CPU is consistently high, it will fire eventually.
C
The memory alert has a higher threshold value, making it easier to trigger.
Why wrong: A higher threshold value makes a condition harder to trigger, not easier.
D
The CPU metric is not available because the instance does not have the Cloud Monitoring agent installed.
Why wrong: The metric compute.googleapis.com/instance/cpu/utilization is available without the agent.
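To see how thresholdValue, duration and the metric's unit interact, the sketch below models a single threshold condition in Python. It is a simplified illustration under stated assumptions (one sample per minute, and a condition that fires only when every sample in the duration window exceeds the threshold), not Cloud Monitoring's actual evaluator. The ThresholdCondition class and all sample values are hypothetical.

```python
# Simplified model of a metric-threshold condition with a duration window.
# Assumptions (hypothetical): samples arrive once per sample_period_s, and the
# condition fires only if every sample in the trailing window exceeds the
# threshold. This illustrates the semantics; it is not Cloud Monitoring code.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ThresholdCondition:
    threshold: float          # compared in the metric's own unit
    duration_s: int           # how long the value must stay above threshold
    sample_period_s: int = 60

    def fires(self, samples: list[float]) -> bool:
        # Number of consecutive samples that must all violate the threshold.
        needed = max(1, self.duration_s // self.sample_period_s)
        if len(samples) < needed:
            return False  # not enough data to cover the duration yet
        window = samples[-needed:]
        return all(value > self.threshold for value in window)


# cpu/utilization is a 0-1 ratio, so 0.8 means 80%.
cpu = ThresholdCondition(threshold=0.8, duration_s=300)
# memory/percent_used reports 0-100, so 0.9 means 0.9%, not 90%.
mem_as_written = ThresholdCondition(threshold=0.9, duration_s=60)
mem_intended = ThresholdCondition(threshold=90.0, duration_s=60)

cpu_samples = [0.85, 0.90, 0.60, 0.88, 0.91]  # dips below 80% once
mem_samples = [42.0, 43.5, 41.0, 44.2, 45.0]  # healthy, about 45% used

print("CPU fires:", cpu.fires(cpu_samples))
print("Memory (0.9) fires:", mem_as_written.fires(mem_samples))
print("Memory (90) fires:", mem_intended.fires(mem_samples))
```

With these hypothetical samples, the CPU condition stays quiet because one sample inside the 5-minute window dips below 80%, while a 0.9 threshold on a 0 to 100 percentage metric fires for any realistic memory usage.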


Question 19 · medium · drag order


Order the steps to configure VPC Network Peering between VPC networks in two projects.


Question 20 · medium · matching


Match each Google Cloud tool to its function in incident management.


Matches

End-to-end incident lifecycle tool

Third-party alerting and on-call scheduling

Asynchronous messaging for event-driven alerts

Serverless automation for incident response

Containerized event-driven applications


Focused Implementing service monitoring strategies sessions

Start an Implementing service monitoring strategies-only practice session

Every question in these sessions is drawn from the Implementing service monitoring strategies domain and nothing else.


Frequently asked questions

Q: What does the PCDOE exam test about Implementing service monitoring strategies?

These questions test whether you can apply monitoring concepts, such as alerting policies, SLOs and metric collection, to a realistic scenario rather than just recognise a definition.

Q: How should I use these practice questions?

Select your answer before revealing the explanation, then read why each option is right or wrong. This active-recall approach builds retention far faster than re-reading notes.

Q: Can I practise just Implementing service monitoring strategies questions in a focused session?

Yes. The session launcher on this page draws every question from the Implementing service monitoring strategies domain. Start with a 10-question session to gauge your baseline, then move to 20 or 30 questions once your weak spots are clear.

Q: Where can I practise other PCDOE topics?

Use the related topic links on this page, or go back to the PCDOE question bank to see all topics.

Q: Are these real exam questions or dumps?

No. These are original practice questions written to test the same concepts the PCDOE exam covers. They are not copied from any real exam or dump site.


Study resources

▸All PCDOE questions
▸Implementing service monitoring strategies domain overview
▸PCDOE exam guide




Related PCDOE topic practice pages

Bootstrapping a Google Cloud organization for DevOps practice questions

Managing service incidents practice questions

Managing Google Cloud costs practice questions

Building and implementing CI/CD pipelines practice questions