Practice PCD Managing application performance monitoring questions with full explanations on every answer.
Managing application performance monitoring — choose a session length
Click any question to see the full explanation and answer options, or start a focused practice session above.
1. A company deploys a microservices application on Google Kubernetes Engine (GKE). The operations team needs to monitor API latency between services. Which Google Cloud service should they use to trace requests across services?
2. A developer notices that a Cloud Function is timing out after 60 seconds. The function makes an external API call that occasionally takes longer than the timeout. What is the best practice to handle this?
3. A company uses Cloud Monitoring to set up an alerting policy for CPU utilization on Compute Engine instances. They want to be notified when average CPU usage exceeds 80% for 5 minutes. Which threshold type should they use?
4. An application running on GKE is experiencing high latency. The team uses Cloud Trace to identify the bottleneck. They notice that a particular service spends most of its time waiting on a database query. How can they optimize performance?
5. A company uses Cloud Run for a serverless application. They notice that cold starts are causing high latency for some requests. What is the best strategy to reduce cold starts?
6. A team wants to monitor custom application metrics from a Compute Engine instance. They use the Cloud Monitoring agent. Which metric type should they use to report a gauge measurement like current memory usage?
7. A company uses Cloud Monitoring to create an uptime check for their external HTTP endpoint. The check fails periodically even though the service is healthy. What is the most likely cause?
8. An application running on GKE uses a custom metric to track order processing time. The metric is exported via Prometheus and ingested by Cloud Monitoring using the Managed Service for Prometheus. The team wants to create an alert when the 95th percentile latency exceeds 2 seconds over a 5-minute window. Which PromQL query should be used?
9. A company uses Cloud Logging to centralize logs from multiple projects. They want to create a log-based metric for tracking 404 errors. However, the metric shows zero data even though 404 errors are occurring. What is the most likely reason?
10. Which TWO are best practices for setting up alerting policies in Cloud Monitoring? (Choose two.)
11. Which THREE are valid uses of Cloud Trace? (Choose three.)
12. Which TWO are correct ways to reduce logging costs in Google Cloud? (Choose two.)
13. Your company runs a multi-tier web application on Google Kubernetes Engine (GKE). The application consists of a frontend service, a backend API service, and a PostgreSQL database managed by Cloud SQL. Recently, users have been reporting intermittent slow response times during peak hours (10 AM - 12 PM). You have set up Cloud Monitoring dashboards and alerts. Cloud Trace shows that the backend API service has high latency, but only for certain requests. You notice that the backend service's CPU utilization is around 60% during peak hours, and memory usage is normal. The Cloud SQL instance's CPU utilization is at 90% and the query latency is high. You have also observed that the backend service makes multiple database queries per request, some of which are repeated. What is the most effective course of action to reduce latency?
14. Your team manages a serverless application deployed on Cloud Run. The application processes image uploads and stores metadata in Firestore. You have set up a Cloud Monitoring alert based on the 'request_count' metric for the Cloud Run service. The alert triggers when the request count exceeds 1000 requests per minute. Recently, the alert has been firing frequently, but the team notices that the application is performing well and there are no errors. The team is concerned about alert fatigue. You review the metric and notice that the request count metric is based on all HTTP requests, including health checks from the Cloud Run system. The health check requests account for about 30% of the total requests. What should you do to reduce unnecessary alerts while still monitoring real user traffic?
15. Your application running on Google Kubernetes Engine (GKE) is experiencing intermittent latency spikes. You have enabled Cloud Monitoring and Cloud Logging. Which approach would be MOST effective to identify the root cause?
16. A development team is using Cloud Monitoring to set up an alerting policy for a Compute Engine instance. They want to be notified when the instance's CPU utilization exceeds 80% for at least 5 minutes. Which alerting policy configuration should they use?
17. Based on the Cloud Trace exhibit, which service is the primary contributor to the overall request latency?
18. You are configuring a Cloud Monitoring alerting policy for a Cloud Run service. The service has a maximum of 10 concurrent requests per instance. You want to be alerted when the average number of concurrent requests per instance exceeds 8 for at least 1 minute. Which metric and condition type should you use?
19. You are designing a monitoring strategy for a microservices architecture running on GKE. Each service emits custom business metrics (e.g., order processing time). You want to create a dashboard that shows the 99th percentile latency for each service over the last 7 days. Which approach should you take?
20. Which TWO actions are best practices for managing application performance monitoring in Google Cloud?
21. Which THREE components are essential for a complete application performance monitoring (APM) solution on Google Cloud?
22. Your Cloud Run service is experiencing 5xx errors. You have enabled Cloud Logging and Cloud Error Reporting. How can you quickly identify the most common error type?
23. You are managing a microservices application deployed on Google Kubernetes Engine (GKE) that uses Cloud Monitoring and Cloud Logging. Recently, users have reported intermittent slow response times, especially during peak hours. You have enabled the Ops Agent on GKE nodes and configured custom metrics for your services. The application consists of a frontend service, a backend API service, and a database service. The frontend calls the backend, which in turn queries the database. You notice that when the response time spikes, the frontend service's CPU utilization remains low, but the backend service's CPU utilization increases. The database service shows normal latency and no errors. You have examined the logs and found no application errors. The GKE cluster has three node pools: one for each service, with autoscaling enabled. The backend service is configured with a HorizontalPodAutoscaler (HPA) based on CPU utilization, but the HPA does not seem to scale up quickly enough during traffic spikes. You want to identify the root cause of the performance degradation. Which course of action should you take first?
24. A company is running a microservices application on Google Kubernetes Engine (GKE). They have implemented Cloud Monitoring and Cloud Logging, but recently they noticed that the Istio-proxy sidecar logs are missing from Cloud Logging. The application pods are running correctly and the sidecar containers are present. What is the most likely cause of the missing logs?
25. A DevOps team wants to set up custom metrics for a serverless application running on Cloud Run. The application emits metrics using OpenTelemetry. They need to collect these metrics and create an alerting policy that triggers when the 99th percentile latency exceeds 500ms for 5 minutes. Which TWO actions must they take? (Choose two.)
26. Your company runs a production App Engine standard environment service (module 'frontend', version 'v2') that handles e-commerce checkout requests. You have set up an alerting policy on a custom metric 'request_latency' that fires when latency exceeds 500ms for 1 minute. Recently, customers have complained about slow checkout times, but no alert has fired. You examine the exhibit: the log entry shows a latency of 0.452s (452ms) for a request to '/api/checkout'. The custom metric is defined from OpenTelemetry instrumentation. What is the most likely reason the alert did not fire?
27. Drag and drop the steps to configure a Cloud Storage bucket with uniform bucket-level access in the correct order.
28. Drag and drop the steps to grant a service account access to a Cloud Storage bucket in the correct order.
29. Match each Cloud Logging and Monitoring concept to its definition.
30. Match each error code to its meaning in Google Cloud.
31. A team is using Cloud Monitoring to set up an alerting policy for a Compute Engine instance that runs a web server. The team wants to be notified if the instance's CPU utilization exceeds 80% for 5 minutes. Which threshold type should they use?
32. An application deployed on Google Kubernetes Engine is experiencing intermittent latency spikes. The team has enabled Cloud Trace and sees that a specific gRPC call to a backend service occasionally takes >500ms. However, the backend service's logs show no errors. What is the most likely cause that the team should investigate further?
33. A company runs a microservices architecture on Cloud Run. They want to measure the error budget for a critical service using a custom SLI based on the ratio of successful requests (HTTP 200-499) to total requests. They have set an SLO of 99.9% over a 30-day window. Which Cloud Monitoring feature should they use to track this?
34. A developer needs to view detailed performance profiles of a Java application running on Compute Engine to identify CPU hotspots. Which Google Cloud service should they use?
35. An operations team has set up a Cloud Monitoring alerting policy that fires when the 99th percentile latency of a service exceeds 200ms for 1 minute. They notice that the alert fires frequently during normal traffic patterns. What is the most likely issue with the alert configuration?
36. A company uses Cloud Monitoring with custom metrics. They have a custom metric called 'requests_total' with labels 'endpoint', 'status_code'. They want to create an alert that fires if the error rate (status_code >=500) for any endpoint exceeds 5% over a 5-minute window. Which MQL query should they use?
37. A team wants to monitor the availability of an external API by pinging it every minute from multiple locations around the world. Which Cloud Monitoring feature should they use?
38. A developer is using Cloud Logging and wants to export logs from a specific project to BigQuery for long-term analysis. They have created a log sink and given the appropriate permissions, but logs are not appearing in BigQuery. What is the most likely cause?
39. A company has a Cloud Run service that uses Cloud SQL. They notice that the number of database connections is increasing over time, causing connection pool exhaustion. They have enabled Cloud Monitoring and see a custom metric for active DB connections. To proactively alert when the connection count exceeds 80% of the maximum pool size (which is 100), which alerting approach is most efficient?
40. A DevOps team is migrating an on-premises monitoring solution to Google Cloud. They need to collect custom application metrics from a batch processing job running on Compute Engine. Which two services can ingest custom metrics into Cloud Monitoring? (Choose two.)
41. A company is using Cloud Monitoring to set up an SLO for a latency-sensitive API. They have defined a custom SLI: the proportion of requests with latency under 200ms. Which three components must they define to create a complete SLO configuration? (Choose three.)
42. A developer wants to view real-time logs from a running application on Compute Engine. Which two methods can they use to stream logs? (Choose two.)
43. Refer to the exhibit. A team is using Cloud Monitoring with MQL to alert on CPU utilization per zone. They notice that the alert fires even when no single instance in a zone has CPU>80%, because the average across instances in the zone exceeds 80%. What change should they make to the MQL query to alert only when any individual instance exceeds 80%?
44. Refer to the exhibit. A developer sees this log entry in Cloud Logging. The application is running on Compute Engine. Which tool should they use to further diagnose the cause of the connection refusal?
45. Refer to the exhibit. A team has created this alerting policy for a Cloud Run service. However, the alert never fires even though the error rate sometimes exceeds 1%. What is the most likely issue?
46. An application deployed on Google Kubernetes Engine (GKE) is experiencing intermittent high latency. The operations team wants to quickly identify which specific code path is causing the delay. What should they use?
47. A company runs a stateless application on Compute Engine behind a load balancer. They want to monitor the number of active requests per instance without adding custom instrumentation. What is the most straightforward approach?
48. An application writes structured logs to Cloud Logging. The team wants to create a metric based on the value of a JSON field 'order_total' to alert when totals exceed $1000. What type of metric should they use?
49. A team notices that a Cloud Run service occasionally returns HTTP 500 errors. They have enabled Cloud Error Reporting. What is the best way to rapidly diagnose the root cause of these errors?
50. A company runs a multi-service application on GKE and wants to create a Service Level Indicator (SLI) for request latency. They have set up Cloud Service Mesh (Anthos Service Mesh) with Istio. Which metric should they use for the SLI?
51. An operations team is configuring a Cloud Monitoring alerting policy for a critical application. They want to ensure that alerts are only fired when an anomaly persists for at least 5 minutes to reduce noise. Which condition configuration should they use?
52. A developer wants to automatically capture CPU and memory profiles from a production application running on Compute Engine to identify performance bottlenecks. Which Google Cloud tool should they use?
53. A team uses Cloud Endpoints to manage their API. They want to monitor API latency for each API method. What is the recommended approach?
54. A company receives a Cloud Monitoring alert that a Compute Engine instance's CPU utilization has exceeded 90% for the past 15 minutes. The incident turns out to be a false alarm caused by a scheduled job that runs daily. How can they prevent future false alarms for this recurring pattern?
55. Which TWO are best practices for setting up Cloud Monitoring alerting policies to minimize alert fatigue? (Select exactly 2.)
56. Which TWO capabilities does Cloud Service Mesh (Istio) provide to help monitor application performance? (Select exactly 2.)
57. Which THREE are valid ways to create custom metrics in Cloud Monitoring? (Select exactly 3.)
58. A team is investigating increased latency in a web application deployed on Google Kubernetes Engine (GKE). They want to identify which specific service calls are slow. Which Google Cloud tool should they use?
59. A developer wants to ensure that error logs from their Java application are automatically captured and grouped in Cloud Error Reporting. What is the recommended approach?
60. An organization has multiple Google Cloud projects for different environments (dev, staging, prod). They want to create a single Cloud Monitoring dashboard that shows metrics from all projects. What is the correct approach?
61. A team wants to monitor CPU utilization on their Compute Engine instances. They need an alert that sends a notification when the average CPU utilization across all instances in a project exceeds 80% for more than 5 minutes. Which alerting configuration should they use?
62. A company uses Cloud Logging to store application logs. They need to keep logs for 3 years for compliance. What is the most cost-effective way to store logs for this duration?
63. A team is using Cloud Trace to analyze performance of a microservices application. They notice that some spans are missing from the trace. What is the most likely cause?
64. You need to create an uptime check for an external HTTPS endpoint and configure an alert that sends a notification if the check fails for 3 consecutive attempts. Which configuration is correct?
65. A developer wants to view real-time latency metrics for their App Engine application. Where can they find this data?
66. A company is using Cloud Monitoring to track custom metrics published from an on-premises application using the Monitoring API. The metrics are published every 30 seconds. The team wants to create an alert that fires if the metric goes below a threshold for more than 1 minute. Which alert condition type should they use?
67. Which TWO of the following are valid ways to export Cloud Logging logs to BigQuery?
68. Which THREE metrics are commonly used to create a Service Level Indicator (SLI) for availability of an HTTP-based service?
69. Which TWO statements about Cloud Trace are correct?
70. Refer to the exhibit. You are reviewing a Cloud Monitoring MQL query. What is the purpose of this query?
71. Refer to the exhibit. You are analyzing application logs and notice that some logs contain a 'trace' field. What does this field enable?
72. Refer to the exhibit. The alert fires when what happens?
73. A company is deploying a microservices architecture on Google Kubernetes Engine (GKE). They need to monitor inter-service latency and error rates. Which set of Google Cloud services should they use to collect and visualize these metrics?
74. A Cloud Run service is experiencing intermittent high latency. The team has enabled Cloud Trace. They want to identify the root cause by analyzing traces. What should they look for in the Trace viewer?
75. An organization wants to create custom metrics based on application logs to track business KPIs. They need to ensure these metrics are available for alerting within minutes. Which approach should they use?
76. A developer wants to receive notifications when the error rate of their application exceeds 1% over a 5-minute window. What should they create in Cloud Monitoring?
77. A team notices that their application's latency has increased after a recent deployment. They suspect a specific code path is slower. Which Google Cloud tool should they use to identify the most time-consuming functions in their code?
78. An application running on Compute Engine generates structured logs. The operations team needs to parse a specific field from the logs and create a metric that counts occurrences of a particular value. They want the metric to be available for alerting with minimal delay. What should they do?
79. A company wants to monitor the CPU utilization of their Compute Engine instances and automatically trigger scaling actions if utilization exceeds 80% for 5 minutes. Which service should they use?
80. An application uses Cloud SQL and is experiencing slow query performance. The team wants to monitor query latency and identify slow queries. Which Google Cloud tool should they use?
81. A company has a multi-region deployment of their application on GKE. They need to monitor service-level indicators (SLIs) like availability and latency across regions. They want a single pane of glass to view SLO compliance. What should they use?
82. A developer wants to profile their application's CPU and memory usage to identify performance bottlenecks. Which TWO Google Cloud services should they use?
83. A team wants to monitor a web application's uptime from multiple locations. Which THREE Google Cloud monitoring features should they use?
84. A company's application on GKE is experiencing performance degradation. They want to use Google Cloud operations tools to identify the root cause. Which THREE tools should they use in combination?
85. The alert is not firing even though error_count metric occasionally spikes above 10. What is the most likely reason?
86. What conclusion can be drawn from these traces?
87. What is the first step to resolve this error?
88. A web application hosted on Compute Engine is experiencing slow response times during peak hours. Which Cloud Monitoring metric should be examined first to identify the bottleneck?
89. A developer deploying a new version of a microservice sees a sudden increase in error logs in Cloud Logging. The errors are 500 responses from the service. What is the most efficient way to investigate the root cause?
90. A company wants to create an SLO for their API with a target of 99.9% availability over a 30-day rolling window. They are using Cloud Monitoring. Which combination of resources and techniques should they use?
91. Your application is deployed on Google Kubernetes Engine (GKE). You want to monitor resource usage at the pod level. Which tool should you use?
92. You need to create a custom dashboard in Cloud Monitoring that shows the number of 500 errors from your application, along with the average latency. What is the correct way to create this?
93. A team uses Cloud Monitoring alerting policies with multiple conditions. They want to notify only when both CPU utilization is above 80% and error rate is above 5% for 5 minutes. Which type of condition should be used?
94. You want to identify performance bottlenecks in your application's code, such as functions consuming excessive CPU. Which Google Cloud tool should you use?
95. Your application writes structured logs to Cloud Logging. You want to create a metric that counts log entries with a specific severity level, then alert when the count exceeds a threshold. What should you do?
96. You need to set up a notification channel that sends alerts to a third-party incident management system using webhooks. What must be configured?
97. Which TWO are best practices for reducing the cost of Cloud Logging for a high-traffic application?
98. You are troubleshooting a performance issue in a microservices application. Which TWO tools from Google Cloud's operations suite would you use to trace a request across services and identify the slowest component?
99. Which THREE are valid methods to create custom metrics in Cloud Monitoring?
100. Your company runs a multi-tier application on Compute Engine with a Cloud SQL backend. Recently, during peak hours, users report slow page loads. Cloud Monitoring shows high CPU on the app servers, but no memory pressure. Cloud Trace shows that the application spends most of its time waiting for database queries. The Cloud SQL instance is a high-memory machine type with 16 vCPUs and 64 GB RAM, but CPU utilization on the database is only 30%. There are no slow query alerts. What is the most likely cause and what should you do?
101. Your team manages a service that receives thousands of requests per second. They have set up Cloud Monitoring alerting based on the 99th percentile latency. Recently, they received an alert warning that latency exceeded 1 second, but after investigating, they found it was a false alarm caused by a single very slow request. How can they improve their alert to reduce false positives?
102. You deployed a new version of your application that uses Cloud Pub/Sub for asynchronous messaging. After deployment, you notice that messages are accumulating in the subscription backlog. You suspect the subscriber is too slow. Which tool should you use to diagnose?
103. A company runs a microservices application on Google Kubernetes Engine (GKE). Users report intermittent slow responses. Developers suspect a specific service is causing latency. Which Google Cloud tool should they use to trace requests across services and identify the root cause?
104. A developer wants to automatically detect and capture application errors in a production environment on Google Cloud. Which two Google Cloud services should be enabled? (Choose two.)
105. A DevOps team is deploying a critical application on GKE. To ensure application performance monitoring and reliability, which three actions should they take? (Choose three.)
106. A startup has deployed a Python web application on Compute Engine. They have installed the Cloud Monitoring agent and can see basic system metrics like CPU and disk usage. However, they want to track custom application metrics, such as number of active users and request latency, to monitor performance. They have added OpenCensus code to export metrics but notice that custom metrics are not appearing in Cloud Monitoring. The application runs under a custom service account with the 'Monitoring Metric Writer' role assigned. What is the most likely cause?
107. A company runs a Java microservice on GKE that processes financial transactions. The service is critical and must meet a 99.9% availability SLO. They have set up Cloud Monitoring alerting policies based on request latency and error rate. Recently, the team noticed that the alerting policy for high latency fires too frequently with false positives, causing alert fatigue. They want to reduce false positives without compromising real issues. The latency metric is collected from the application's custom metric via Prometheus. Which approach should they take?
108. A development team is using Cloud Trace to analyze performance bottlenecks in a Node.js application deployed on GKE. They have enabled trace sampling at 10% and can see some traces, but many requests are not captured. They want to increase the sampling rate to 100% for a specific high-traffic endpoint while keeping the default sampling rate for other endpoints. How can they achieve this?
109. A company has a legacy monolithic application running on Compute Engine that is being migrated to microservices on GKE. During the migration, they need to maintain performance monitoring across both environments. The legacy application uses Stackdriver Logging and Monitoring agents (now Ops Agent) and exports logs to Cloud Logging. The new microservices are instrumented with OpenTelemetry for traces and metrics. The team wants a unified view of performance across both environments, including distributed traces from the new services and log-based metrics from the legacy app. They also want to correlate logs and traces for troubleshooting. Which solution should they implement?
110. A team is developing a mobile backend API on Google Cloud. They are using Cloud Endpoints to manage API authentication and quotas. They want to monitor API performance including request count, latency, and error rates. They have enabled Cloud Endpoints logging but are not seeing detailed performance metrics in Cloud Monitoring. What should they do?
111. You are a site reliability engineer for a fintech company that runs a latency-sensitive trading application on Google Kubernetes Engine (GKE). The application is instrumented with OpenTelemetry and exports traces and metrics to Cloud Monitoring and Cloud Logging. Recently, the team observed a gradual increase in p99 latency from 50ms to 500ms over the past week, and error rates have spiked to 5% from a baseline of 0.1%. You review the Cloud Monitoring dashboards and notice that the 'container/cpu/utilization' metric shows normal usage, but the 'container/memory/bytes_used' metric shows a steady climb, reaching 90% of the memory limit on several pods. The application logs contain many 'OutOfMemoryError' exceptions and 'GC overhead limit exceeded' messages. You also see that the HPA (Horizontal Pod Autoscaler) has not triggered any scale-up events because the 'custom/googleapis.com|container/cpu/utilization' metric is below the target utilization threshold. The cluster autoscaler is enabled and has sufficient node pool capacity. What is the most likely root cause and the best immediate action to resolve the issue?
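Several of the questions above refer to a 99.9% SLO over a 30-day window and to tracking an error budget. As a rough illustration of the arithmetic only, the sketch below computes the budget in minutes and the fraction of budget left from request counts. The function names and the sample request counts are hypothetical, and nothing here calls or represents a Cloud Monitoring API.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Minutes of total unavailability the SLO tolerates over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, good_requests: int, total_requests: int) -> float:
    """Fraction of the request-based error budget still unspent (negative if overspent)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_bad = (1.0 - slo_target) * total_requests
    actual_bad = total_requests - good_requests
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # 30 days = 43,200 minutes; 0.1% of that is the budget.
    print(f"{error_budget_minutes(0.999, 30):.1f} minutes")
    # Hypothetical counts: 500 failed requests out of 1,000,000.
    print(f"{budget_remaining(0.999, 999_500, 1_000_000):.0%} of budget left")
```

With these assumed inputs, a 99.9% target over 30 days leaves roughly 43 minutes of allowable downtime, and 500 failures in a million requests consume about half of the request-based budget.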
The Managing application performance monitoring domain covers the key concepts tested in this area of the PCD exam blueprint published by Google Cloud. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all PCD domains, with no account required.
The Courseiva PCD question bank contains 111 questions in the Managing application performance monitoring domain. Click any question to see the full explanation and answer breakdown.
Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.
The session launcher on this page draws questions exclusively from the Managing application performance monitoring domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.