Free PCD Managing application performance monitoring Practice Questions (2026)

Q: How many Managing application performance monitoring questions are on the PCD exam?

The Managing application performance monitoring domain is one of the weighted domains on the PCD exam. The Courseiva question bank has 111 practice questions for this domain.

Q: How can I practice Managing application performance monitoring questions for PCD?

Click any of the 111 questions listed on this page to see the full question and explanation, or use the session launcher to start a focused practice session of 10, 20, 30 or 50 questions drawn only from the Managing application performance monitoring domain.


All PCD Managing application performance monitoring questions (111)


A company deploys a microservices application on Google Kubernetes Engine (GKE). The operations team needs to monitor API latency between services. Which Google Cloud service should they use to trace requests across services?

A developer notices that a Cloud Function is timing out after 60 seconds. The function makes an external API call that occasionally takes longer than the timeout. What is the best practice to handle this?

A company uses Cloud Monitoring to set up an alerting policy for CPU utilization on Compute Engine instances. They want to be notified when average CPU usage exceeds 80% for 5 minutes. Which threshold type should they use?
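For intuition about how a threshold condition with a duration behaves, here is a simplified Python model. It assumes CPU samples have already been aligned into 1-minute mean points, and the function name is hypothetical; it is not Cloud Monitoring's evaluation engine. The condition counts as met only when every aligned point in the trailing 5-minute window is above 80%.

```python
def sustained_breach(aligned_values, threshold, duration_s, alignment_period_s=60):
    """Return True if the last `duration_s` seconds of aligned points all exceed threshold.

    aligned_values: chronological list of per-period means (e.g. CPU utilization 0.0-1.0).
    """
    # Number of aligned points needed to cover the duration window.
    needed = max(1, duration_s // alignment_period_s)
    if len(aligned_values) < needed:
        return False  # not enough history yet to cover the window
    return all(v > threshold for v in aligned_values[-needed:])


# Hypothetical 1-minute CPU means for one instance.
cpu = [0.70, 0.85, 0.82, 0.90, 0.88, 0.81]
print(sustained_breach(cpu, threshold=0.80, duration_s=300))  # True: last 5 points > 0.80

spiky = [0.85, 0.90, 0.79, 0.95, 0.90, 0.90]
print(sustained_breach(spiky, threshold=0.80, duration_s=300))  # False: 0.79 breaks the run
```

A short spike above 80% does not satisfy the condition; the breach has to persist across the whole duration.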

An application running on GKE is experiencing high latency. The team uses Cloud Trace to identify the bottleneck. They notice that a particular service spends most of its time waiting on a database query. How can they optimize performance?

A company uses Cloud Run for a serverless application. They notice that cold starts are causing high latency for some requests. What is the best strategy to reduce cold starts?

A team wants to monitor custom application metrics from a Compute Engine instance. They use the Cloud Monitoring agent. Which metric type should they use to report a gauge measurement like current memory usage?

A company uses Cloud Monitoring to create an uptime check for their external HTTP endpoint. The check fails periodically even though the service is healthy. What is the most likely cause?

An application running on GKE uses a custom metric to track order processing time. The metric is exported via Prometheus and ingested by Cloud Monitoring using the Managed Service for Prometheus. The team wants to create an alert when the 95th percentile latency exceeds 2 seconds over a 5-minute window. Which PromQL query should be used?
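Percentile queries over Prometheus histograms rely on `histogram_quantile()`, which estimates a quantile from cumulative `le` buckets by linear interpolation inside the bucket that contains the target rank. The Python sketch below mirrors that interpolation so the arithmetic is visible. It assumes the bucket counts are already the increase over the 5-minute window, and the bucket values are invented for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound,
    ending with an upper bound of math.inf (the "+Inf" bucket).
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # how many observations fall at or below the quantile
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank lands in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Assume observations are spread evenly inside this bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical 5-minute increases for order processing time (seconds).
window_buckets = [(0.5, 40), (1.0, 70), (2.0, 90), (5.0, 98), (math.inf, 100)]
print(round(histogram_quantile(0.95, window_buckets), 3))  # 3.875, above the 2 s target
```

Because the estimate assumes an even spread within each bucket, bucket boundaries close to the 2-second target make the result more trustworthy.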

A company uses Cloud Logging to centralize logs from multiple projects. They want to create a log-based metric for tracking 404 errors. However, the metric shows zero data even though 404 errors are occurring. What is the most likely reason?

Which TWO are best practices for setting up alerting policies in Cloud Monitoring? (Choose two.)

Which THREE are valid uses of Cloud Trace? (Choose three.)

Which TWO are correct ways to reduce logging costs in Google Cloud? (Choose two.)

Your company runs a multi-tier web application on Google Kubernetes Engine (GKE). The application consists of a frontend service, a backend API service, and a PostgreSQL database managed by Cloud SQL. Recently, users have been reporting intermittent slow response times during peak hours (10 AM - 12 PM). You have set up Cloud Monitoring dashboards and alerts. Cloud Trace shows that the backend API service has high latency, but only for certain requests. You notice that the backend service's CPU utilization is around 60% during peak hours, and memory usage is normal. The Cloud SQL instance's CPU utilization is at 90% and the query latency is high. You have also observed that the backend service makes multiple database queries per request, some of which are repeated. What is the most effective course of action to reduce latency?

Your team manages a serverless application deployed on Cloud Run. The application processes image uploads and stores metadata in Firestore. You have set up a Cloud Monitoring alert based on the 'request_count' metric for the Cloud Run service. The alert triggers when the request count exceeds 1000 requests per minute. Recently, the alert has been firing frequently, but the team notices that the application is performing well and there are no errors. The team is concerned about alert fatigue. You review the metric and notice that the request count metric is based on all HTTP requests, including health checks from the Cloud Run system. The health check requests account for about 30% of the total requests. What should you do to reduce unnecessary alerts while still monitoring real user traffic?

Your application running on Google Kubernetes Engine (GKE) is experiencing intermittent latency spikes. You have enabled Cloud Monitoring and Cloud Logging. Which approach would be MOST effective to identify the root cause?

A development team is using Cloud Monitoring to set up an alerting policy for a Compute Engine instance. They want to be notified when the instance's CPU utilization exceeds 80% for at least 5 minutes. Which alerting policy configuration should they use?

Based on the Cloud Trace exhibit, which service is the primary contributor to the overall request latency?

You are configuring a Cloud Monitoring alerting policy for a Cloud Run service. The service has a maximum of 10 concurrent requests per instance. You want to be alerted when the average number of concurrent requests per instance exceeds 8 for at least 1 minute. Which metric and condition type should you use?

You are designing a monitoring strategy for a microservices architecture running on GKE. Each service emits custom business metrics (e.g., order processing time). You want to create a dashboard that shows the 99th percentile latency for each service over the last 7 days. Which approach should you take?

Which TWO actions are best practices for managing application performance monitoring in Google Cloud?

Which THREE components are essential for a complete application performance monitoring (APM) solution on Google Cloud?

Your Cloud Run service is experiencing 5xx errors. You have enabled Cloud Logging and Cloud Error Reporting. How can you quickly identify the most common error type?

You are managing a microservices application deployed on Google Kubernetes Engine (GKE) that uses Cloud Monitoring and Cloud Logging. Recently, users have reported intermittent slow response times, especially during peak hours. You have enabled the Ops Agent on GKE nodes and configured custom metrics for your services. The application consists of a frontend service, a backend API service, and a database service. The frontend calls the backend, which in turn queries the database. You notice that when the response time spikes, the frontend service's CPU utilization remains low, but the backend service's CPU utilization increases. The database service shows normal latency and no errors. You have examined the logs and found no application errors. The GKE cluster has three node pools: one for each service, with autoscaling enabled. The backend service is configured with a HorizontalPodAutoscaler (HPA) based on CPU utilization, but the HPA does not seem to scale up quickly enough during traffic spikes. You want to identify the root cause of the performance degradation. Which course of action should you take first?

A company is running a microservices application on Google Kubernetes Engine (GKE). They have implemented Cloud Monitoring and Cloud Logging, but recently they noticed that the Istio-proxy sidecar logs are missing from Cloud Logging. The application pods are running correctly and the sidecar containers are present. What is the most likely cause of the missing logs?

A DevOps team wants to set up custom metrics for a serverless application running on Cloud Run. The application emits metrics using OpenTelemetry. They need to collect these metrics and create an alerting policy that triggers when the 99th percentile latency exceeds 500ms for 5 minutes. Which TWO actions must they take? (Choose two.)

Your company runs a production App Engine standard environment service (module 'frontend', version 'v2') that handles e-commerce checkout requests. You have set up an alerting policy on a custom metric 'request_latency' that fires when latency exceeds 500ms for 1 minute. Recently, customers have complained about slow checkout times, but no alert has fired. You examine the exhibit: the log entry shows a latency of 0.452s (452ms) for a request to '/api/checkout'. The custom metric is defined from OpenTelemetry instrumentation. What is the most likely reason the alert did not fire?

Drag and drop the steps to configure a Cloud Storage bucket with uniform bucket-level access in the correct order.

Drag and drop the steps to grant a service account access to a Cloud Storage bucket in the correct order.

Match each Cloud Logging and Monitoring concept to its definition.

Match each error code to its meaning in Google Cloud.

A team is using Cloud Monitoring to set up an alerting policy for a Compute Engine instance that runs a web server. The team wants to be notified if the instance's CPU utilization exceeds 80% for 5 minutes. Which threshold type should they use?

An application deployed on Google Kubernetes Engine is experiencing intermittent latency spikes. The team has enabled Cloud Trace and sees that a specific gRPC call to a backend service occasionally takes >500ms. However, the backend service's logs show no errors. What is the most likely cause that the team should investigate further?

A company runs a microservices architecture on Cloud Run. They want to measure the error budget for a critical service using a custom SLI based on the ratio of successful requests (HTTP 200-499) to total requests. They have set an SLO of 99.9% over a 30-day window. Which Cloud Monitoring feature should they use to track this?
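The numbers behind a 99.9% SLO over 30 days can be worked out directly: the budget is 0.1% of requests, or about 43.2 minutes of full outage across the window. The Python sketch below applies the question's SLI definition (HTTP 200-499 counts as good). Function names and request counts are hypothetical, and Cloud Monitoring computes the real SLO compliance and budget itself once the service and SLO are defined.

```python
def is_good(status_code: int) -> bool:
    """The question's SLI: HTTP 200-499 counts as a successful request."""
    return 200 <= status_code <= 499


def error_budget(good: int, total: int, slo: float = 0.999) -> dict:
    """Request-based SLI, SLO check, and fraction of error budget left."""
    if total <= 0:
        raise ValueError("total must be positive")
    sli = good / total
    allowed_bad = (1 - slo) * total  # failures the SLO tolerates in the window
    bad = total - good
    return {
        "sli": sli,
        "slo_met": sli >= slo,
        "budget_remaining": 1 - bad / allowed_bad,  # 1.0 untouched, < 0 exhausted
    }


def budget_minutes(slo: float = 0.999, window_days: int = 30) -> float:
    """Full-outage minutes the budget allows over the window."""
    return window_days * 24 * 60 * (1 - slo)


print(sum(is_good(s) for s in [200, 201, 404, 500, 503]))  # 3
print(error_budget(good=999_400, total=1_000_000))  # SLI 0.9994, about 40% of budget left
print(round(budget_minutes(), 1))                   # 43.2
```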

A developer needs to view detailed performance profiles of a Java application running on Compute Engine to identify CPU hotspots. Which Google Cloud service should they use?

An operations team has set up a Cloud Monitoring alerting policy that fires when the 99th percentile latency of a service exceeds 200ms for 1 minute. They notice that the alert fires frequently during normal traffic patterns. What is the most likely issue with the alert configuration?

A company uses Cloud Monitoring with custom metrics. They have a custom metric called 'requests_total' with labels 'endpoint', 'status_code'. They want to create an alert that fires if the error rate (status_code >=500) for any endpoint exceeds 5% over a 5-minute window. Which MQL query should they use?
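Whatever query language is used, the condition reduces to this: for each endpoint, divide the 5xx request count by the total request count for the window and compare the result with 5%. A Python sketch of that arithmetic follows. It assumes the per-window counts have already been derived from the cumulative `requests_total` counter; the function name and sample data are hypothetical.

```python
from collections import defaultdict


def endpoints_over_error_rate(window_counts, threshold=0.05):
    """Return endpoints whose 5xx share of requests exceeds `threshold`.

    window_counts: iterable of (endpoint, status_code, count) where count is the
    number of requests in the 5-minute window (the delta of the cumulative
    counter, not its raw value).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for endpoint, status_code, count in window_counts:
        totals[endpoint] += count
        # Metric labels are strings, so convert before comparing numerically.
        if int(status_code) >= 500:
            errors[endpoint] += count
    return sorted(
        ep for ep, total in totals.items()
        if total > 0 and errors[ep] / total > threshold
    )


window = [
    ("/checkout", "200", 950), ("/checkout", "503", 60),
    ("/search", "200", 990), ("/search", "500", 10),
    ("/cart", "200", 80), ("/cart", "404", 20),
]
print(endpoints_over_error_rate(window))  # ['/checkout']  (60 / 1010 is about 5.9%)
```

Computing the ratio per endpoint before comparing is what lets a single failing endpoint trigger the alert; a ratio over all endpoints combined would dilute it (here 70 / 2110, about 3.3%).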

A team wants to monitor the availability of an external API by pinging it every minute from multiple locations around the world. Which Cloud Monitoring feature should they use?

A developer is using Cloud Logging and wants to export logs from a specific project to BigQuery for long-term analysis. They have created a log sink and given the appropriate permissions, but logs are not appearing in BigQuery. What is the most likely cause?

A company has a Cloud Run service that uses Cloud SQL. They notice that the number of database connections is increasing over time, causing connection pool exhaustion. They have enabled Cloud Monitoring and see a custom metric for active DB connections. To proactively alert when the connection count exceeds 80% of the maximum pool size (which is 100), which alerting approach is most efficient?

A DevOps team is migrating an on-premises monitoring solution to Google Cloud. They need to collect custom application metrics from a batch processing job running on Compute Engine. Which two services can ingest custom metrics into Cloud Monitoring? (Choose two.)

A company is using Cloud Monitoring to set up an SLO for a latency-sensitive API. They have defined a custom SLI: the proportion of requests with latency under 200ms. Which three components must they define to create a complete SLO configuration? (Choose three.)

A developer wants to view real-time logs from a running application on Compute Engine. Which two methods can they use to stream logs? (Choose two.)

Refer to the exhibit. A team is using Cloud Monitoring with MQL to alert on CPU utilization per zone. They notice that the query averages CPU utilization across all instances in each zone, so the alert reflects zone-wide load rather than the load on any single instance. What change should they make to the MQL query to alert only when any individual instance exceeds 80%?
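The difference between aggregating by zone and evaluating each instance shows up with a few numbers. In the Python sketch below (instance names, zones, and utilization values are hypothetical), a zone mean hides one hot instance, which is why a per-instance condition keeps the instance-level time series instead of averaging them by zone.

```python
from collections import defaultdict
from statistics import mean


def zone_means(cpu_by_instance):
    """Average utilization per zone. cpu_by_instance: {instance: (zone, utilization)}."""
    by_zone = defaultdict(list)
    for zone, utilization in cpu_by_instance.values():
        by_zone[zone].append(utilization)
    return {zone: mean(values) for zone, values in by_zone.items()}


def instances_over(cpu_by_instance, threshold=0.80):
    """Instances whose own utilization exceeds the threshold."""
    return sorted(name for name, (_, u) in cpu_by_instance.items() if u > threshold)


cpu = {
    "vm-a": ("us-central1-a", 0.95),
    "vm-b": ("us-central1-a", 0.40),
    "vm-c": ("us-central1-b", 0.78),
    "vm-d": ("us-central1-b", 0.79),
}
print(zone_means(cpu))      # both zone averages are below 0.80
print(instances_over(cpu))  # ['vm-a']
```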

Refer to the exhibit. A developer sees this log entry in Cloud Logging. The application is running on Compute Engine. Which tool should they use to further diagnose the cause of the connection refusal?

Refer to the exhibit. A team has created this alerting policy for a Cloud Run service. However, the alert never fires even though the error rate sometimes exceeds 1%. What is the most likely issue?

An application deployed on Google Kubernetes Engine (GKE) is experiencing intermittent high latency. The operations team wants to quickly identify which specific code path is causing the delay. What should they use?

A company runs a stateless application on Compute Engine behind a load balancer. They want to monitor the number of active requests per instance without adding custom instrumentation. What is the most straightforward approach?

An application writes structured logs to Cloud Logging. The team wants to create a metric based on the value of a JSON field 'order_total' to alert when totals exceed $1000. What type of metric should they use?

A team notices that a Cloud Run service occasionally returns HTTP 500 errors. They have enabled Cloud Error Reporting. What is the best way to rapidly diagnose the root cause of these errors?

A company runs a multi-service application on GKE and wants to create a Service Level Indicator (SLI) for request latency. They have set up Cloud Service Mesh (Anthos Service Mesh) with Istio. Which metric should they use for the SLI?

An operations team is configuring a Cloud Monitoring alerting policy for a critical application. They want to ensure that alerts are only fired when an anomaly persists for at least 5 minutes to reduce noise. Which condition configuration should they use?

A developer wants to automatically capture CPU and memory profiles from a production application running on Compute Engine to identify performance bottlenecks. Which Google Cloud tool should they use?

A team uses Cloud Endpoints to manage their API. They want to monitor API latency for each API method. What is the recommended approach?

A company receives a Cloud Monitoring alert that a Compute Engine instance's CPU utilization has exceeded 90% for the past 15 minutes. The incident turns out to be a false alarm caused by a scheduled job that runs daily. How can they prevent future false alarms for this recurring pattern?

Which TWO are best practices for setting up Cloud Monitoring alerting policies to minimize alert fatigue? (Select exactly 2.)

Which TWO capabilities does Cloud Service Mesh (Istio) provide to help monitor application performance? (Select exactly 2.)

Which THREE are valid ways to create custom metrics in Cloud Monitoring? (Select exactly 3.)

A team is investigating increased latency in a web application deployed on Google Kubernetes Engine (GKE). They want to identify which specific service calls are slow. Which Google Cloud tool should they use?

A developer wants to ensure that error logs from their Java application are automatically captured and grouped in Cloud Error Reporting. What is the recommended approach?

An organization has multiple Google Cloud projects for different environments (dev, staging, prod). They want to create a single Cloud Monitoring dashboard that shows metrics from all projects. What is the correct approach?

A team wants to monitor CPU utilization on their Compute Engine instances. They need an alert that sends a notification when the average CPU utilization across all instances in a project exceeds 80% for more than 5 minutes. Which alerting configuration should they use?

A company uses Cloud Logging to store application logs. They need to keep logs for 3 years for compliance. What is the most cost-effective way to store logs for this duration?

A team is using Cloud Trace to analyze performance of a microservices application. They notice that some spans are missing from the trace. What is the most likely cause?

You need to create an uptime check for an external HTTPS endpoint and configure an alert that sends a notification if the check fails for 3 consecutive attempts. Which configuration is correct?
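The "3 consecutive failures" requirement is a run-length rule: any passing check resets the count. A simplified single-location Python model of that rule follows. It ignores how a real uptime-check alert aggregates results across checker regions and alignment periods, and the function name is hypothetical.

```python
def first_alert_index(check_results, consecutive=3):
    """Return the index of the check at which `consecutive` failures in a row
    are first reached, or None if that never happens.

    check_results: chronological list of booleans, True = check passed.
    """
    streak = 0
    for i, passed in enumerate(check_results):
        streak = 0 if passed else streak + 1  # any success resets the run
        if streak == consecutive:
            return i
    return None


print(first_alert_index([True, False, False, True, False, False, False]))  # 6
print(first_alert_index([True, False, False, True, False, False]))         # None
```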

A developer wants to view real-time latency metrics for their App Engine application. Where can they find this data?

A company is using Cloud Monitoring to track custom metrics published from an on-premises application using the Monitoring API. The metrics are published every 30 seconds. The team wants to create an alert that fires if the metric goes below a threshold for more than 1 minute. Which alert condition type should they use?

Which TWO of the following are valid ways to export Cloud Logging logs to BigQuery?

Which THREE metrics are commonly used to create a Service Level Indicator (SLI) for availability of an HTTP-based service?

Which TWO statements about Cloud Trace are correct?

Refer to the exhibit. You are reviewing a Cloud Monitoring MQL query. What is the purpose of this query?

Refer to the exhibit. You are analyzing application logs and notice that some logs contain a 'trace' field. What does this field enable?
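For context on what such a field holds: a log entry's trace value has the form `projects/PROJECT_ID/traces/TRACE_ID`. On platforms that collect JSON written to stdout, such as Cloud Run and GKE, the special keys `logging.googleapis.com/trace` and `logging.googleapis.com/spanId` in the payload are promoted to the entry's trace and span fields. The sketch below derives those keys from an incoming `X-Cloud-Trace-Context` header; the project ID and header value are placeholders, and the helper name is hypothetical. Services that receive only the W3C `traceparent` header would need different parsing.

```python
import json


def trace_log_fields(trace_header: str, project_id: str) -> dict:
    """Map an X-Cloud-Trace-Context value ("TRACE_ID/SPAN_ID;o=OPTIONS")
    to the special JSON keys Cloud Logging reads from structured logs."""
    trace_part, _, span_part = trace_header.partition("/")
    trace_id = trace_part.split(";", 1)[0]
    fields = {"logging.googleapis.com/trace": f"projects/{project_id}/traces/{trace_id}"}
    span_decimal = span_part.split(";", 1)[0]
    if span_decimal.isdigit():
        # The header carries the span ID in decimal; log entries expect 16 hex digits.
        fields["logging.googleapis.com/spanId"] = format(int(span_decimal), "016x")
    return fields


header = "105445aa7843bc8bf206b12000100000/1;o=1"  # example header value
entry = {"severity": "INFO", "message": "checkout ok", **trace_log_fields(header, "my-project")}
print(json.dumps(entry))  # one JSON line on stdout for the logging agent to collect
```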

Refer to the exhibit. The alert fires when what happens?

A company is deploying a microservices architecture on Google Kubernetes Engine (GKE). They need to monitor inter-service latency and error rates. Which set of Google Cloud services should they use to collect and visualize these metrics?

A Cloud Run service is experiencing intermittent high latency. The team has enabled Cloud Trace. They want to identify the root cause by analyzing traces. What should they look for in the Trace viewer?

An organization wants to create custom metrics based on application logs to track business KPIs. They need to ensure these metrics are available for alerting within minutes. Which approach should they use?

A developer wants to receive notifications when the error rate of their application exceeds 1% over a 5-minute window. What should they create in Cloud Monitoring?

A team notices that their application's latency has increased after a recent deployment. They suspect a specific code path is slower. Which Google Cloud tool should they use to identify the most time-consuming functions in their code?

An application running on Compute Engine generates structured logs. The operations team needs to parse a specific field from the logs and create a metric that counts occurrences of a particular value. They want the metric to be available for alerting with minimal delay. What should they do?

A company wants to monitor the CPU utilization of their Compute Engine instances and automatically trigger scaling actions if utilization exceeds 80% for 5 minutes. Which service should they use?

An application uses Cloud SQL and is experiencing slow query performance. The team wants to monitor query latency and identify slow queries. Which Google Cloud tool should they use?

A company has a multi-region deployment of their application on GKE. They need to monitor service-level indicators (SLIs) like availability and latency across regions. They want a single pane of glass to view SLO compliance. What should they use?

A developer wants to profile their application's CPU and memory usage to identify performance bottlenecks. Which TWO Google Cloud services should they use?

A team wants to monitor a web application's uptime from multiple locations. Which THREE Google Cloud monitoring features should they use?

A company's application on GKE is experiencing performance degradation. They want to use Google Cloud operations tools to identify the root cause. Which THREE tools should they use in combination?

Refer to the exhibit. The alert is not firing even though the error_count metric occasionally spikes above 10. What is the most likely reason?

Refer to the exhibit. What conclusion can be drawn from these traces?

Refer to the exhibit. What is the first step to resolve this error?

A web application hosted on Compute Engine is experiencing slow response times during peak hours. Which Cloud Monitoring metric should be examined first to identify the bottleneck?

A developer deploying a new version of a microservice sees a sudden increase in error logs in Cloud Logging. The errors are 500 responses from the service. What is the most efficient way to investigate the root cause?

A company wants to create an SLO for their API with a target of 99.9% availability over a 30-day rolling window. They are using Cloud Monitoring. Which combination of resources and techniques should they use?

Your application is deployed on Google Kubernetes Engine (GKE). You want to monitor resource usage at the pod level. Which tool should you use?

You need to create a custom dashboard in Cloud Monitoring that shows the number of 500 errors from your application, along with the average latency. What is the correct way to create this?

A team uses Cloud Monitoring alerting policies with multiple conditions. They want to notify only when both CPU utilization is above 80% and error rate is above 5% for 5 minutes. Which type of condition should be used?

You want to identify performance bottlenecks in your application's code, such as functions consuming excessive CPU. Which Google Cloud tool should you use?

Your application writes structured logs to Cloud Logging. You want to create a metric that counts log entries with a specific severity level, then alert when the count exceeds a threshold. What should you do?
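A counter log-based metric counts the entries that match its filter (for example `severity>=ERROR`) in each interval, and the alerting policy then applies a threshold to that count. The local Python model below shows only the matching logic, assuming one JSON object per log line; the helper name and threshold are hypothetical, and the real metric is evaluated by Cloud Logging rather than by application code.

```python
import json

# Numeric values of Cloud Logging's LogSeverity enum.
SEVERITY = {
    "DEFAULT": 0, "DEBUG": 100, "INFO": 200, "NOTICE": 300, "WARNING": 400,
    "ERROR": 500, "CRITICAL": 600, "ALERT": 700, "EMERGENCY": 800,
}


def count_at_or_above(json_lines, min_severity="ERROR"):
    """Count structured log lines whose severity is >= min_severity.

    Entries without a severity are treated as DEFAULT, as Cloud Logging does.
    """
    floor = SEVERITY[min_severity]
    total = 0
    for line in json_lines:
        entry = json.loads(line)
        if SEVERITY.get(entry.get("severity", "DEFAULT"), 0) >= floor:
            total += 1
    return total


logs = [
    '{"severity": "INFO", "message": "order placed"}',
    '{"severity": "ERROR", "message": "payment timeout"}',
    '{"severity": "CRITICAL", "message": "worker crashed"}',
    '{"message": "no severity field"}',
]
errors = count_at_or_above(logs)
print(errors)  # 2
ALERT_THRESHOLD = 1  # hypothetical threshold for the alerting policy
print(errors > ALERT_THRESHOLD)  # True
```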

You need to set up a notification channel that sends alerts to a third-party incident management system using webhooks. What must be configured?

Which TWO are best practices for reducing the cost of Cloud Logging for a high-traffic application?

You are troubleshooting a performance issue in a microservices application. Which TWO tools from Google Cloud's operations suite would you use to trace a request across services and identify the slowest component?

Which THREE are valid methods to create custom metrics in Cloud Monitoring?

Your company runs a multi-tier application on Compute Engine with a Cloud SQL backend. Recently, during peak hours, users report slow page loads. Cloud Monitoring shows high CPU on the app servers, but no memory pressure. Cloud Trace shows that the application spends most of its time waiting for database queries. The Cloud SQL instance is a high-memory machine type with 16 vCPUs and 64 GB RAM, but CPU utilization on the database is only 30%. There are no slow query alerts. What is the most likely cause and what should you do?

Your team manages a service that receives thousands of requests per second. They have set up Cloud Monitoring alerting based on the 99th percentile latency. Recently, they received an alert warning that latency exceeded 1 second, but after investigating, they found it was a false alarm caused by a single very slow request. How can they improve their alert to reduce false positives?

You deployed a new version of your application that uses Cloud Pub/Sub for asynchronous messaging. After deployment, you notice that messages are accumulating in the subscription backlog. You suspect the subscriber is too slow. Which tool should you use to diagnose?

A company runs a microservices application on Google Kubernetes Engine (GKE). Users report intermittent slow responses. Developers suspect a specific service is causing latency. Which Google Cloud tool should they use to trace requests across services and identify the root cause?

A developer wants to automatically detect and capture application errors in a production environment on Google Cloud. Which two Google Cloud services should be enabled? (Choose two.)

A DevOps team is deploying a critical application on GKE. To ensure application performance monitoring and reliability, which three actions should they take? (Choose three.)

A startup has deployed a Python web application on Compute Engine. They have installed the Cloud Monitoring agent and can see basic system metrics like CPU and disk usage. However, they want to track custom application metrics, such as number of active users and request latency, to monitor performance. They have added OpenCensus code to export metrics but notice that custom metrics are not appearing in Cloud Monitoring. The application runs under a custom service account with the 'Monitoring Metric Writer' role assigned. What is the most likely cause?

A company runs a Java microservice on GKE that processes financial transactions. The service is critical and must meet a 99.9% availability SLO. They have set up Cloud Monitoring alerting policies based on request latency and error rate. Recently, the team noticed that the alerting policy for high latency fires too frequently with false positives, causing alert fatigue. They want to reduce false positives without compromising real issues. The latency metric is collected from the application's custom metric via Prometheus. Which approach should they take?

A development team is using Cloud Trace to analyze performance bottlenecks in a Node.js application deployed on GKE. They have enabled trace sampling at 10% and can see some traces, but many requests are not captured. They want to increase the sampling rate to 100% for a specific high-traffic endpoint while keeping the default sampling rate for other endpoints. How can they achieve this?
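The goal here (100% sampling for one endpoint, the default rate elsewhere) comes down to a per-request sampling decision. The Python below sketches that rule in isolation, even though the question's app is Node.js; it is not the OpenTelemetry SDK's Sampler interface, and the endpoint path and rates are hypothetical. Hashing the trace ID keeps the decision deterministic, the same idea behind OpenTelemetry's trace-ID-ratio sampler.

```python
import hashlib

DEFAULT_RATE = 0.10                    # the existing 10% sampling rate
RATE_OVERRIDES = {"/api/orders": 1.0}  # hypothetical endpoint to trace fully


def should_sample(trace_id: str, path: str) -> bool:
    """Decide whether to record a trace, using a per-endpoint rate."""
    rate = RATE_OVERRIDES.get(path, DEFAULT_RATE)
    if rate >= 1.0:
        return True
    if rate <= 0.0:
        return False
    # Map the trace ID to a stable number in [0, 1] so every decision for the
    # same trace ID is identical (services that share the ID agree).
    digest = hashlib.sha256(trace_id.encode("utf-8")).hexdigest()
    position = int(digest[:8], 16) / 0xFFFFFFFF
    return position < rate


sampled = sum(should_sample(f"trace-{i}", "/api/search") for i in range(10_000))
print(sampled / 10_000)                         # roughly 0.10
print(should_sample("trace-42", "/api/orders"))  # True
```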

A company has a legacy monolithic application running on Compute Engine that is being migrated to microservices on GKE. During the migration, they need to maintain performance monitoring across both environments. The legacy application uses Stackdriver Logging and Monitoring agents (now Ops Agent) and exports logs to Cloud Logging. The new microservices are instrumented with OpenTelemetry for traces and metrics. The team wants a unified view of performance across both environments, including distributed traces from the new services and log-based metrics from the legacy app. They also want to correlate logs and traces for troubleshooting. Which solution should they implement?

A team is developing a mobile backend API on Google Cloud. They are using Cloud Endpoints to manage API authentication and quotas. They want to monitor API performance including request count, latency, and error rates. They have enabled Cloud Endpoints logging but are not seeing detailed performance metrics in Cloud Monitoring. What should they do?

You are a site reliability engineer for a fintech company that runs a latency-sensitive trading application on Google Kubernetes Engine (GKE). The application is instrumented with OpenTelemetry and exports traces and metrics to Cloud Monitoring and Cloud Logging. Recently, the team observed a gradual increase in p99 latency from 50ms to 500ms over the past week, and error rates have spiked to 5% from a baseline of 0.1%. You review the Cloud Monitoring dashboards and notice that the 'container/cpu/utilization' metric shows normal usage, but the 'container/memory/bytes_used' metric shows a steady climb, reaching 90% of the memory limit on several pods. The application logs contain many 'OutOfMemoryError' exceptions and 'GC overhead limit exceeded' messages. You also see that the HPA (Horizontal Pod Autoscaler) has not triggered any scale-up events because the 'custom/googleapis.com|container/cpu/utilization' metric is below the target utilization threshold. The cluster autoscaler is enabled and has sufficient node pool capacity. What is the most likely root cause and the best immediate action to resolve the issue?


Other PCD exam domains

Designing highly scalable, available, and reliable cloud-native applications
Building and testing applications
Deploying applications
Integrating Google Cloud services

Frequently asked questions

What does the Managing application performance monitoring domain cover on the PCD exam?

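The Managing application performance monitoring domain covers observing and troubleshooting applications running on Google Cloud. The questions in this bank span Cloud Monitoring metrics, dashboards, uptime checks, and alerting policies; Cloud Logging, log sinks, and log-based metrics; Cloud Trace, Cloud Profiler, and Error Reporting; and SLIs, SLOs, and error budgets. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all PCD domains, with no account required.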

How many Managing application performance monitoring questions are in the PCD question bank?

The Courseiva PCD question bank contains 111 questions in the Managing application performance monitoring domain. Click any question to see the full explanation and answer breakdown.

What is the best way to practice Managing application performance monitoring for PCD?

Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.

Can I practice only Managing application performance monitoring questions for PCD?

Yes — the session launcher on this page draws questions exclusively from the Managing application performance monitoring domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.
