Knowledge + Practice

CCNA Managing application performance monitoring Questions

75 of 111 questions · Page 1/2 · Managing application performance monitoring · Answers revealed

1

MCQ · hard

Based on the Cloud Trace exhibit, which service is the primary contributor to the overall request latency?

A. The productcatalog service

B. The frontend service itself

C. The auth service

D. The recommendations service

Answer: C

Auth service has the longest child span duration (800ms).

Why this answer

The Cloud Trace exhibit shows that the auth service accounts for the largest segment of the overall request latency, as indicated by the longest span duration in the trace waterfall. In Google Cloud Trace, each span represents a service's contribution to the total latency, and the service with the highest cumulative span time is the primary contributor. Since the auth service span is the longest, it is the correct answer.

Exam trap

Google Cloud often tests the misconception that the frontend service (the entry point) is the primary latency contributor, but the trace waterfall clearly shows that downstream service spans, not the root span, account for the majority of the latency.

How to eliminate wrong answers

Option A is wrong because the productcatalog service span shows a shorter duration than the auth service span, indicating it contributes less to the overall latency. Option B is wrong because the frontend service itself is the entry point and its own span duration is minimal compared to the downstream auth service call; the frontend's latency is dominated by waiting for the auth service response. Option D is wrong because the recommendations service span is either absent or has a negligible duration in the trace, meaning it is not a significant contributor to the total request latency.
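As a rough illustration of how a trace waterfall is read, the sketch below picks the longest child span and computes the root span's own time. The `Span` class and every duration except the 800 ms auth span are hypothetical stand-ins for the exhibit, and the self-time calculation assumes the child calls run sequentially without overlap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    service: str
    duration_ms: int
    parent: Optional[str]  # None marks the root span


# Hypothetical spans; only the 800 ms auth duration comes from the explanation.
spans = [
    Span("frontend", 950, None),
    Span("auth", 800, "frontend"),
    Span("productcatalog", 90, "frontend"),
    Span("recommendations", 20, "frontend"),
]


def slowest_child(all_spans):
    """Return the non-root span with the longest duration."""
    children = [s for s in all_spans if s.parent is not None]
    return max(children, key=lambda s: s.duration_ms)


def self_time_ms(all_spans, service):
    """Time a span spends outside its direct children (assumes no overlap)."""
    own = next(s for s in all_spans if s.service == service)
    child_total = sum(s.duration_ms for s in all_spans if s.parent == service)
    return own.duration_ms - child_total


top = slowest_child(spans)
print(top.service, top.duration_ms)
print(self_time_ms(spans, "frontend"))
```

With these made-up numbers the frontend span is the longest overall, yet almost all of its duration is spent waiting on the auth call, which is the pattern the exam trap describes.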

2

MCQ · easy

A company deploys a microservices application on Google Kubernetes Engine (GKE). The operations team needs to monitor API latency between services. Which Google Cloud service should they use to trace requests across services?

A. Error Reporting

B. Cloud Logging

C. Cloud Monitoring

D. Cloud Trace

Answer: D

Cloud Trace provides distributed tracing to analyze latency across services.

Why this answer

Cloud Trace is the correct choice because it is a distributed tracing system designed to capture latency data as requests propagate through microservices. It provides end-to-end visibility by collecting trace spans from each service, allowing the operations team to identify bottlenecks and measure API latency between services in a GKE environment.

Exam trap

The trap here is that candidates confuse Cloud Monitoring (metrics) with Cloud Trace (distributed tracing), assuming that latency metrics alone can trace requests across services, but metrics lack the span-level context needed to follow a single request's path.

How to eliminate wrong answers

Option A is wrong because Error Reporting aggregates and analyzes application errors, not latency traces. Option B is wrong because Cloud Logging stores and queries log data, but it does not provide the distributed trace context needed to follow a request across multiple services. Option C is wrong because Cloud Monitoring focuses on metrics, alerts, and dashboards (e.g., CPU, memory), not on tracing individual request paths or measuring per-hop latency.

3

Multi-Select · hard

Which TWO statements about Cloud Trace are correct?

Select 2 answers

A. Trace can be integrated with Cloud Monitoring for alerting

B. Trace collects latency data from all requests by default

C. Trace automatically creates dashboards for visualization

D. Trace can be used to analyze end-to-end latency across services

E. Trace supports auto-scaling based on latency

Answers: A, D

Trace data can be used with Cloud Monitoring alerts.

Why this answer

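Option A is correct because Cloud Trace can be integrated with Cloud Monitoring to create alerting policies based on trace data, such as latency thresholds or error rates. This integration allows you to set up notifications when specific trace conditions are met, enabling proactive performance monitoring. Option D is correct because Cloud Trace collects spans from each service a request passes through, which gives an end-to-end view of latency across services.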

Exam trap

Google Cloud often tests the misconception that Cloud Trace captures all requests by default. It samples requests to manage cost and performance, so you must explicitly configure a higher sampling rate for fuller visibility.

4

Multi-Select · easy

A developer wants to view real-time logs from a running application on Compute Engine. Which two methods can they use to stream logs? (Choose two.)

Select 2 answers

A. Using the Logs Explorer's 'Stream logs' feature

B. Using gcloud compute ssh and running journalctl -f

C. Using gcloud logging tail

D. Using Cloud Monitoring's metrics explorer

E. Using gcloud app logs tail

Answers: A, C

Correct: the Logs Explorer provides a streaming view.

Why this answer

Option A is correct because the Logs Explorer in the Google Cloud Console provides a 'Stream logs' feature that allows you to view real-time log entries as they are ingested by Cloud Logging. This is ideal for monitoring a running Compute Engine instance without needing to SSH into it. Option C is correct because the `gcloud logging tail` command streams log entries from Cloud Logging in real time, using the Logging API's tail method, and can filter by resource type (e.g., `gce_instance`) or log name.

Exam trap

Google Cloud often tests the distinction between streaming logs from the centralized Cloud Logging service and streaming logs directly from the VM's local journal. Candidates mistakenly choose `journalctl -f` (Option B) because they think it provides the same real-time view, but it does not integrate with Cloud Logging's centralized filtering and retention.

5

Multi-Select · medium

A DevOps team is migrating an on-premises monitoring solution to Google Cloud. They need to collect custom application metrics from a batch processing job running on Compute Engine. Which two services can ingest custom metrics into Cloud Monitoring? (Choose two.)

Select 2 answers

A. Cloud Profiler API

B. Stackdriver Monitoring agent with custom plugin

C. Cloud Logging with log-based metrics

D. Cloud Trace API

E. Cloud Monitoring API with custom metric descriptors

Answers: C, E

Correct: log-based metrics can extract numerical values from logs and create custom metrics.

Why this answer

Option C is correct because Cloud Logging can ingest any log entry, and log-based metrics allow you to extract numeric values from log content to create custom metrics that appear in Cloud Monitoring. Option E is correct because the Cloud Monitoring API lets you define custom metric descriptors and then write time-series data directly to those metrics, bypassing any agent or log pipeline.

Exam trap

Google Cloud often tests the assumption that an agent is required for custom metrics. The Stackdriver Monitoring agent (the legacy predecessor of the Ops Agent) is a collector installed on the VM rather than an ingestion service, and custom metrics can reach Cloud Monitoring without any agent through log-based metrics or direct API calls.

6

MCQ · easy

Your Cloud Run service is experiencing 5xx errors. You have enabled Cloud Logging and Cloud Error Reporting. How can you quickly identify the most common error type?

A. Use Cloud Trace to analyze the traces of failing requests.

B. Open Cloud Error Reporting to see grouped error counts.

C. View the logs in Cloud Logging and manually count error messages.

D. Create a Cloud Monitoring alert on 5xx response codes.

Answer: B

Error Reporting aggregates and surfaces top errors.

Why this answer

Cloud Error Reporting automatically groups similar errors (e.g., same stack trace or error message) and shows a count for each group, making it the fastest way to identify the most common 5xx error type without manual log inspection. It is purpose-built for this exact use case, aggregating errors from Cloud Logging and presenting them in a dashboard sorted by frequency.

Exam trap

Google Cloud often tests the distinction between monitoring (Cloud Monitoring alerts) and error analysis (Cloud Error Reporting), tempting candidates to choose a monitoring alert when the question explicitly asks for identifying the most common error type, not just detecting that errors exist.

How to eliminate wrong answers

Option A is wrong because Cloud Trace is designed for latency analysis and distributed tracing, not for counting or grouping error types; it would require manual correlation to find the most common error. Option C is wrong because manually counting error messages in Cloud Logging is inefficient and error-prone, defeating the purpose of 'quickly' identifying the most common error type. Option D is wrong because a Cloud Monitoring alert on 5xx response codes only notifies you that errors are occurring, but does not group or identify the most common error type; it lacks the error aggregation and classification that Error Reporting provides.

7

Multi-Select · medium

Which THREE are valid uses of Cloud Trace? (Choose three.)

Select 3 answers

A. Identifying latency bottlenecks in a distributed application

B. Monitoring CPU usage of a Compute Engine instance

C. Viewing the flow of requests through microservices

D. Analyzing the performance of external API calls

E. Exporting traces to Prometheus for long-term storage

Answers: A, C, D

Trace shows where time is spent across services.

Why this answer

Cloud Trace is a distributed tracing system that captures latency data from applications, allowing you to identify performance bottlenecks across services. Option A is correct because Cloud Trace provides detailed traces that show the time spent in each component of a distributed application, enabling you to pinpoint where delays occur.

Exam trap

Google Cloud often tests the distinction between tracing (Cloud Trace) and monitoring (Cloud Monitoring), so candidates mistakenly choose CPU usage monitoring as a valid use of Cloud Trace.

8

MCQ · medium

An application writes structured logs to Cloud Logging. The team wants to create a metric based on the value of a JSON field 'order_total' to alert when totals exceed $1000. What type of metric should they use?

A. Uptime check metric.

B. Log-based metric.

C. Error Reporting metric.

D. Custom metric from Cloud Monitoring agent.

Answer: B

Extracts 'order_total' from logs and creates a metric.

Why this answer

A log-based metric is defined in Cloud Logging from a filter over log entries. For a numeric field such as 'order_total', the metric can extract the value from the entry's JSON payload, optionally using a regular expression to isolate the number. By defining a log-based metric on the 'order_total' field and attaching a Cloud Monitoring alert with a threshold of $1000, the team can monitor and alert on high-value orders without additional instrumentation.
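To make the extraction step concrete, here is a small local simulation of what a log-based metric does conceptually: read a numeric field from structured log entries and flag values over a threshold. It does not call Cloud Logging; the log lines and the helper names `extract_values` and `exceeds` are hypothetical, and only the field name 'order_total' and the 1000 threshold come from the question.

```python
import json

# Hypothetical structured log lines; one entry has no 'order_total' field.
log_lines = [
    '{"order_id": "a1", "order_total": 250.0}',
    '{"order_id": "a2", "order_total": 1200.5}',
    '{"order_id": "a3", "message": "no total on this entry"}',
    '{"order_id": "a4", "order_total": 999.99}',
]


def extract_values(lines, field):
    """Yield the numeric value of `field` from each JSON line that has it."""
    for line in lines:
        payload = json.loads(line)
        value = payload.get(field)
        if isinstance(value, (int, float)):
            yield float(value)


def exceeds(values, threshold):
    """Return the values strictly greater than the threshold."""
    return [v for v in values if v > threshold]


totals = list(extract_values(log_lines, "order_total"))
high_value_orders = exceeds(totals, 1000)
print(totals)
print(high_value_orders)
```

Entries without the field are skipped rather than treated as zero, so missing data does not distort the extracted values.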

Exam trap

Google Cloud often tests the distinction between log-based metrics and custom metrics from agents. Candidates mistakenly think a custom metric agent is required to extract values from logs, but Cloud Logging's built-in log-based metrics handle this directly without any agent.

How to eliminate wrong answers

Option A is wrong because uptime check metrics monitor the availability and response time of a URL or service, not the value of a field in structured logs. Option C is wrong because Error Reporting metrics are designed to count and group application errors (e.g., exceptions, stack traces), not to extract arbitrary numeric fields like 'order_total'. Option D is wrong because custom metrics from the Cloud Monitoring agent require installing and configuring the agent on a VM to collect system-level metrics (e.g., CPU, memory), not to parse log entries.

9

MCQ · easy

A team is using Cloud Monitoring to set up an alerting policy for a Compute Engine instance that runs a web server. The team wants to be notified if the instance's CPU utilization exceeds 80% for 5 minutes. Which threshold type should they use?

A. Ratio threshold

B. Metric threshold

C. MQL (Monitoring Query Language)

D. Forecast threshold

Answer: B

Correct: a metric threshold directly checks if a metric exceeds a set value over a duration.

Why this answer

Option B is correct because a metric threshold alerting policy directly monitors a numeric metric (e.g., CPU utilization) and triggers when the value exceeds a defined threshold (80%) for a specified duration (5 minutes). This is the standard approach for simple threshold-based alerts on a single metric in Cloud Monitoring.

Exam trap

Google Cloud often tests the distinction between simple metric thresholds and more advanced options like MQL or forecast thresholds, tempting candidates to overcomplicate the solution when a basic metric threshold is sufficient.

How to eliminate wrong answers

Option A is wrong because a ratio threshold is used for comparing two metrics (e.g., errors per request), not for a single metric like CPU utilization. Option C is wrong because MQL is a powerful query language for complex, multi-metric or time-shifted analysis, but it is overkill and unnecessary for a simple static threshold on one metric. Option D is wrong because a forecast threshold predicts future metric values based on historical trends, not for detecting current or recent breaches of a fixed threshold.

10

MCQ · easy

An application deployed on Google Kubernetes Engine (GKE) is experiencing intermittent high latency. The operations team wants to quickly identify which specific code path is causing the delay. What should they use?

A. Enable Cloud Trace and analyze trace spans.

B. Use Cloud Profiler to identify memory leaks.

C. Set up a Cloud Monitoring uptime check.

D. Review Cloud Logging logs to find error messages.

Answer: A

Cloud Trace captures request spans and shows time spent in each component.

Why this answer

Cloud Trace is designed specifically for latency analysis in distributed systems like GKE. It captures end-to-end request latency and breaks it down into individual spans, each representing a specific code path or service call. By analyzing these spans, the operations team can pinpoint which exact code path (e.g., a database query, external API call, or internal function) is causing the intermittent high latency.

Exam trap

Google Cloud often tests the distinction between tools that measure latency (Cloud Trace) and tools that measure resource utilization (Cloud Profiler) or availability (Cloud Monitoring uptime checks), leading candidates to confuse profiling with tracing.

How to eliminate wrong answers

Option B is wrong because Cloud Profiler identifies performance bottlenecks related to CPU and memory usage (e.g., memory leaks, hot functions), not intermittent latency caused by specific code paths. Option C is wrong because a Cloud Monitoring uptime check only verifies that the application is reachable and responding within a configured timeout; it does not provide granular latency breakdowns per code path. Option D is wrong because reviewing Cloud Logging logs for error messages would only surface failures or exceptions, not the normal but slow execution paths that cause intermittent high latency.

11

MCQ · easy

A web application hosted on Compute Engine is experiencing slow response times during peak hours. Which Cloud Monitoring metric should be examined first to identify the bottleneck?

A. CPU utilization of backend instances

B. Number of incoming requests per second

C. Memory usage of backend instances

D. 95th percentile request latency measured by Cloud Load Balancing

Answer: D

This metric directly measures user-facing response time, and a high latency indicates a performance issue that needs investigation.

Why this answer

The 95th percentile request latency measured by Cloud Load Balancing is the most direct indicator of user-perceived performance degradation. High latency at the load balancer level captures the end-to-end response time, including network, backend processing, and queuing delays, making it the first metric to examine when diagnosing slow response times during peak hours.

Exam trap

Google Cloud often tests the distinction between resource utilization metrics (CPU, memory) and performance metrics (latency), trapping candidates who assume high CPU or memory is always the root cause of slow response times, when in fact latency metrics provide the direct measure of user experience.

How to eliminate wrong answers

Option A is wrong because CPU utilization alone does not capture network latency, queuing delays, or application-level bottlenecks; a backend can have low CPU but still be slow due to I/O waits or database contention. Option B is wrong because the number of incoming requests per second measures throughput, not latency; high request volume can cause slowdowns, but latency is the direct symptom of the bottleneck. Option C is wrong because memory usage is a resource metric that may indicate swapping or OOM risks, but it is not the primary indicator of response time issues; a system can have ample memory yet still experience high latency due to other factors.

12

MCQ · hard

A company uses Cloud Monitoring with custom metrics. They have a custom metric called 'requests_total' with labels 'endpoint', 'status_code'. They want to create an alert that fires if the error rate (status_code >=500) for any endpoint exceeds 5% over a 5-minute window. Which MQL query should they use?

A. fetch custom::requests_total | { filter status_code >= 500 ; group_by [endpoint], sum() } / { group_by [endpoint], sum() } | condition gt 0.05

B. fetch custom::requests_total | filter status_code < 500 | ratio | condition gt 0.05

C. fetch custom::requests_total | group_by [endpoint], sum() | filter status_code >= 500 | ratio | condition gt 0.05

D. fetch custom::requests_total | filter status_code >= 500 | ratio | condition gt 0.05

Answer: A

Correct: groups errors and total by endpoint, divides, and applies condition.

Why this answer

Option A is correct because it first filters for error responses (status_code >= 500), then groups by endpoint and sums the error count, and divides that by the total count per endpoint (also grouped and summed). This computes the error rate per endpoint, and the condition fires when that rate exceeds 0.05 (5%) over the 5-minute window. The use of two separate group_by operations within a join (the `{ ... } / { ... }` syntax) is the correct MQL pattern for calculating a ratio per label.
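The per-endpoint division in Option A can be mirrored in plain Python to check the arithmetic. This is only a sketch of the calculation, not MQL and not a Cloud Monitoring API call; the sample counts and the function name `error_rate_by_endpoint` are hypothetical.

```python
from collections import defaultdict

# Hypothetical samples: (endpoint, status_code, requests in the 5-minute window)
samples = [
    ("/cart", 200, 940),
    ("/cart", 500, 60),
    ("/search", 200, 990),
    ("/search", 503, 10),
]


def error_rate_by_endpoint(rows):
    """Divide the summed 5xx count by the summed total count, per endpoint."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for endpoint, status, count in rows:
        totals[endpoint] += count      # denominator: all requests
        if status >= 500:
            errors[endpoint] += count  # numerator: error requests only
    return {ep: errors[ep] / totals[ep] for ep in totals}


rates = error_rate_by_endpoint(samples)
firing = sorted(ep for ep, rate in rates.items() if rate > 0.05)
print(rates)
print(firing)
```

Because the numerator and denominator are both grouped by endpoint before dividing, one noisy endpoint can breach the 5% threshold even when the combined rate across all endpoints stays below it.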

Exam trap

Google Cloud often tests the distinction between piping a single table into `ratio` and explicitly dividing two `group_by` results. Dividing the per-endpoint error table by the per-endpoint total table yields a rate for each endpoint, whereas candidates often choose a `ratio`-based query that ignores per-endpoint grouping.

How to eliminate wrong answers

Option B is wrong because it filters for status_code < 500 (successes) instead of errors, and uses `ratio` without the proper group_by to compute per-endpoint rates, which would produce an overall ratio across all endpoints. Option C is wrong because it applies `group_by [endpoint], sum()` before filtering for errors, which sums all requests first and then filters, making it impossible to compute a per-endpoint error rate correctly. Option D is wrong because it uses `ratio` without any group_by, which would compute the overall error rate across all endpoints combined, not per endpoint as required.

13

MCQ · medium

A company uses Cloud Run for a serverless application. They notice that cold starts are causing high latency for some requests. What is the best strategy to reduce cold starts?

A. Increase the max instances setting

B. Set a minimum number of instances to keep containers always warm

C. Migrate the application to Cloud Functions

D. Reduce the container concurrency setting

Answer: B

Min instances ensures pre-warmed containers are always ready.

Why this answer

Option B is correct because setting a minimum number of instances ensures that Cloud Run keeps a baseline of container instances always warm and ready to serve requests. This eliminates cold starts for the first requests that hit those pre-warmed instances, directly addressing the latency issue. Cloud Run automatically scales to zero when idle, but a minimum instance setting overrides that behavior for the specified number of containers.

Exam trap

The trap here is that candidates often confuse 'max instances' with 'min instances,' thinking that raising the upper limit will somehow pre-warm containers, when in fact it only controls the ceiling for scaling out, not the floor for keeping instances alive.

How to eliminate wrong answers

Option A is wrong because increasing the max instances setting only raises the upper scaling limit, which does nothing to prevent cold starts; it can actually increase the number of cold starts if traffic spikes cause new instances to be created. Option C is wrong because migrating to Cloud Functions does not inherently solve cold starts—Cloud Functions also has cold start latency, and the underlying infrastructure is similar; the recommendation would be the same (set a minimum instance count). Option D is wrong because reducing the container concurrency setting limits how many concurrent requests a single container can handle, which may force more instances to be created (increasing cold starts) rather than reducing them.

14

MCQ · medium

An application running on GKE is experiencing high latency. The team uses Cloud Trace to identify the bottleneck. They notice that a particular service spends most of its time waiting on a database query. How can they optimize performance?

A. Decrease the number of pods to reduce load

B. Use Cloud CDN to cache database results

C. Optimize the database query and add appropriate indexes

D. Increase the number of replicas for the service

Answer: C

Query optimization reduces execution time.

Why this answer

Option C is correct because the bottleneck is identified as a database query causing high latency. Optimizing the query and adding appropriate indexes directly reduces the time spent waiting on the database, which is the root cause. Cloud Trace shows the service is waiting on the database, so improving database performance is the most effective solution.

Exam trap

Google Cloud often tests the misconception that scaling horizontally (adding replicas) solves all performance issues, but here the bottleneck is external to the service (database), so scaling the service does not reduce the per-query wait time.

How to eliminate wrong answers

Option A is wrong because decreasing the number of pods reduces concurrency and can increase latency under load, not decrease it. Option B is wrong because Cloud CDN caches static content at edge locations, not dynamic database query results, and cannot cache database responses that are unique per request. Option D is wrong because increasing replicas spreads the load but does not address the database query latency; the service will still wait the same amount of time per query, and may even increase database contention.

15

MCQ · medium

Your team manages a service that receives thousands of requests per second. They have set up Cloud Monitoring alerting based on the 99th percentile latency. Recently, they received an alert warning that latency exceeded 1 second, but after investigating, they found it was a false alarm caused by a single very slow request. How can they improve their alert to reduce false positives?

A. Set the alert to fire only if the condition persists for a longer duration.

B. Use a log-based metric instead of latency.

C. Increase the alerting threshold to 2 seconds.

D. Use a different latency metric like median or 95th percentile.

Answer: D

Lower percentiles are less sensitive to outliers, reducing false alarms while still capturing most user experience.

Why this answer

The 99th percentile is sensitive to outliers; switching to a lower percentile like the 95th or median reduces the impact of rare slow requests and provides a more stable indicator of typical performance.
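The outlier sensitivity can be seen with a small nearest-rank percentile calculation. The latencies below are hypothetical and the window is deliberately tiny (50 samples) so that one slow request is enough to move the 99th percentile; a real window at thousands of requests per second holds far more samples, so treat this as an illustration of the principle only.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the value at rank ceil(p * n / 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical window: 49 requests at 120 ms plus one 4000 ms outlier.
latencies_ms = [120] * 49 + [4000]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(p50, p95, p99)
```

In this window the median and 95th percentile stay at the typical latency, while the 99th percentile lands on the outlier and would cross a 1-second threshold.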

16

MCQ · medium

A team wants to monitor CPU utilization on their Compute Engine instances. They need an alert that sends a notification when the average CPU utilization across all instances in a project exceeds 80% for more than 5 minutes. Which alerting configuration should they use?

A. Use Cloud Scheduler to periodically check CPU and trigger notification

B. Create a log-based alert using metrics from Cloud Logging

C. Use an uptime check to monitor CPU utilization

D. Create an alert policy with a metric threshold condition for compute.googleapis.com/instance/cpu/utilization, aggregated across all instances with alignment period 1 min and duration 5 min

Answer: D

This correctly sets up a threshold alert on CPU utilization.

Why this answer

Option D is correct because Cloud Monitoring alert policies allow you to define a metric threshold condition using the `compute.googleapis.com/instance/cpu/utilization` metric, aggregate it across all instances in the project, and set an alignment period of 1 minute with a duration of 5 minutes. This configuration ensures the alert fires only when the average CPU utilization exceeds 80% for a sustained period of 5 minutes, meeting the exact requirement.
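A simplified model of this condition: average the 1-minute aligned points across instances, then require five consecutive averaged points above 0.80. This is a sketch of the logic only and not how Cloud Monitoring evaluates policies internally; the instance names, the sample values, and both helper functions are hypothetical.

```python
# Hypothetical per-minute CPU utilization (fraction 0-1) for three instances.
cpu_by_instance = {
    "vm-a": [0.70, 0.85, 0.90, 0.88, 0.92, 0.95, 0.60],
    "vm-b": [0.60, 0.82, 0.86, 0.84, 0.90, 0.85, 0.55],
    "vm-c": [0.65, 0.80, 0.83, 0.81, 0.85, 0.84, 0.50],
}


def mean_across_instances(series_by_instance):
    """Cross-series aggregation: average the instances at each minute."""
    columns = zip(*series_by_instance.values())
    return [sum(col) / len(col) for col in columns]


def first_firing_index(points, threshold, duration_points):
    """Index where the threshold has been exceeded for `duration_points`
    consecutive points, or None if that never happens."""
    streak = 0
    for i, value in enumerate(points):
        streak = streak + 1 if value > threshold else 0
        if streak >= duration_points:
            return i
    return None


averaged = mean_across_instances(cpu_by_instance)
fired_at = first_firing_index(averaged, 0.80, 5)
print(fired_at)
```

A single dip below the threshold resets the streak, which is what keeps short spikes from triggering the notification.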

Exam trap

The trap here is that candidates confuse log-based alerts (which work on log entries) with metric-based alerts (which work on numeric time-series data), leading them to incorrectly choose Option B.

How to eliminate wrong answers

Option A is wrong because Cloud Scheduler is a cron job service for triggering actions on a schedule, not a monitoring or alerting tool; it cannot natively evaluate metric thresholds or aggregate CPU utilization across instances. Option B is wrong because log-based alerts are designed for log entries, not for numeric metric thresholds like CPU utilization; they cannot directly monitor `compute.googleapis.com/instance/cpu/utilization` as a metric. Option C is wrong because uptime checks monitor HTTP/HTTPS/TCP endpoint availability and response, not CPU utilization metrics; they are used for service health, not infrastructure resource usage.

17

Multi-Select · hard

Which THREE are valid ways to create custom metrics in Cloud Monitoring? (Select exactly 3.)

Select 3 answers

A. Use the Cloud Billing pricing calculator to estimate metric costs.

B. Use the Cloud Monitoring API to write time series directly.

C. Install the Cloud Monitoring agent and configure custom metrics in its configuration file.

D. Define a log-based metric in Cloud Logging based on log content.

E. Deploy Ops Agent with default configuration.

Answers: B, C, D

Allows programmatic metric creation.

Why this answer

Option B is correct because the Cloud Monitoring API allows you to write time series data directly via the `projects.timeSeries.create` method, which is a primary mechanism for ingesting custom metrics. This enables you to programmatically send metric data from any source, bypassing the need for an agent.
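For orientation, the sketch below builds the kind of JSON body that `projects.timeSeries.create` accepts for a single gauge point. It only constructs and prints the payload; no request is sent. The metric name, instance ID, and zone are hypothetical, and the field layout reflects my understanding of the REST body, so check it against the API reference before relying on it.

```python
import json
from datetime import datetime, timezone


def build_time_series_body(metric_type, value, instance_id, zone, end_time):
    """Build a JSON-serializable body with one double-valued point."""
    end_utc = end_time.astimezone(timezone.utc)
    return {
        "timeSeries": [
            {
                "metric": {"type": metric_type},
                "resource": {
                    "type": "gce_instance",
                    "labels": {"instance_id": instance_id, "zone": zone},
                },
                "points": [
                    {
                        "interval": {
                            "endTime": end_utc.strftime("%Y-%m-%dT%H:%M:%SZ")
                        },
                        "value": {"doubleValue": value},
                    }
                ],
            }
        ]
    }


body = build_time_series_body(
    "custom.googleapis.com/batch/records_processed",  # hypothetical metric
    42.0,
    "1234567890123456789",  # hypothetical instance ID
    "us-central1-c",        # hypothetical zone
    datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```

In practice the body would be sent with an authenticated client, which is omitted here to keep the example self-contained.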

Exam trap

Google Cloud often tests the distinction between agent-based collection of predefined metrics (Ops Agent default) and the explicit creation of custom metrics via API or log-based definitions, leading candidates to mistakenly select the Ops Agent default as a valid method for custom metrics.

18

Multi-Select · medium

A developer wants to automatically detect and capture application errors in a production environment on Google Cloud. Which two Google Cloud services should be enabled? (Choose two.)

Select 2 answers

A. Cloud Error Reporting

B. Cloud Trace

C. Cloud Profiler

D. Cloud Debugger

E. Cloud Logging

Answers: A, E

Cloud Error Reporting automatically detects and groups application errors.

Why this answer

Cloud Error Reporting aggregates and displays application errors in real time, allowing developers to automatically detect and capture errors in production. Cloud Logging stores all application logs, which Error Reporting uses as a source to identify and analyze error events. Together, they provide a complete solution for error detection and capture without manual intervention.

Exam trap

Google Cloud often tests the distinction between monitoring (Error Reporting, Logging) and debugging/tracing tools (Debugger, Trace, Profiler), leading candidates to select Debugger or Trace for error detection when they are designed for different purposes.

19

MCQ · medium

An application uses Cloud SQL and is experiencing slow query performance. The team wants to monitor query latency and identify slow queries. Which Google Cloud tool should they use?

A. Cloud SQL Insights

B. Cloud Debugger

C. Cloud Monitoring

D. Cloud Trace

Answer: A

Cloud SQL Insights is designed for query performance monitoring.

Why this answer

Cloud SQL Insights is the correct tool because it is specifically designed to provide detailed query performance diagnostics for Cloud SQL databases. It captures query latency, execution plans, and wait events, enabling teams to identify and troubleshoot slow queries directly within the Cloud SQL console without additional configuration.

Exam trap

The trap here is that candidates often confuse Cloud Trace (which traces request latency across services) with database query tracing, but Cloud Trace does not provide per-query execution plans or database-specific wait events, making Cloud SQL Insights the only tool that directly addresses slow query identification in Cloud SQL.

How to eliminate wrong answers

Option B (Cloud Debugger) is wrong because it is used for inspecting the state of a running application (e.g., capturing variable values and stack traces) in production, not for monitoring database query latency. Option C (Cloud Monitoring) is wrong because while it can collect metrics and set alerts for Cloud SQL, it does not provide per-query latency breakdowns or execution plan analysis; it is a general monitoring tool, not a query-specific diagnostic tool. Option D (Cloud Trace) is wrong because it focuses on end-to-end request latency across distributed services (e.g., HTTP requests), not on individual database query performance within Cloud SQL.

20

MCQ · hard

An application running on Compute Engine generates structured logs. The operations team needs to parse a specific field from the logs and create a metric that counts occurrences of a particular value. They want the metric to be available for alerting with minimal delay. What should they do?

A. Export logs to BigQuery and use scheduled queries

B. Write a Cloud Function to process logs from Pub/Sub

C. Create a log-based metric in Cloud Logging

D. Use the Cloud Monitoring agent to collect logs

Answer: C

Log-based metrics are designed for this use case and provide low-latency metrics.

Why this answer

Log-based metrics in Cloud Logging are designed to extract specific fields from structured logs and count occurrences of particular values with near-real-time latency, making them ideal for alerting with minimal delay. They are natively integrated with Cloud Monitoring, so the metric is automatically available for alerting policies without additional infrastructure or data movement.

Exam trap

Google Cloud often tests the distinction between log-based metrics (native, low-latency) and log export to external systems (higher latency, more complex), tempting candidates to choose BigQuery or Pub/Sub because they seem more powerful for analysis, but they are not optimal for real-time alerting.

How to eliminate wrong answers

Option A is wrong because exporting logs to BigQuery and using scheduled queries introduces significant latency (minutes to hours) due to export batching and query scheduling, which is unsuitable for alerting with minimal delay. Option B is wrong because writing a Cloud Function to process logs from Pub/Sub adds unnecessary complexity, latency, and cost; Cloud Functions are event-driven but still require setting up a Pub/Sub sink and custom code, whereas log-based metrics provide a simpler, native solution with lower overhead. Option D is wrong because the Cloud Monitoring agent collects metrics from VM instances, not logs; it cannot parse structured log fields or create count-based metrics from log content.

21

MCQmedium

A team uses Cloud Endpoints to manage their API. They want to monitor API latency for each API method. What is the recommended approach?

A.Parse Cloud Logging endpoint logs to calculate latency.

B.Use Cloud Trace to analyze samples and estimate latency.

C.Instrument the API code with a custom metric for each method.

D.View the built-in Cloud Endpoints latency metrics in Cloud Monitoring.

AnswerD

Endpoints exports per-method latency metrics automatically.

Why this answer

Cloud Endpoints automatically sends metrics, including per-method request latency, to Cloud Monitoring, so no extra setup is needed. Cloud Trace works on sampled individual requests and does not readily aggregate latency per method. Custom metrics require code changes. Cloud Logging has no built-in latency calculation, so the endpoint logs would have to be parsed.

22

MCQmedium

Your team manages a serverless application deployed on Cloud Run. The application processes image uploads and stores metadata in Firestore. You have set up a Cloud Monitoring alert based on the 'request_count' metric for the Cloud Run service. The alert triggers when the request count exceeds 1000 requests per minute. Recently, the alert has been firing frequently, but the team notices that the application is performing well and there are no errors. The team is concerned about alert fatigue. You review the metric and notice that the request count metric is based on all HTTP requests, including health checks from the Cloud Run system. The health check requests account for about 30% of the total requests. What should you do to reduce unnecessary alerts while still monitoring real user traffic?

A.Increase the alert threshold to 1500 requests per minute

B.Create a new log-based metric that filters out health check requests, and use that in the alert

C.Disable health checks on the Cloud Run service

D.Configure the existing metric to exclude health check logs

AnswerB

This metric will only count user requests, reducing noise.

Why this answer

Option B is correct because creating a new log-based metric that filters out health check requests allows you to monitor only real user traffic. Cloud Run's system health checks (e.g., from the Cloud Run infrastructure) are included in the default 'request_count' metric, inflating the count. By using a log-based metric with a filter that excludes these health check requests, you can set an accurate alert threshold based on actual user demand, reducing alert fatigue without losing visibility into real issues.

Exam trap

Google Cloud often tests the misconception that you can modify built-in metrics or that simply adjusting thresholds is sufficient, when in reality you must create a custom metric to filter out noise like health checks.

How to eliminate wrong answers

Option A is wrong because simply increasing the threshold to 1500 requests per minute does not address the root cause—health check requests are still included, and the threshold may still be exceeded by a combination of real traffic and health checks, or it may be too high to detect real traffic spikes. Option C is wrong because disabling health checks on Cloud Run is not recommended; health checks are essential for ensuring the service is healthy and for routing traffic correctly, and disabling them could cause the service to be marked unhealthy or stop receiving traffic. Option D is wrong because the existing 'request_count' metric is a built-in metric that cannot be configured to exclude specific logs; you must create a new custom log-based metric with a filter to exclude health check requests.
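The filtering idea behind Option B can be sketched in a few lines of Python. This is only an illustration of what a filtered counter metric computes, not the Cloud Logging API: the `user_agent` and `timestamp` field names and the `GoogleHC` marker are assumptions made for the example.

```python
from collections import Counter

def count_user_requests(entries, health_check_agents=("GoogleHC",)):
    """Count request log entries per minute, skipping health-check traffic.

    `entries` is a list of dicts with hypothetical `timestamp` (seconds)
    and `user_agent` fields; real log entries have a different shape.
    """
    per_minute = Counter()
    for entry in entries:
        agent = entry.get("user_agent", "")
        if any(marker in agent for marker in health_check_agents):
            continue  # excluded by the filter, like a health check
        per_minute[entry["timestamp"] // 60] += 1
    return per_minute

entries = [
    {"timestamp": 0, "user_agent": "Mozilla/5.0"},
    {"timestamp": 10, "user_agent": "GoogleHC/1.0"},
    {"timestamp": 30, "user_agent": "curl/8.0"},
    {"timestamp": 65, "user_agent": "GoogleHC/1.0"},
    {"timestamp": 70, "user_agent": "Mozilla/5.0"},
]
counts = count_user_requests(entries)
```

An alert threshold applied to `counts` then reflects user traffic only, which is the point of replacing the built-in metric in the alerting policy.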

23

MCQeasy

A development team is using Cloud Monitoring to set up an alerting policy for a Compute Engine instance. They want to be notified when the instance's CPU utilization exceeds 80% for at least 5 minutes. Which alerting policy configuration should they use?

A.Condition type: Metric Threshold, Trigger: For 5 minutes, Threshold: 80%

B.Condition type: Metric Threshold, Trigger: For most recent value, Threshold: 80%

C.Condition type: Change Rate, Trigger: For 5 minutes, Threshold: 80%

D.Condition type: Metric Absence, Duration: 5 minutes

AnswerA

Triggers when condition holds for 5 minutes.

Why this answer

Option A is correct because Cloud Monitoring alerting policies use a Metric Threshold condition type to evaluate a metric against a static threshold. Setting the trigger to 'For 5 minutes' ensures the condition is met only when the CPU utilization exceeds 80% consistently over the specified duration, preventing false alarms from transient spikes.

Exam trap

Google Cloud often tests the distinction between 'For most recent value' and 'For X minutes' triggers, where candidates mistakenly choose the single-point trigger thinking it's simpler, missing the requirement for sustained threshold crossing.

How to eliminate wrong answers

Option B is wrong because 'For most recent value' triggers an alert based on a single data point, which would fire on any momentary spike above 80% rather than requiring sustained high utilization for 5 minutes. Option C is wrong because 'Change Rate' condition type measures the rate of change of a metric over time, not a static threshold; it is used for detecting anomalies in trends, not for fixed CPU utilization limits. Option D is wrong because 'Metric Absence' condition type triggers when data is missing for a specified duration, not when a metric exceeds a threshold; it is designed for detecting data gaps, not high CPU usage.

24

MCQhard

You are designing a monitoring strategy for a microservices architecture running on GKE. Each service emits custom business metrics (e.g., order processing time). You want to create a dashboard that shows the 99th percentile latency for each service over the last 7 days. Which approach should you take?

A.Export logs to Cloud Logging and use Log Analytics to compute percentiles.

B.Write custom metrics to Cloud Monitoring and create a dashboard with the 99th percentile aligner.

C.Use Metrics Explorer to view the metrics and manually compute percentiles.

D.Use Prometheus monitoring built into GKE and query the avg() function.

AnswerB

Cloud Monitoring custom metrics support percentile aligners like 99th.

Why this answer

Option B is correct because Cloud Monitoring supports custom metrics and provides built-in aligners, including a 99th percentile aligner, which can be applied directly in a dashboard chart. This allows you to compute the 99th percentile latency for each service over the last 7 days without manual calculation or exporting logs. Custom metrics are the appropriate mechanism for business metrics like order processing time, as they are designed for numeric time-series data.

Exam trap

Google Cloud often tests the distinction between logs and metrics. The trap here is that candidates may treat log analysis as a valid way to compute percentiles, overlooking that Cloud Monitoring is the correct service for numeric time-series data and provides native percentile computation.

How to eliminate wrong answers

Option A is wrong because Cloud Logging is designed for log data, not numeric time-series metrics; computing percentiles from logs requires parsing and aggregation, which is inefficient and not the intended use case. Option C is wrong because Metrics Explorer is an ad hoc exploration tool, and computing percentiles manually from its output is neither scalable nor a persistent dashboard; a dashboard chart with a percentile aligner does the work for you. Option D is wrong because Prometheus's avg() function computes the average, not the 99th percentile, so it cannot answer the question even though Prometheus can be used with GKE.

25

Multi-Selecthard

A DevOps team wants to set up custom metrics for a serverless application running on Cloud Run. The application emits metrics using OpenTelemetry. They need to collect these metrics and create an alerting policy that triggers when the 99th percentile latency exceeds 500ms for 5 minutes. Which TWO actions must they take? (Choose two.)

Select 2 answers

A.Create a custom distribution metric for the latency data and set up a metric threshold alert using the 99th percentile value.

B.Deploy the OpenTelemetry Collector as a sidecar or external service and configure it to export metrics to Cloud Monitoring using the Cloud Monitoring exporter.

C.Install the Cloud Monitoring agent on the Cloud Run instance to collect custom metrics.

D.Define a log-based metric from the application logs that captures latency entries.

E.Configure the Cloud Monitoring dashboard to query the metrics using PromQL.

AnswersA, B

Distribution metrics support percentile calculations in alert policies.

Why this answer

Option A is correct because to alert on the 99th percentile of latency, you must create a custom distribution metric, which stores a histogram of values and allows percentile calculations. A metric-threshold alert policy can then be configured to evaluate the 99th percentile value against the 500ms threshold over a 5-minute window.

Exam trap

Google Cloud often tests the misconception that log-based metrics can replace custom distribution metrics for percentile alerts, but logs lack the histogram structure required for precise percentile calculations.
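To see why a histogram is what makes percentile alerting possible, here is a minimal Python sketch that estimates a percentile from bucket counts by linear interpolation inside the bucket that contains the target rank. The bucket bounds and counts are invented for illustration, and the interpolation scheme is an assumption rather than the exact method Cloud Monitoring uses.

```python
def percentile_from_histogram(bounds, counts, q):
    """Estimate the q-th percentile (0 < q <= 100) from histogram buckets.

    bounds: ascending upper bounds of the finite buckets (ms).
    counts: one count per bucket, plus a final overflow bucket.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    target = q * total / 100  # rank of the requested percentile
    cumulative = 0
    lower = 0.0
    for i, count in enumerate(counts):
        upper = bounds[i] if i < len(bounds) else float("inf")
        if count and cumulative + count >= target:
            if upper == float("inf"):
                return lower  # overflow bucket: only a lower bound is known
            fraction = (target - cumulative) / count
            return lower + fraction * (upper - lower)
        cumulative += count
        lower = upper
    return lower

bounds = [100, 250, 500, 1000]       # hypothetical bucket upper bounds in ms
counts = [700, 200, 80, 20, 0]       # last entry is the overflow bucket
p99 = percentile_from_histogram(bounds, counts, 99)
breaches_threshold = p99 > 500       # the 500ms threshold from the question
```

With these sample counts the 99th percentile lands in the 500 to 1000 ms bucket, so the condition would be above the 500ms threshold for this window.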

26

MCQhard

A company wants to create an SLO for their API with a target of 99.9% availability over a 30-day rolling window. They are using Cloud Monitoring. Which combination of resources and techniques should they use?

A.Manually compute availability using external monitoring tools.

B.Use the Cloud Monitoring SLO service with a request latency SLI.

C.Create an uptime check and a log-based metric for errors. Use the SLI formula: (successful requests / total requests).

D.Use Cloud Trace to measure latency and create a custom metric.

AnswerC

This leverages native Cloud Monitoring SLO capabilities, defining availability as the fraction of successful probes or requests, and automatically tracks the SLO over a rolling window.

Why this answer

Option C is correct because it combines an uptime check (to measure total requests) with a log-based metric for errors (to count failed requests), allowing the SLI formula (successful requests / total requests) to compute availability. This approach directly aligns with the 99.9% availability target over a 30-day rolling window, using Cloud Monitoring's native capabilities without external tools or irrelevant latency metrics.

Exam trap

Google Cloud often tests the distinction between availability and latency SLIs, so the trap here is assuming that any monitoring metric (like latency) can be used for an availability SLO, when in fact availability requires a success/failure ratio, not a performance threshold.

How to eliminate wrong answers

Option A is wrong because manually computing availability using external monitoring tools bypasses Cloud Monitoring's built-in SLO service, which is designed to automate SLI calculation and alerting, and introduces unnecessary manual effort and potential inconsistency. Option B is wrong because a request latency SLI measures response time, not availability; availability is about whether requests succeed or fail, not how fast they respond, so this SLI does not match the 99.9% availability target. Option D is wrong because Cloud Trace is a distributed tracing tool for analyzing latency and request flows, not for counting successful vs. total requests; using it to create a custom metric for availability would be inefficient and misaligned with the purpose of the service.
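The SLI formula (successful requests / total requests) and the error budget it implies can be worked through in Python. The request counts below are made-up sample numbers; only the 99.9% target and the 30-day window come from the question.

```python
def availability(good, total):
    """Availability SLI: fraction of requests that succeeded."""
    return good / total

def error_budget_remaining(good, total, target):
    """Fraction of the error budget still unspent (1.0 = untouched)."""
    allowed_bad = (1 - target) * total
    actual_bad = total - good
    return 1 - actual_bad / allowed_bad

TARGET = 0.999                 # 99.9% availability SLO
WINDOW_MINUTES = 30 * 24 * 60  # 30-day rolling window

# Hypothetical traffic for the window.
total_requests = 1_000_000
good_requests = 999_400

sli = availability(good_requests, total_requests)
remaining = error_budget_remaining(good_requests, total_requests, TARGET)
allowed_downtime_minutes = (1 - TARGET) * WINDOW_MINUTES
```

Here 600 failed requests out of an allowed 1,000 leave about 40% of the budget, and a 99.9% target over 30 days corresponds to roughly 43.2 minutes of full downtime.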

27

MCQhard

Your company runs a multi-tier application on Compute Engine with a Cloud SQL backend. Recently, during peak hours, users report slow page loads. Cloud Monitoring shows high CPU on the app servers, but no memory pressure. Cloud Trace shows that the application spends most of its time waiting for database queries. The Cloud SQL instance is a high-memory machine type with 16 vCPUs and 64 GB RAM, but CPU utilization on the database is only 30%. There are no slow query alerts. What is the most likely cause and what should you do?

A.The database lacks indexes. Use Cloud SQL Query Insights to identify missing indexes.

B.The application is performing unnecessary queries. Add caching with Memorystore.

C.The database connection pool is exhausted. Increase the maximum number of connections.

D.The Cloud SQL instance is under-provisioned. Upgrade to a larger machine type.

AnswerA

Missing indexes force full table scans, causing slow queries. Query Insights can reveal the specific slow queries and suggest indexes.

Why this answer

The symptoms—high app server CPU, low database CPU, and queries consuming most of the application’s wait time—point to inefficient queries due to missing indexes. Cloud SQL Query Insights can identify these missing indexes by analyzing query execution plans and wait events. Adding appropriate indexes reduces query execution time, lowering app server CPU usage and resolving the slow page loads.

Exam trap

Google Cloud often tests the misconception that high app server CPU always means the app server is the bottleneck, when in fact the CPU is consumed waiting for slow database queries caused by missing indexes.

How to eliminate wrong answers

Option B is wrong because the application is already waiting on database queries, not performing unnecessary queries; caching would mask the underlying indexing issue but not fix the root cause. Option C is wrong because connection pool exhaustion would cause connection timeouts or errors, not high app server CPU and low database CPU; Cloud SQL’s 30% CPU utilization indicates connections are not saturated. Option D is wrong because the database CPU is only 30% utilized, so the instance is not under-provisioned; upgrading would not address the query performance bottleneck.

28

MCQeasy

A developer wants to receive notifications when the error rate of their application exceeds 1% over a 5-minute window. What should they create in Cloud Monitoring?

A.Alerting policy with metric threshold condition

B.Log-based metric

C.Dashboard with error rate chart

D.Uptime check

AnswerA

Alerting policies evaluate metrics and send notifications.

Why this answer

An alerting policy with a metric threshold condition is the correct approach because Cloud Monitoring evaluates a metric (e.g., error rate) against a threshold (1%) over a specified window (5 minutes) and triggers a notification when the condition is met. This directly fulfills the requirement to be notified when the error rate exceeds the threshold, as alerting policies are designed for proactive notification based on metric data.

Exam trap

Google Cloud often tests the distinction between alerting policies (which trigger notifications) and other monitoring components like dashboards or log-based metrics. Candidates mistakenly choose a log-based metric or dashboard because they confuse data collection with alerting.

How to eliminate wrong answers

Option B is wrong because a log-based metric is used to extract quantitative data from logs (e.g., count of error log entries) but does not itself trigger notifications; it must be used within an alerting policy to generate alerts. Option C is wrong because a dashboard with an error rate chart provides a visual representation of the metric but does not generate notifications or alerts; it is a passive monitoring tool. Option D is wrong because an uptime check monitors the availability and responsiveness of a resource (e.g., HTTP response codes) and is not designed to track application error rates or trigger alerts based on a percentage threshold over a time window.

29

MCQeasy

A company runs a stateless application on Compute Engine behind a load balancer. They want to monitor the number of active requests per instance without adding custom instrumentation. What is the most straightforward approach?

A.Configure the Cloud Monitoring agent to collect request metrics.

B.Install the Cloud Logging agent and parse access logs.

C.Deploy Prometheus and instrument the application.

D.Use the load balancer's built-in 'request_count' metric.

AnswerD

This metric is available without additional agents.

Why this answer

Option D is correct because the load balancer's built-in 'request_count' metric directly provides the number of active requests per instance without requiring any additional instrumentation or agents. This metric is automatically collected by Cloud Monitoring for Google Cloud HTTP(S) load balancers, making it the most straightforward approach for a stateless application on Compute Engine.

Exam trap

Google Cloud often tests the distinction between agent-based monitoring (Cloud Monitoring agent) and built-in managed service metrics (load balancer metrics), where candidates mistakenly assume an agent is required for any application-level metric, ignoring that Google Cloud's managed services automatically expose relevant metrics.

How to eliminate wrong answers

Option A is wrong because the Cloud Monitoring agent collects system-level metrics (CPU, memory, disk) from VM instances, not application-level request counts; it cannot capture active request counts without custom instrumentation. Option B is wrong because installing the Cloud Logging agent and parsing access logs would require additional log-based metric configuration and processing, which is less straightforward than using the built-in load balancer metric. Option C is wrong because deploying Prometheus and instrumenting the application introduces significant complexity and custom code, which contradicts the requirement of 'without adding custom instrumentation'.

30

MCQmedium

Your application writes structured logs to Cloud Logging. You want to create a metric that counts log entries with a specific severity level, then alert when the count exceeds a threshold. What should you do?

A.Use Cloud Monitoring's custom metrics API to write the count.

B.Export logs to BigQuery and analyze there.

C.Create a log-based metric using the Logs Explorer, then set up an alerting policy.

D.Use Cloud Logging's metrics dashboard.

AnswerC

Logs Explorer allows you to define a metric from a query (e.g., count of 'ERROR' severity), which then becomes available in Cloud Monitoring for alerting.

Why this answer

Option C is correct because log-based metrics in Cloud Logging allow you to define a counter metric based on log entries matching a filter (e.g., severity=ERROR). Once the metric is created, you can set up an alerting policy in Cloud Monitoring to trigger when the count exceeds a threshold. This approach is native, serverless, and requires no custom code or external exports.

Exam trap

Google Cloud often tests the distinction between viewing metrics (dashboards) and creating actionable metrics (log-based metrics with alerting), leading candidates to mistakenly choose the metrics dashboard option (D) instead of the correct creation workflow (C).

How to eliminate wrong answers

Option A is wrong because using Cloud Monitoring's custom metrics API would require you to write application code to manually increment a metric, which duplicates effort and bypasses the native log-based metric functionality. Option B is wrong because exporting logs to BigQuery adds latency, cost, and complexity; it is not a real-time alerting solution and requires separate querying and monitoring setup. Option D is wrong because Cloud Logging's metrics dashboard only displays existing metrics; it does not allow you to create a new log-based metric or configure alerting policies.

31

Multi-Selectmedium

Which TWO capabilities does Cloud Service Mesh (Istio) provide to help monitor application performance? (Select exactly 2.)

Select 2 answers

A.Legacy Cloud Logging agent integration for container logs.

B.Custom Prometheus exporter deployment for each microservice.

C.Automatic generation of HTTP request metrics (e.g., request count, latency, error rate) per service.

D.Cloud Endpoints API management with key validation.

E.Distributed tracing propagation and span generation without application changes.

AnswersC, E

Collects metrics for each service proxy.

Why this answer

Option C is correct because Cloud Service Mesh (Istio) automatically generates HTTP request metrics such as request count, latency, and error rate for every service in the mesh. This is achieved through Envoy sidecar proxies that intercept all traffic and export standardized telemetry without requiring any application code changes.

Exam trap

Google Cloud often tests the distinction between automatic telemetry generation (Istio's built-in Prometheus and tracing) versus manual instrumentation or separate API management tools, leading candidates to confuse Cloud Endpoints or custom exporters with Istio's native capabilities.

32

MCQmedium

The alert is not firing even though error_count metric occasionally spikes above 10. What is the most likely reason?

A.The aggregations are incorrect; should use REDUCE_MAX.

B.The filter specifies gke_container but the metric might be from other resources.

C.The duration of 300s means the condition must remain >10 for 5 minutes, so brief spikes do not trigger.

D.The comparison should be COMPARISON_GT_OR_NAN.

AnswerC

The duration parameter requires the threshold to be exceeded continuously for 300 seconds.

Why this answer

Option C is correct because the alert condition is configured with a duration of 300 seconds (5 minutes), meaning the error_count metric must remain above 10 for the entire 5-minute window before the alert fires. Brief, transient spikes that exceed 10 but do not persist for the full duration will not trigger the alert, which is the most likely reason the alert is not firing despite occasional spikes.

Exam trap

Google Cloud often tests the distinction between 'threshold violation' and 'duration-based alerting': candidates mistakenly think any breach of the threshold triggers an alert, but the duration parameter requires sustained violation over the specified window.

How to eliminate wrong answers

Option A is wrong because changing the reducer to REDUCE_MAX would not help: REDUCE_MAX is a valid cross-series reducer, but the issue here is the duration window, not the aggregation. Option B is wrong because the filter specifies gke_container, and if the metric were from other resources, the alert would simply not match any data, but the question states the metric occasionally spikes above 10, implying data is present. Option D is wrong because COMPARISON_GT_OR_NAN would treat missing data as exceeding the threshold, which could cause false positives, not prevent alerts from firing; the current comparison is likely COMPARISON_GT, which is correct for this scenario.
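The effect of the duration setting can be modelled with a short Python sketch. This is a simplified model, assuming one aligned data point per 60-second period (the `period_s` parameter is an assumption, since the exhibit is not reproduced here); the actual evaluation in Cloud Monitoring has more detail.

```python
def alert_fires(samples, threshold, duration_s, period_s=60):
    """Return True if the series stays above `threshold` for `duration_s`.

    samples: aligned metric values, one per `period_s` seconds.
    """
    needed = duration_s // period_s  # consecutive points required
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0  # any dip resets the run
        if run >= needed:
            return True
    return False

# Brief spikes above 10 that never last five consecutive minutes.
spiky = [2, 15, 3, 12, 4, 1, 20, 2]
# A sustained breach lasting five consecutive minutes.
sustained = [2, 11, 12, 15, 13, 14, 3]

spiky_fires = alert_fires(spiky, threshold=10, duration_s=300)
sustained_fires = alert_fires(sustained, threshold=10, duration_s=300)
```

The spiky series crosses the threshold three times but never fires, which matches the behaviour described in Option C; shortening the duration would make the condition more sensitive to spikes.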

33

MCQhard

A company runs a multi-service application on GKE and wants to create a Service Level Indicator (SLI) for request latency. They have set up Cloud Service Mesh (Anthos Service Mesh) with Istio. Which metric should they use for the SLI?

A.istio_request_duration_milliseconds_bucket metric from Cloud Monitoring.

B.Custom metric exported by the application using OpenTelemetry.

C.Cloud Trace latency distribution from traces.

D.Cloud HTTP Load Balancer latency metric.

AnswerA

Built-in Istio metric for latency SLI.

Why this answer

Option A is correct because `istio_request_duration_milliseconds_bucket` is a native Istio metric automatically exported by Cloud Service Mesh (Anthos Service Mesh) to Cloud Monitoring. It provides a histogram of request latencies, which is the standard data source for building a latency-based SLI (e.g., the proportion of requests under a threshold). This metric is pre-configured and requires no custom instrumentation, making it the most direct and reliable choice for an SLI in this environment.

Exam trap

The trap here is that candidates often confuse the load balancer latency metric (Option D) as the correct choice because it is a common SLI for external-facing services, but for a multi-service application inside GKE with Cloud Service Mesh, the correct metric must come from the service mesh itself to capture true request latency between services.

How to eliminate wrong answers

Option B is wrong because while custom metrics via OpenTelemetry can be used for SLIs, they require additional application-level instrumentation and are not the default or recommended approach when Cloud Service Mesh already provides the exact latency metric needed. Option C is wrong because Cloud Trace provides latency distributions from sampled traces, not a continuous, aggregated histogram suitable for a precise SLI calculation; it is designed for debugging, not for service-level monitoring. Option D is wrong because the Cloud HTTP Load Balancer metric measures latency at the load balancer level, which includes network overhead and does not reflect the actual request latency inside the GKE service mesh, leading to an inaccurate SLI.
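A latency SLI of the form "proportion of requests under a threshold" falls out of a cumulative histogram directly. The sketch below assumes Prometheus-style cumulative buckets keyed by their upper bound, as in a `_bucket` metric; the bucket bounds and counts are invented sample data.

```python
INF = float("inf")

def latency_sli(cumulative_buckets, threshold_ms):
    """Fraction of requests that completed within `threshold_ms`.

    cumulative_buckets maps each bucket's upper bound (ms) to the number
    of requests at or below that bound; the INF bucket holds the total.
    `threshold_ms` must match one of the bucket bounds.
    """
    total = cumulative_buckets[INF]
    good = cumulative_buckets[threshold_ms]
    return good / total

# Hypothetical cumulative counts for one service over the SLO window.
buckets = {50: 800, 100: 950, 250: 990, 500: 998, INF: 1000}
sli_250ms = latency_sli(buckets, 250)
```

Because the SLI threshold has to coincide with a bucket boundary, the choice of bucket bounds limits which latency targets can be measured exactly.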

34

Multi-Selectmedium

Which THREE components are essential for a complete application performance monitoring (APM) solution on Google Cloud?

Select 3 answers

A.Cloud Scheduler for job scheduling.

B.Cloud Monitoring for metrics and alerting.

C.Cloud Trace for request tracing.

D.Cloud CDN for content caching.

E.Cloud Logging for log aggregation and analysis.

AnswersB, C, E

Core component for metrics and alerts.

Why this answer

Cloud Monitoring is essential for an APM solution because it provides metrics, dashboards, and alerting policies to track application health and performance. It integrates with other services like Cloud Trace and Cloud Logging to offer a unified observability platform, enabling proactive detection of issues such as latency spikes or error rate increases.

Exam trap

Google Cloud often tests the distinction between operational tools (like Cloud Scheduler or Cloud CDN) and observability tools, leading candidates to mistakenly include services that manage tasks or optimize delivery rather than monitor performance.

35

Multi-Selecthard

Which TWO are correct ways to reduce logging costs in Google Cloud? (Choose two.)

Select 2 answers

A.Set log bucket retention to a shorter period

B.Disable all audit logs to reduce volume

C.Export all logs to BigQuery for analysis

D.Increase the retention period from 30 days to 365 days

E.Use exclusion filters to drop debug logs

AnswersA, E

Shorter retention reduces storage costs.

Why this answer

Option A is correct because reducing the retention period for log buckets directly decreases the amount of log data stored, which lowers storage costs in Cloud Logging. Logs are billed based on volume ingested and stored; shorter retention means older logs are deleted sooner, reducing the total storage footprint and associated charges.

Exam trap

Google Cloud often tests the misconception that exporting logs to an external system like BigQuery reduces costs, when in fact it adds additional costs for the export destination, and the trap is that candidates confuse 'analysis' with 'cost reduction'.

36

MCQeasy

Your application is deployed on Google Kubernetes Engine (GKE). You want to monitor resource usage at the pod level. Which tool should you use?

A.Cloud Trace

B.Cloud Logging

C.Cloud Profiler

D.Cloud Monitoring with Kubernetes integration

AnswerD

Cloud Monitoring provides built-in dashboards and metrics for GKE, including pod-level resource metrics.

Why this answer

Cloud Monitoring with Kubernetes integration is the correct choice because it provides native pod-level metrics such as CPU, memory, disk, and network usage by leveraging the Kubernetes API and cAdvisor. This integration automatically collects resource utilization from each pod without requiring manual instrumentation, making it ideal for monitoring resource usage at the pod level in GKE.

Exam trap

Google Cloud often tests the distinction between monitoring (metrics) and observability tools (tracing, logging, profiling), so candidates may confuse Cloud Trace or Cloud Profiler as solutions for resource usage monitoring because they deal with performance data, but they do not provide pod-level resource metrics.

How to eliminate wrong answers

Option A is wrong because Cloud Trace is a distributed tracing tool that captures latency data for requests across services, not resource usage metrics like CPU or memory at the pod level. Option B is wrong because Cloud Logging collects and stores log data (e.g., application logs, system logs), not numeric resource utilization metrics. Option C is wrong because Cloud Profiler is a continuous profiling tool that identifies performance bottlenecks in code (e.g., CPU or memory hot spots), but it does not provide real-time pod-level resource usage monitoring.

37

Multi-Selectmedium

You are troubleshooting a performance issue in a microservices application. Which TWO tools from Google Cloud's operations suite would you use to trace a request across services and identify the slowest component?

Select 2 answers

A.Cloud Monitoring

B.Error Reporting

C.Cloud Profiler

D.Cloud Logging

E.Cloud Trace

AnswersA, E

Cloud Monitoring can display latency heatmaps and service graphs that help visualize the slowest component in a distributed trace.

Why this answer

Cloud Trace is the dedicated Google Cloud service for distributed tracing, capturing latency data as requests propagate through microservices. Cloud Monitoring provides the dashboards and alerting to visualize trace data and pinpoint the slowest component. Together, they enable end-to-end request tracing and performance bottleneck identification.

Exam trap

Google Cloud often tests the distinction between tools that monitor code performance (Profiler) versus tools that trace request flow (Trace), leading candidates to incorrectly select Cloud Profiler for tracing tasks.
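Identifying the slowest component from a trace amounts to comparing span durations. The following Python sketch uses a hypothetical `Span` type and made-up timings; it is not the Cloud Trace data model, only an illustration of the comparison a trace waterfall lets you make by eye.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    service: str
    start_ms: int
    end_ms: int
    parent: Optional[str] = None  # None marks the root span

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms

def slowest_child(spans):
    """Return the non-root span with the longest duration."""
    children = [s for s in spans if s.parent is not None]
    return max(children, key=lambda s: s.duration_ms)

# Hypothetical trace: one root request calling two downstream services.
spans = [
    Span("frontend", 0, 1000),
    Span("auth", 10, 810, parent="frontend"),
    Span("productcatalog", 820, 950, parent="frontend"),
]
slowest = slowest_child(spans)
```

The root span is excluded because its duration covers the whole request; the useful question is which downstream call accounts for most of that time.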

38

MCQmedium

Refer to the exhibit. A developer sees this log entry in Cloud Logging. The application is running on Compute Engine. Which tool should they use to further diagnose the cause of the connection refusal?

A.Cloud Monitoring to check network metrics.

B.Cloud Profiler to identify CPU bottlenecks.

C.Cloud Trace to trace the request flow.

D.VPC Flow Logs to analyze network traffic.

AnswerD

Correct: VPC Flow Logs capture connection metadata and can show whether traffic was accepted or denied.

Why this answer

The log entry indicates a connection refusal, which is a network-level issue. VPC Flow Logs capture metadata about network traffic to and from Compute Engine instances, including whether connections were accepted or rejected. By analyzing these logs, the developer can identify the source and destination IPs, ports, and protocol, and determine if a firewall rule or routing issue is causing the refusal.

Exam trap

Google Cloud often tests the distinction between application-level monitoring tools (Trace, Profiler) and network-level diagnostics (VPC Flow Logs), trapping candidates who confuse a connection refusal with a performance or code issue.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring provides metrics and alerts for resource utilization and performance, but it does not capture per-connection network traffic metadata needed to diagnose a connection refusal. Option B is wrong because Cloud Profiler is designed to identify CPU and memory bottlenecks in application code, not network connectivity issues. Option C is wrong because Cloud Trace traces request latency and flow through distributed services, but it does not log network-level connection refusals or firewall drops.

39

MCQmedium

Refer to the exhibit. The alert fires when what happens?

A.When the rate of responses on App Engine exceeds 10 per second for 5 minutes

B.When the cumulative response count on App Engine exceeds 10 for 5 minutes

C.When the latency exceeds 10 seconds for 5 minutes

D.When the response rate drops below 10 per second for 5 minutes

AnswerA

ALIGN_RATE computes per-second rate, threshold >10, duration 300s.

Why this answer

The alert is configured to fire when the rate of responses on App Engine exceeds 10 per second for a sustained period of 5 minutes. This is a rate-based threshold, not a cumulative count or latency metric, which is why option A correctly describes the condition.
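
To make the rate-versus-count distinction concrete, here is a minimal Python sketch. It is not Cloud Monitoring's implementation: the function names, the 60-second sample spacing, and the sample data are all assumptions chosen for illustration. It converts cumulative response counts into per-second rates and checks whether every rate in a 300-second window exceeds 10.

```python
from typing import List, Tuple

def per_second_rates(samples: List[Tuple[int, int]]) -> List[float]:
    """Convert (timestamp_seconds, cumulative_count) samples into per-second rates."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        rates.append((c1 - c0) / (t1 - t0))
    return rates

def exceeds_for_window(rates: List[float], threshold: float) -> bool:
    """True only if every rate in the window is above the threshold."""
    return len(rates) > 0 and all(rate > threshold for rate in rates)

# Hypothetical data: one sample per minute over a 300-second window.
busy = [(t, 12 * t) for t in range(0, 301, 60)]   # 12 responses per second
quiet = [(t, 1 * t) for t in range(0, 301, 60)]   # 1 response per second

busy_fires = exceeds_for_window(per_second_rates(busy), threshold=10)
quiet_fires = exceeds_for_window(per_second_rates(quiet), threshold=10)
```

In the quiet series the cumulative count passes 10 within seconds, yet the per-second rate never does, which is why option B does not describe a rate-based condition.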

Exam trap

Google Cloud often tests the distinction between rate-based and cumulative thresholds, and the trap here is that candidates confuse 'rate per second' with 'total count over time' or misread the direction of the threshold (exceeding vs. dropping below).

How to eliminate wrong answers

Option B is wrong because it describes a cumulative response count exceeding 10 over 5 minutes, but the alert is based on a rate (per second), not a total count. Option C is wrong because it refers to latency exceeding 10 seconds, but the alert is triggered by response rate, not latency. Option D is wrong because it describes the response rate dropping below 10 per second, but the alert fires when the rate exceeds 10 per second, not when it drops below.

40

MCQhard

A development team is using Cloud Trace to analyze performance bottlenecks in a Node.js application deployed on GKE. They have enabled trace sampling at 10% and can see some traces, but many requests are not captured. They want to increase the sampling rate to 100% for a specific high-traffic endpoint while keeping the default sampling rate for other endpoints. How can they achieve this?

A.Use a separate trace exporter for the high-traffic endpoint.

B.Increase the quota for trace spans per request.

C.Implement a custom sampler in the application code to sample the specific endpoint at 100%.

D.Set the global trace sampling rate to 100% in the application configuration.

AnswerC

A custom sampler allows per-endpoint sampling rates as needed.

Why this answer

Option C is correct because the sampling decision is made by the tracing SDK in the application, not by the Cloud Trace backend, so a custom sampler in the application code can override the default rate for specific endpoints. With the OpenTelemetry SDK, you can create a sampler that checks the request path and always samples the high-traffic endpoint while delegating to the default ratio-based sampler (e.g., 0.1) for all other requests. This gives you fine-grained control without affecting the global sampling configuration.
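
The per-endpoint idea can be sketched in plain Python. This is a conceptual illustration only, not the OpenTelemetry sampler interface: the class names, the `should_sample` method, and the `/api/search` path are hypothetical.

```python
import hashlib

class RatioSampler:
    """Samples a fixed fraction of traces, decided deterministically from the trace ID."""

    def __init__(self, ratio: float):
        self.ratio = ratio

    def should_sample(self, trace_id: str, path: str) -> bool:
        digest = hashlib.sha256(trace_id.encode()).digest()
        # Map the first 8 bytes of the hash to a number in [0, 1).
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        return bucket < self.ratio

class EndpointSampler:
    """Always samples one path prefix; delegates everything else to a default sampler."""

    def __init__(self, always_sample_prefix: str, default: RatioSampler):
        self.always_sample_prefix = always_sample_prefix
        self.default = default

    def should_sample(self, trace_id: str, path: str) -> bool:
        if path.startswith(self.always_sample_prefix):
            return True  # 100% sampling for the chosen endpoint
        return self.default.should_sample(trace_id, path)

# Hypothetical endpoint and trace IDs, for illustration only.
sampler = EndpointSampler("/api/search", RatioSampler(0.1))
trace_ids = [f"trace-{i}" for i in range(1000)]
hot_sampled = sum(sampler.should_sample(t, "/api/search") for t in trace_ids)
other_sampled = sum(sampler.should_sample(t, "/api/other") for t in trace_ids)
```

Deciding from a hash of the trace ID rather than a random draw keeps the decision stable for a given trace, which is the same motivation behind ratio-based samplers in tracing SDKs.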

Exam trap

Google Cloud often tests the distinction between sampling rate configuration (which controls which requests are traced) and quota or exporter settings (which control data transmission limits), leading candidates to confuse increasing span quotas with increasing sampling probability.

How to eliminate wrong answers

Option A is wrong because using a separate trace exporter does not control sampling rate; exporters are responsible for sending trace data to the backend, not for deciding which spans to capture. Option B is wrong because increasing the quota for trace spans per request addresses limits on the number of spans that can be sent, not the sampling rate; it does not change the probability of capturing a request. Option D is wrong because setting the global trace sampling rate to 100% would capture all requests across all endpoints, which contradicts the requirement to keep the default sampling rate for other endpoints.

41

MCQeasy

You deployed a new version of your application that uses Cloud Pub/Sub for asynchronous messaging. After deployment, you notice that messages are accumulating in the subscription backlog. You suspect the subscriber is too slow. Which tool should you use to diagnose?

A.Cloud Trace to trace message processing.

B.Cloud Monitoring to check subscriber's processing latency and throughput.

C.Cloud Logging to view subscriber logs.

D.Cloud Profiler to profile subscriber code.

AnswerB

Cloud Monitoring has built-in metrics for Pub/Sub subscriptions, such as `subscription/num_undelivered_messages` and `subscription/oldest_unacked_message_age`, which can confirm whether the subscriber is falling behind.

Why this answer

Cloud Monitoring is the correct tool because it provides metrics such as subscriber processing latency, throughput, and backlog size for Pub/Sub subscriptions. By examining these metrics, you can quantify how slow the subscriber is and identify whether the issue is due to high latency or insufficient throughput, directly addressing the suspicion of a slow subscriber.

Exam trap

Google Cloud often tests the distinction between monitoring (metrics) and tracing (request paths). The trap here is that candidates confuse Cloud Trace's ability to trace individual messages with Cloud Monitoring's ability to aggregate subscriber performance metrics, leading them to pick Cloud Trace instead of Cloud Monitoring.

How to eliminate wrong answers

Option A is wrong because Cloud Trace is designed for distributed tracing of request latency across services, not for monitoring Pub/Sub subscription backlog or subscriber processing metrics. Option C is wrong because Cloud Logging captures log entries from your application, but it does not provide the real-time performance metrics (like processing latency or throughput) needed to diagnose a slow subscriber. Option D is wrong because Cloud Profiler profiles CPU and memory usage of your code, but it does not directly measure Pub/Sub subscriber processing latency or backlog accumulation.

42

MCQmedium

You need to create an uptime check for an external HTTPS endpoint and configure an alert that sends a notification if the check fails for 3 consecutive attempts. Which configuration is correct?

A.Create an uptime check with check interval 5 min and alert condition with duration 3 min

B.Create an uptime check with check interval 1 min and alert condition with duration 3 min

C.Create an uptime check with check interval 5 min and alert condition with downtime 15 min

D.Create an uptime check with check interval 1 min and alert condition with duration 1 min

AnswerB

1-minute interval with 3-minute duration means 3 consecutive failures trigger alert.

Why this answer

Option B is correct because to trigger an alert after 3 consecutive failures with a 1-minute check interval, the alert condition must have a duration of 3 minutes. This ensures that the alert fires only when the endpoint has been down for three successive checks, matching the requirement exactly.
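
The arithmetic behind the options can be checked with a short Python sketch. It uses a deliberately simplified model (alert duration divided by check interval) and does not reproduce how Cloud Monitoring evaluates uptime checks; the function name is hypothetical. Option C is left out because it uses a different parameter name.

```python
def checks_in_window(check_interval_min: float, duration_min: float) -> float:
    """How many check intervals fit into the alert duration (simplified model)."""
    return duration_min / check_interval_min

# (check interval in minutes, alert duration in minutes) for options A, B, and D.
options = {"A": (5, 3), "B": (1, 3), "D": (1, 1)}
results = {name: checks_in_window(interval, duration)
           for name, (interval, duration) in options.items()}
```

Under this model only option B yields exactly three checks inside the alert window; option A covers less than one check and option D covers a single check.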

Exam trap

Google Cloud often tests the distinction between 'duration' (the time window for consecutive failures) and 'downtime' (a different metric), leading candidates to confuse the alert condition parameter name or miscalculate the required duration for a given number of consecutive failures.

How to eliminate wrong answers

Option A is wrong because a 3-minute duration is shorter than a single 5-minute check interval, so it cannot correspond to three consecutive failed checks. Option C is wrong because, although a 5-minute check interval with a 15-minute window would span three consecutive failures (3 × 5 = 15), 'downtime' is not the correct parameter name in Google Cloud Monitoring; the correct term is 'duration'. Option D is wrong because a 1-minute check interval with a 1-minute duration would trigger after only one failure, not three consecutive attempts.

43

MCQhard

Your company runs a production App Engine standard environment service (module 'frontend', version 'v2') that handles e-commerce checkout requests. You have set up an alerting policy on a custom metric 'request_latency' that fires when latency exceeds 500ms for 1 minute. Recently, customers have complained about slow checkout times, but no alert has fired. You examine the exhibit: the log entry shows a latency of 0.452s (452ms) for a request to '/api/checkout'. The custom metric is defined from OpenTelemetry instrumentation. What is the most likely reason the alert did not fire?

A.The alert condition uses a threshold on a metric that is not being written because the OpenTelemetry exporter is not configured for the 'frontend' module.

B.The log entry does not contain the required custom metric data because the httpRequest field is not parsed by Cloud Monitoring.

C.The alert threshold is 500ms, and the exhibited request latency is 452ms, which is below the threshold. Individual requests may be below the threshold, so the alert does not fire.

D.The custom metric is only emitted for version 'v1', and the current version is 'v2', so no metric data is available for the alert.

AnswerC

The log shows a single request below threshold; the alert requires exceeding for 1 minute.

Why this answer

Option C is correct because the alerting policy is configured to fire when the custom metric 'request_latency' exceeds 500ms for 1 minute. The exhibited log entry shows a latency of 452ms, which is below the 500ms threshold. The alert condition is based on a metric threshold, not individual log entries, and since the metric value remains below the threshold, the alert does not trigger.

Exam trap

Google Cloud often tests the distinction between individual log entries and aggregated metric thresholds, leading candidates to mistakenly assume that any request latency near the threshold should trigger an alert, when in fact the alert condition requires sustained violation over the evaluation window.

How to eliminate wrong answers

Option A is wrong because the OpenTelemetry exporter is correctly configured for the 'frontend' module, as evidenced by the custom metric data being present in the log entry (the latency value of 0.452s is recorded). Option B is wrong because the custom metric is defined from OpenTelemetry instrumentation, not from parsing the httpRequest field; Cloud Monitoring ingests the metric directly via the OpenTelemetry exporter, not by parsing log entries. Option D is wrong because the log entry explicitly shows the request was handled by version 'v2' (the exhibit shows 'module frontend, version v2'), and the custom metric is emitted for the current version, not only for 'v1'.

44

MCQeasy

Refer to the exhibit. You are reviewing a Cloud Monitoring MQL query. What is the purpose of this query?

A.It displays the raw CPU utilization data points that exceed 90%.

B.It shows the 5-minute average CPU utilization for all instances, then filters out those with average > 90%.

C.It computes the 5-minute average of CPU utilization and then selects instances where any data point exceeded 90%.

D.It filters for instances with CPU utilization > 90% and then computes the 5-minute average.

AnswerD

Filter first, then align, as shown in the query order.

Why this answer

Option D is correct because the MQL query uses the `filter` clause to first select only time series where `cpu.utilization` exceeds 90%, and then applies the `avg` aggregation over a 5-minute window. This order of operations ensures that the average is computed only on the filtered data points, not on all instances.
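
Why the order of operations matters can be shown with a small Python sketch. This is plain Python rather than MQL, and the instance names and utilization values are invented for illustration. It compares filtering data points before averaging with averaging before filtering.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical CPU utilization points within one 5-minute window.
cpu: Dict[str, List[float]] = {
    "vm-a": [0.95, 0.92, 0.50, 0.40, 0.30],
    "vm-b": [0.91, 0.93, 0.96, 0.94, 0.92],
}

def filter_then_average(series: Dict[str, List[float]], threshold: float) -> Dict[str, float]:
    """Keep only points above the threshold, then average what remains."""
    result = {}
    for name, points in series.items():
        kept = [p for p in points if p > threshold]
        if kept:
            result[name] = mean(kept)
    return result

def average_then_filter(series: Dict[str, List[float]], threshold: float) -> Dict[str, float]:
    """Average all points first, then keep only series whose average is above the threshold."""
    averages = {name: mean(points) for name, points in series.items()}
    return {name: avg for name, avg in averages.items() if avg > threshold}

filtered_first = filter_then_average(cpu, 0.9)
averaged_first = average_then_filter(cpu, 0.9)
```

With this data, filtering first reports both instances, while averaging first reports only `vm-b`, because the low points for `vm-a` pull its overall average below 90%.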

Exam trap

Google Cloud often tests the order of operations in MQL queries, specifically whether the filter or aggregation is applied first, leading candidates to confuse the sequence and misinterpret the query's purpose.

How to eliminate wrong answers

Option A is wrong because the query does not display raw data points; it applies a 5-minute average aggregation. Option B is wrong because it incorrectly suggests that the average is computed first and then filtered, whereas MQL processes the filter before the aggregation. Option C is wrong because it describes selecting instances based on any data point exceeding 90%, but the filter in MQL applies to each data point in the time series, not to instances as a whole.

45

MCQeasy

A company uses Cloud Logging to store application logs. They need to keep logs for 3 years for compliance. What is the most cost-effective way to store logs for this duration?

A.Use Cloud Logging's default retention

B.Create a sink to export logs to Pub/Sub

C.Create a sink to export logs to Cloud Storage with object lifecycle rules

D.Create a sink to export logs to BigQuery

AnswerC

Cloud Storage with lifecycle rules allows cost-effective long-term storage.

Why this answer

Cloud Logging's default retention is limited (30 days for the _Default bucket and 400 days for the _Required bucket), so the defaults cannot meet a 3-year compliance requirement. Exporting logs to Cloud Storage and applying object lifecycle rules allows you to automatically transition objects to lower-cost storage classes (e.g., from Standard to Nearline, Coldline, or Archive) and delete them after the retention period, minimizing cost while meeting the 3-year retention need.
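
A lifecycle policy of this kind might look like the following Python sketch, which builds a configuration in the JSON shape that Cloud Storage lifecycle rules use. The transition ages (30, 90, and 365 days) and the function name are assumptions chosen for illustration; a real policy should follow the organization's access patterns and compliance rules.

```python
import json

DAYS_PER_YEAR = 365

def build_lifecycle_config(retention_years: int) -> dict:
    """Build a lifecycle config that tiers objects down, then deletes them after retention."""
    return {
        "rule": [
            # Assumed tiering schedule: move to progressively colder classes as logs age.
            {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
             "condition": {"age": 30}},
            {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
             "condition": {"age": 90}},
            {"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
             "condition": {"age": 365}},
            # Delete once the compliance retention period has passed.
            {"action": {"type": "Delete"},
             "condition": {"age": retention_years * DAYS_PER_YEAR}},
        ]
    }

config = build_lifecycle_config(3)
lifecycle_json = json.dumps(config, indent=2)
```

The resulting JSON could be saved to a file and applied to the sink's destination bucket with the usual Cloud Storage tooling.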

Exam trap

Google Cloud often tests the misconception that Cloud Logging's default retention is sufficient for compliance or that exporting to BigQuery is always the best choice, but the trap here is that long-term compliance storage requires a cost-optimized archival solution like Cloud Storage with lifecycle rules, not a query-optimized or streaming service.

How to eliminate wrong answers

Option A is wrong because Cloud Logging's default retention is 30 days (400 days for the _Required bucket), far short of the required 3 years; a longer custom retention can be configured on a log bucket, but that is no longer the default and is generally more expensive than archival Cloud Storage classes. Option B is wrong because exporting to Pub/Sub is designed for real-time streaming and processing, not for long-term archival storage; Pub/Sub retains messages for days, not years. Option D is wrong because BigQuery is optimized for analytics and querying, not for long-term archival storage; storing logs in BigQuery for 3 years would incur significant storage and query costs, making it less cost-effective than Cloud Storage with lifecycle rules.

46

MCQmedium

Your application running on Google Kubernetes Engine (GKE) is experiencing intermittent latency spikes. You have enabled Cloud Monitoring and Cloud Logging. Which approach would be MOST effective to identify the root cause?

A.Increase the number of replicas or switch to a larger machine type.

B.Use Cloud Trace to analyze distributed tracing data for slow requests.

C.Examine CPU and memory utilization metrics in Cloud Monitoring for the GKE cluster.

D.Review recent Cloud Logging entries for error messages.

AnswerB

Tracing reveals per-request latencies and bottlenecks.

Why this answer

Cloud Trace is the most effective tool for identifying intermittent latency spikes because it provides end-to-end distributed tracing, allowing you to pinpoint which specific service or request path is causing the delay. Unlike aggregate metrics or logs, Cloud Trace captures individual request spans and can reveal high-latency operations, such as slow database queries or external API calls, that occur only under certain conditions.

Exam trap

Google Cloud often tests the distinction between aggregate monitoring (metrics, logs) and distributed tracing, trapping candidates who assume that high CPU/memory or error logs are the only indicators of performance issues, when in fact intermittent latency spikes are best diagnosed with trace-level data that shows the exact request path and timing.

How to eliminate wrong answers

Option A is wrong because increasing replicas or switching to a larger machine type is a reactive scaling action that does not identify the root cause of latency spikes; it may mask the issue but not reveal whether the problem is due to a code bottleneck, a slow dependency, or resource contention. Option C is wrong because CPU and memory utilization metrics in Cloud Monitoring show aggregate resource usage, which may not correlate with intermittent latency spikes caused by a specific slow request or a transient external dependency; high latency can occur even when CPU and memory are well within limits. Option D is wrong because reviewing Cloud Logging entries for error messages may miss the root cause if the latency spike is due to a slow but non-error operation (e.g., a database query taking 5 seconds without throwing an error); logs alone lack the timing context and traceability to identify which specific request or service caused the delay.

47

MCQhard

A company has a Cloud Run service that uses Cloud SQL. They notice that the number of database connections is increasing over time, causing connection pool exhaustion. They have enabled Cloud Monitoring and see a custom metric for active DB connections. To proactively alert when the connection count exceeds 80% of the maximum pool size (which is 100), which alerting approach is most efficient?

A.Create a metric threshold alert on the custom metric with condition > 80.

B.Create a forecast alert to predict when connections will exceed 80.

C.Create an alert on the Cloud SQL system metric for 'cloudsql.googleapis.com/database/connections/num_failed_reserved'.

D.Create a ratio alert using an MQL query that divides the active connections by the max connections and alerts when > 0.8.

AnswerD

Correct: ratio dynamically adjusts if max changes, and is a best practice.

Why this answer

Option D is correct because it creates a ratio alert using MQL to divide the active connections by the maximum pool size (100), triggering when the ratio exceeds 0.8 (80%). This directly measures the utilization of the connection pool, which is the most efficient way to alert on impending exhaustion. It avoids hardcoding a static threshold that would break if the pool size changes, and it uses the custom metric already being monitored.
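
The difference between a ratio condition and a static threshold can be illustrated with a few lines of Python. This is not MQL, and the function names are hypothetical; the pool size of 200 in the second case is an invented "what if the pool grows" scenario.

```python
def pool_utilization(active_connections: int, max_connections: int) -> float:
    """Fraction of the connection pool currently in use."""
    return active_connections / max_connections

def ratio_alert(active_connections: int, max_connections: int, threshold: float = 0.8) -> bool:
    """Fire when utilization exceeds the threshold, whatever the pool size is."""
    return pool_utilization(active_connections, max_connections) > threshold

def static_alert(active_connections: int, threshold: int = 80) -> bool:
    """Fire on a hardcoded connection count."""
    return active_connections > threshold

# With the pool size from the question (100), both conditions agree.
same_pool = (ratio_alert(85, 100), static_alert(85))

# Hypothetical: the pool is later resized to 200; the static alert now fires too early.
bigger_pool = (ratio_alert(85, 200), static_alert(85))
```

After the resize, 85 connections is only 42.5% utilization, so the ratio condition stays quiet while the static threshold produces a false alarm.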

Exam trap

Google Cloud often tests the distinction between static thresholds and ratio-based alerts, trapping candidates who choose a simple numeric threshold without considering maintainability or the need to normalize against the pool size.

How to eliminate wrong answers

Option A is wrong because a static threshold of >80 does not scale with the maximum pool size; if the pool size changes, the alert threshold must be manually updated, making it less maintainable. Option B is wrong because a forecast alert predicts future values, which is unnecessary here since the condition is a simple threshold on current utilization, and forecasting adds latency and complexity without benefit. Option C is wrong because 'cloudsql.googleapis.com/database/connections/num_failed_reserved' tracks failed reserved connections, not active connections, so it would not alert on the actual connection count approaching the pool limit.

48

MCQmedium

A company has a legacy monolithic application running on Compute Engine that is being migrated to microservices on GKE. During the migration, they need to maintain performance monitoring across both environments. The legacy application uses Stackdriver Logging and Monitoring agents (now Ops Agent) and exports logs to Cloud Logging. The new microservices are instrumented with OpenTelemetry for traces and metrics. The team wants a unified view of performance across both environments, including distributed traces from the new services and log-based metrics from the legacy app. They also want to correlate logs and traces for troubleshooting. Which solution should they implement?

A.Keep monitoring separate and use separate dashboards for legacy and new.

B.Use a third-party APM tool that supports both environments.

C.Use Cloud Monitoring dashboards and ingest OpenTelemetry metrics into Cloud Monitoring, while using Cloud Logging log-based metrics from legacy app.

D.Rewrite the legacy app to use OpenTelemetry.

AnswerC

This approach unifies metrics and logs from both environments, enabling correlation.

Why this answer

Option C is correct because it provides a unified view by ingesting OpenTelemetry metrics into Cloud Monitoring and using Cloud Logging log-based metrics from the legacy app. Cloud Monitoring supports OpenTelemetry metrics via the OpenTelemetry Protocol (OTLP) and can correlate them with log-based metrics from the legacy app, enabling distributed tracing and log correlation in a single dashboard.

Exam trap

The trap here is that candidates may think rewriting the legacy app is necessary for unified monitoring, but Google Cloud's native support for OpenTelemetry and log-based metrics allows integration without code changes.

How to eliminate wrong answers

Option A is wrong because keeping monitoring separate defeats the goal of a unified view and correlation between logs and traces, which is essential for troubleshooting across environments. Option B is wrong because while a third-party APM tool could work, it introduces unnecessary complexity and cost when Cloud Monitoring and Cloud Logging are already in place and can cover the requirement. Option D is wrong because rewriting the legacy app to use OpenTelemetry is a significant engineering effort that may not be feasible or necessary; the legacy app already exports logs via the Ops Agent, which can be used for log-based metrics without modification.

49

Multi-Selecteasy

A developer wants to profile their application's CPU and memory usage to identify performance bottlenecks. Which Google Cloud service should they use?

Select 1 answer

A.Cloud Logging

B.Cloud Debugger

C.Cloud Profiler

D.Cloud Trace

E.Cloud Monitoring

AnswersC

Cloud Profiler provides CPU and heap profiling to identify bottlenecks.

Why this answer

Cloud Profiler (Option C) is the correct service for profiling CPU and memory usage because it continuously gathers and analyzes call stacks and resource consumption across your application, identifying the functions that consume the most resources. This directly addresses the developer's goal of pinpointing performance bottlenecks in CPU and memory.

Exam trap

The trap here is that candidates often confuse Cloud Monitoring (which shows VM-level CPU/memory metrics) with Cloud Profiler (which shows application-level function-by-function CPU/memory consumption), leading them to pick Cloud Monitoring instead of Cloud Profiler.

50

MCQeasy

What is the first step to resolve this error?

A.Roll back the deployment.

B.Restart the service.

C.Increase memory for the service.

D.Add a null check on line 45.

AnswerD

This directly resolves the null reference exception.

Why this answer

Option D is correct because the error is a null reference exception (NullPointerException in Java, NullReferenceException in .NET), which occurs when code attempts to access a member of a null object. Adding a null check on line 45 prevents the exception by ensuring the object is not null before use, which is the standard first step in debugging such runtime errors in managed code environments like .NET or Java.

Exam trap

Google Cloud often tests the misconception that infrastructure changes (like restarting or scaling) can fix code-level bugs, tempting candidates to choose operational fixes instead of debugging the actual null reference in the application logic.

How to eliminate wrong answers

Option A is wrong because rolling back the deployment reverts to a previous version but does not fix the underlying null reference issue; the error will reappear if the same code path is executed. Option B is wrong because restarting the service only clears transient state and does not address the root cause of a null object reference in the code. Option C is wrong because increasing memory for the service does not resolve a null reference; memory issues typically cause out-of-memory errors or performance degradation, not null reference exceptions.

51

MCQmedium

A company is running a microservices application on Google Kubernetes Engine (GKE). They have implemented Cloud Monitoring and Cloud Logging, but recently they noticed that the Istio-proxy sidecar logs are missing from Cloud Logging. The application pods are running correctly and the sidecar containers are present. What is the most likely cause of the missing logs?

A.The Istio-proxy logs are being sent to Stackdriver but are filtered by a log sink exclusion.

B.The cluster was not created with the Istio on GKE add-on enabled, so proxy logs are not automatically collected.

C.The Cloud Logging agent is not installed on the cluster nodes.

D.The sidecar container is not configured to output logs to stdout/stderr.

AnswerB

Istio on GKE add-on enables automatic log collection for sidecar proxies.

Why this answer

Option B is correct because when using Istio on GKE, the Istio-proxy sidecar logs are automatically collected and sent to Cloud Logging only if the cluster was created with the 'Istio on GKE' add-on enabled. Without this add-on, the sidecar logs are not automatically forwarded, even though the sidecar containers are present and the application pods are running correctly. The add-on configures the necessary logging pipeline for Istio telemetry and logs.

Exam trap

The trap here is that candidates assume all container logs, including sidecar logs, are automatically collected by GKE's default logging, but Google Cloud tests the specific requirement that Istio-proxy logs require the 'Istio on GKE' add-on to be enabled for automatic forwarding to Cloud Logging.

How to eliminate wrong answers

Option A is wrong because a log sink exclusion would apply to all logs matching a filter, but the question states the logs are missing entirely, not that they are filtered out after being collected; also, Istio-proxy logs are not automatically sent to Cloud Logging without the add-on, so an exclusion is not the root cause. Option C is wrong because Cloud Logging on GKE uses the built-in Stackdriver Kubernetes Engine Monitoring integration, not a separate Cloud Logging agent installed on nodes; the agent is not required for GKE clusters. Option D is wrong because Istio-proxy sidecar containers are designed to output logs to stdout/stderr by default, and the question confirms the sidecar containers are present and running correctly, so this is not the issue.

52

Drag & Dropmedium

Drag and drop the steps to configure a Cloud Storage bucket with uniform bucket-level access in the correct order.

Steps

Order

Why this order

Uniform bucket-level access is configured during bucket creation by selecting the appropriate access control settings.

53

Matchingmedium

Match each Cloud Logging and Monitoring concept to its definition.

Concepts

Matches

Counts log entries matching a filter

Conditions and notifications for metrics

Target level of reliability for a service

Aggregates and analyzes application errors

Distributed tracing for latency analysis

Why these pairings

These tools help monitor and troubleshoot applications on Google Cloud.

54

MCQhard

An application running on GKE uses a custom metric to track order processing time. The metric is exported via Prometheus and ingested by Cloud Monitoring using the Managed Service for Prometheus. The team wants to create an alert when the 95th percentile latency exceeds 2 seconds over a 5-minute window. Which PromQL query should be used?

A.avg(rate(order_processing_duration_seconds_sum[5m])) / avg(rate(order_processing_duration_seconds_count[5m]))

B.histogram_quantile(0.95, sum(rate(order_processing_duration_seconds_bucket[5m])))

C.histogram_quantile(0.95, order_processing_duration_seconds_bucket)

D.histogram_quantile(0.95, rate(order_processing_duration_seconds_bucket[5m]))

AnswerD

Correct function to compute percentile from histogram.

Why this answer

Option D is correct because `histogram_quantile(0.95, rate(order_processing_duration_seconds_bucket[5m]))` computes the 95th percentile latency over a 5-minute window using Prometheus histogram buckets. The `rate()` function calculates the per-second increase of each bucket, which is required for accurate quantile estimation from cumulative histograms, and the result directly gives the latency threshold below which 95% of requests fall.
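
For intuition about what `histogram_quantile` does with the per-bucket rates, the following Python sketch estimates a quantile from cumulative buckets by linear interpolation. It is a simplified re-implementation written for illustration, not Prometheus source code, and the bucket bounds and rates are invented.

```python
import math
from typing import List, Tuple

def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from (upper_bound, cumulative_rate) buckets.

    The last bucket is expected to have an upper bound of +Inf, mirroring the
    `le="+Inf"` bucket of a Prometheus histogram.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan

# Hypothetical per-second bucket rates over the last 5 minutes (cumulative by upper bound).
bucket_rates = [(0.5, 50.0), (1.0, 80.0), (2.0, 90.0), (5.0, 100.0), (math.inf, 100.0)]
p95 = histogram_quantile(0.95, bucket_rates)
should_alert = p95 > 2.0
```

Here the 95th-percentile rank (95 of 100) lands halfway through the 2s to 5s bucket, giving an estimate of 3.5 seconds, which exceeds the 2-second alert threshold.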

Exam trap

Google Cloud often tests the requirement to use `rate()` with `histogram_quantile` for time-windowed percentile calculations, and the trap here is that candidates either omit `rate()` (option C) or aggregate with `sum()` in a way that discards the `le` bucket label (option B).

How to eliminate wrong answers

Option A is wrong because it computes the mean latency (the rate of the sum divided by the rate of the count), not the 95th percentile. Option B is wrong because `sum()` without a `by (le)` clause aggregates away the `le` bucket label that `histogram_quantile` depends on, so no valid quantile can be computed; a percentile across all series would need `sum by (le) (rate(...))`. Option C is wrong because it uses raw cumulative bucket counts without `rate()`, so it computes the quantile over the entire lifetime of the counters rather than over the most recent 5-minute window.

55

MCQmedium

A company runs a Java microservice on GKE that processes financial transactions. The service is critical and must meet a 99.9% availability SLO. They have set up Cloud Monitoring alerting policies based on request latency and error rate. Recently, the team noticed that the alerting policy for high latency fires too frequently with false positives, causing alert fatigue. They want to reduce false positives without compromising real issues. The latency metric is collected from the application's custom metric via Prometheus. Which approach should they take?

A.Change the metric to use median instead of average.

B.Increase the alert threshold to a higher latency value.

C.Disable the alert and rely on manual checks.

D.Increase the alert duration to require sustained latency over a longer period.

AnswerD

Longer duration ensures alerts fire only for persistent latency issues, reducing false positives.

Why this answer

Option D is correct because increasing the alert duration requires the high latency to be sustained over a longer period, which filters out transient spikes that cause false positives. This approach preserves the ability to detect genuine, prolonged performance degradation that could impact the 99.9% availability SLO, without raising the threshold and risking missed real issues.
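
The effect of a longer alert duration can be seen in a small Python simulation. The latency series, the 500 ms threshold, and the function name are hypothetical, and the model (one data point per minute, duration expressed as consecutive breaching points) is a simplification of how alerting policies evaluate conditions.

```python
from typing import List

def count_alerts(latencies_ms: List[int], threshold_ms: int, required_consecutive: int) -> int:
    """Count alerts when a breach must persist for N consecutive data points."""
    alerts = 0
    streak = 0
    for value in latencies_ms:
        if value > threshold_ms:
            streak += 1
            if streak == required_consecutive:
                alerts += 1  # fire once per sustained breach
        else:
            streak = 0
    return alerts

# Hypothetical per-minute latency: two transient spikes, then one sustained incident.
latency = [200, 900, 210, 220, 950, 230, 800, 850, 900, 870, 910, 240]

short_duration_alerts = count_alerts(latency, threshold_ms=500, required_consecutive=1)
long_duration_alerts = count_alerts(latency, threshold_ms=500, required_consecutive=5)
```

With a one-point duration the two transient spikes each raise an alert alongside the real incident, while a five-point duration raises a single alert for the sustained incident only.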

Exam trap

The trap here is that candidates often confuse reducing false positives with simply raising thresholds or changing aggregation methods, when the correct approach is to adjust the alert duration to filter transient noise while maintaining sensitivity to sustained issues.

How to eliminate wrong answers

Option A is wrong because using median instead of average does not address the root cause of false positives from transient spikes; median can still be affected by sustained high latency and may mask the severity of outliers. Option B is wrong because increasing the alert threshold to a higher latency value reduces sensitivity and may cause the team to miss real performance degradation that violates the SLO. Option C is wrong because disabling the alert eliminates automated detection entirely, which is unacceptable for a critical service with a 99.9% availability SLO and would rely on fallible manual checks.

56

MCQeasy

A developer notices that a Cloud Function is timing out after 60 seconds. The function makes an external API call that occasionally takes longer than the timeout. What is the best practice to handle this?

A.Implement retry logic without changing the timeout

B.Increase the timeout for all Cloud Functions in the project

C.Increase the timeout for the specific Cloud Function to a higher value

D.Decrease the timeout to fail fast and implement retry logic

AnswerC

Adjusting the timeout for the specific function allows the external call to complete.

Why this answer

Option C is correct because the Cloud Functions timeout is configured per function (up to 540 seconds for 1st gen functions, with higher limits for 2nd gen HTTP functions). Increasing the timeout for the specific function that makes the slow external API call directly addresses the timeout issue without affecting other functions or introducing unnecessary retry overhead. This is the most targeted and efficient solution.

Exam trap

Google Cloud often tests the misconception that retry logic alone can solve timeout issues, but the trap here is that retries do not extend the execution window—the function must complete within the configured timeout for any single invocation to succeed.

How to eliminate wrong answers

Option A is wrong because retry logic does not prevent the function from timing out; if the function times out after 60 seconds, retries will also fail unless the timeout is increased. Option B is wrong because the timeout is a per-function setting, and raising it for every function in the project is unnecessarily broad and could mask performance issues in other functions. Option D is wrong because decreasing the timeout to fail fast would cause the function to fail even more frequently, and implementing retry logic would not help if the external API call inherently takes longer than the reduced timeout.

57

MCQeasy

A company wants to monitor the CPU utilization of their Compute Engine instances and automatically trigger scaling actions if utilization exceeds 80% for 5 minutes. Which service should they use?

A.Managed instance group autoscaler

B.Cloud Monitoring

C.Cloud Scheduler

D.Cloud Load Balancing

AnswerA

Autoscaler uses monitoring metrics to trigger scaling actions.

Why this answer

Managed instance group (MIG) autoscaler is the correct service because it is designed to automatically adjust the number of Compute Engine instances based on configured utilization metrics. By setting a target CPU utilization of 80% over a 5-minute window, the autoscaler will add or remove instances to maintain that threshold, directly meeting the requirement for automatic scaling actions.

Exam trap

Google Cloud often tests the distinction between monitoring services (Cloud Monitoring) and action-oriented services (autoscaler), leading candidates to pick Cloud Monitoring because they confuse alerting with automatic scaling.

How to eliminate wrong answers

Option B is wrong because Cloud Monitoring is a monitoring and alerting service that collects metrics, logs, and events, but it does not perform automatic scaling actions; it can trigger alerts but not directly add or remove instances. Option C is wrong because Cloud Scheduler is a cron job service for scheduling tasks at specified times, not for reacting to real-time CPU utilization thresholds. Option D is wrong because Cloud Load Balancing distributes traffic across instances but does not monitor CPU utilization or trigger scaling actions; it works in conjunction with autoscalers but does not perform scaling itself.

58

MCQmedium

A developer deploying a new version of a microservice sees a sudden increase in error logs in Cloud Logging. The errors are 500 responses from the service. What is the most efficient way to investigate the root cause?

A.Use Cloud Trace to view the trace of failed requests

B.Revert to the previous version immediately

C.Check the CPU and memory metrics in Cloud Monitoring

D.Analyze the error logs using Log Analytics and create a log-based metric

AnswerA

Cloud Trace records traces for each request, including errors, allowing you to see the exact step that failed.

Why this answer

Cloud Trace provides end-to-end latency data and captures detailed spans for individual requests, including those that resulted in 500 errors. By filtering traces to failed requests, you can pinpoint the exact service or function call that caused the error, making it the most efficient root-cause investigation method once the services are emitting trace data.

Exam trap

Google Cloud often tests the misconception that log analysis alone is sufficient for debugging distributed systems, but the trap here is that Cloud Trace provides request-scoped context that logs lack, making it the most efficient first step for 500 errors in a microservice deployment.

How to eliminate wrong answers

Option B is wrong because reverting immediately is a reactive rollback that does not identify the root cause; it may resolve symptoms but wastes time if the issue is not version-related. Option C is wrong because CPU and memory metrics show resource utilization but cannot reveal application-level logic errors, such as a null pointer exception or a failed database query, that cause 500 responses. Option D is wrong because analyzing error logs and creating a log-based metric is useful for monitoring trends but is less efficient for pinpointing the specific failing request path; Cloud Trace directly correlates traces with error status codes for faster diagnosis.

59

Multi-Selecteasy

Which TWO of the following are valid ways to export Cloud Logging logs to BigQuery?

Select 2 answers

A.Use the Logging API to write logs directly to BigQuery

B.Use a Dataflow pipeline to stream logs from Pub/Sub to BigQuery

C.Create a log sink with destination set to BigQuery dataset

D.Use the BigQuery Data Transfer Service for Cloud Logging

E.Use Cloud Monitoring to send logs to BigQuery

AnswersB, C

This is a valid alternative path for exporting logs to BigQuery.

Why this answer

Option B is correct because you can use a Dataflow pipeline to read Cloud Logging logs from a Pub/Sub topic (where logs are routed via a log sink) and stream them into BigQuery for real-time analysis. This is a common pattern for custom log processing and transformation before loading into BigQuery. Option C is correct because Cloud Logging allows you to create a log sink directly with a destination of a BigQuery dataset, which automatically exports logs in near real-time without additional infrastructure.

Exam trap

Google Cloud often tests the distinction between direct sink destinations (BigQuery, Pub/Sub, Cloud Storage) and indirect methods like Dataflow or custom code, leading candidates to mistakenly think the Logging API or BigQuery Data Transfer Service can be used for export.
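The two valid paths differ only in where the sink points. The Python sketch below builds the sink definitions as plain dictionaries so the difference is visible. The helper names, project, dataset, topic, and filter are hypothetical. The destination strings follow the format Cloud Logging documents for BigQuery and Pub/Sub sinks, and nothing here calls the API.

```python
def bigquery_sink(name: str, project_id: str, dataset_id: str, log_filter: str) -> dict:
    """Sink body that routes matching entries straight to a BigQuery dataset (option C)."""
    return {
        "name": name,
        "destination": f"bigquery.googleapis.com/projects/{project_id}/datasets/{dataset_id}",
        "filter": log_filter,
    }


def pubsub_sink(name: str, project_id: str, topic_id: str, log_filter: str) -> dict:
    """Sink body that routes entries to Pub/Sub, where a Dataflow job can pick them up (option B)."""
    return {
        "name": name,
        "destination": f"pubsub.googleapis.com/projects/{project_id}/topics/{topic_id}",
        "filter": log_filter,
    }


# Hypothetical names, for illustration only.
direct = bigquery_sink("errors-to-bq", "my-project", "app_logs", "severity>=ERROR")
via_dataflow = pubsub_sink("errors-to-pubsub", "my-project", "log-stream", "severity>=ERROR")
print(direct["destination"])
print(via_dataflow["destination"])
```

In the Pub/Sub case the sink is only the first hop; the Dataflow pipeline that reads the topic and writes to BigQuery is separate infrastructure that the team has to run.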

60

MCQhard

You are a site reliability engineer for a fintech company that runs a latency-sensitive trading application on Google Kubernetes Engine (GKE). The application is instrumented with OpenTelemetry and exports traces and metrics to Cloud Monitoring and Cloud Logging. Recently, the team observed a gradual increase in p99 latency from 50ms to 500ms over the past week, and error rates have spiked to 5% from a baseline of 0.1%. You review the Cloud Monitoring dashboards and notice that the 'container/cpu/utilization' metric shows normal usage, but the 'container/memory/bytes_used' metric shows a steady climb, reaching 90% of the memory limit on several pods. The application logs contain many 'OutOfMemoryError' exceptions and 'GC overhead limit exceeded' messages. You also see that the HPA (Horizontal Pod Autoscaler) has not triggered any scale-up events because the 'custom/googleapis.com|container/cpu/utilization' metric is below the target utilization threshold. The cluster autoscaler is enabled and has sufficient node pool capacity. What is the most likely root cause and the best immediate action to resolve the issue?

A.Enable the Vertical Pod Autoscaler (VPA) in update mode to automatically adjust memory requests.

B.Switch the HPA to use the default 'container/cpu/utilization' metric instead of the custom metric.

C.Increase the memory request and limit for the pods to allow more memory usage.

D.Add a custom metric for memory utilization to the HPA and configure the target to scale when memory exceeds 70%.

AnswerD

This allows the HPA to react to memory pressure, scaling out pods to distribute memory load and reduce OOM errors.

Why this answer

The gradual memory increase and OutOfMemoryError exceptions indicate that the application is memory-bound, not CPU-bound. Since the HPA is configured to scale only on CPU utilization, it never triggers scale-up despite memory pressure. Adding a custom memory utilization metric to the HPA (option D) directly addresses the root cause by scaling pods when memory exceeds 70%, preventing OOM errors and reducing latency.

Exam trap

Google Cloud often tests the misconception that CPU is the only metric for HPA scaling, or that increasing resource limits alone solves memory pressure, when in fact memory-bound applications require scaling based on memory utilization to avoid OOM and latency degradation.

How to eliminate wrong answers

Option A is wrong because the Vertical Pod Autoscaler (VPA) adjusts resource requests/limits but does not scale the number of pods; it also cannot be used with HPA on the same metric, and update mode may cause pod restarts. Option B is wrong because switching to the default CPU metric would not help; CPU utilization is already normal, so the HPA would still not scale. Option C is wrong because simply increasing memory requests/limits without scaling out does not resolve the underlying issue of insufficient total memory capacity; pods will still hit the new limit eventually, and it does not address the latency spike caused by GC overhead.
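The reason the HPA stayed idle can be seen from the replica formula in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The Python sketch below applies it while ignoring the HPA's tolerance band and min/max replica bounds. The replica count of 4 and the CPU figures are assumptions; only the 90% and 70% memory figures come from the scenario, and treating 90% of the limit as the HPA's utilization value is a simplification.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Kubernetes HPA formula, without tolerance or min/max clamping."""
    return math.ceil(current_replicas * current_value / target_value)


replicas = 4  # hypothetical current replica count

# CPU is below its target, so a CPU-only HPA sees no reason to add pods.
cpu_based = desired_replicas(replicas, current_value=55, target_value=60)

# Memory is at 90% against a 70% target, so a memory-based HPA scales out.
memory_based = desired_replicas(replicas, current_value=90, target_value=70)

print(cpu_based, memory_based)
```

Under these assumed numbers the CPU metric keeps the deployment at its current size, while the memory metric asks for more replicas.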

61

Multi-Selecthard

A company is using Cloud Monitoring to set up an SLO for a latency-sensitive API. They have defined a custom SLI: the proportion of requests with latency under 200ms. Which three components must they define to create a complete SLO configuration? (Choose three.)

Select 3 answers

A.A target (e.g., 99.9%)

B.An SLI definition with a good/bad time series

C.A burn rate alert policy

D.A metric threshold alert

E.A window of compliance (e.g., 30 days)

AnswersA, B, E

Correct: the desired success rate.

Why this answer

Option A is correct because a target (e.g., 99.9%) defines the desired proportion of good events over a compliance window, which is essential for an SLO. In Cloud Monitoring, the target is the threshold against which the SLI is measured to determine if the SLO is met.

Exam trap

Google Cloud often tests that candidates confuse optional alerting policies (burn rate alerts, metric threshold alerts) with the mandatory components of an SLO configuration, which are strictly the SLI, target, and compliance window.
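As a rough illustration of how the three mandatory pieces fit together, the Python sketch below models an SLO as an SLI classifier, a target, and a compliance window. The `Slo` class and its field names are hypothetical and are not the Cloud Monitoring API; the 200 ms threshold, 99.9% target, and 30-day window come from the question.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Slo:
    is_good: Callable[[float], bool]  # SLI: classifies one request as good or bad
    target: float                     # desired fraction of good requests, e.g. 0.999
    window_days: int                  # compliance period the requests are drawn from

    def compliance(self, latencies_ms: Iterable[float]) -> float:
        """Fraction of good requests among those observed in the window."""
        values = list(latencies_ms)
        if not values:
            raise ValueError("no requests observed in the window")
        return sum(1 for v in values if self.is_good(v)) / len(values)

    def is_met(self, latencies_ms: Iterable[float]) -> bool:
        return self.compliance(latencies_ms) >= self.target


slo = Slo(is_good=lambda ms: ms < 200, target=0.999, window_days=30)

# Hypothetical request latencies collected over the 30-day window.
healthy = [100] * 9995 + [250] * 5
degraded = [100] * 990 + [250] * 10
print(slo.is_met(healthy), slo.is_met(degraded))
```

Alerting does not appear anywhere in the sketch, which matches the point that burn rate and threshold alerts are layered on top of an SLO rather than being part of its definition.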

62

MCQmedium

A team wants to monitor custom application metrics from a Compute Engine instance. They use the Cloud Monitoring agent. Which metric type should they use to report a gauge measurement like current memory usage?

A.histogram

B.delta

C.cumulative

D.gauge

AnswerD

Gauge metric type reports instantaneous values.

Why this answer

Option D is correct because a gauge metric type is specifically designed to report a value that can arbitrarily increase or decrease over time, such as current memory usage. The Cloud Monitoring agent supports gauge metrics for point-in-time measurements, and they are reported as a single data point without any aggregation window, making them ideal for snapshot-like observations.

Exam trap

Google Cloud often tests the distinction between metric types by presenting a scenario where a value can go up or down, and candidates mistakenly choose cumulative because they associate it with 'total usage' over time, forgetting that cumulative metrics must be monotonically increasing.

How to eliminate wrong answers

Option A is wrong because histogram metrics are used to capture the distribution of values over a time window (e.g., request latency percentiles), not a single instantaneous value like current memory usage. Option B is wrong because delta metrics represent the change in a value between two time points (e.g., requests per second), but current memory usage is not a rate or difference; it is an absolute snapshot. Option C is wrong because cumulative metrics monotonically increase over time (e.g., total bytes sent), and memory usage can decrease, which violates the monotonic property required for cumulative metrics.
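The monotonic property that separates cumulative metrics from gauges is easy to check in code. The following Python sketch uses made-up sample series (`memory_used_mb` and `bytes_sent_total` are illustrative names, not real metric types) and also shows how a delta series can be derived from a cumulative one.

```python
from typing import List


def is_monotonic(values: List[int]) -> bool:
    """True if the series never decreases, as a cumulative metric requires."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


def cumulative_to_delta(cumulative: List[int]) -> List[int]:
    """Per-interval change for a counter that only ever increases."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


memory_used_mb = [512, 640, 580, 610]        # gauge: goes up and down
bytes_sent_total = [1000, 1800, 1800, 2600]  # cumulative: never decreases

print(is_monotonic(memory_used_mb))
print(is_monotonic(bytes_sent_total))
print(cumulative_to_delta(bytes_sent_total))
```

Memory usage fails the monotonic check, which is why it has to be reported as a gauge.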

63

MCQmedium

You are configuring a Cloud Monitoring alerting policy for a Cloud Run service. The service has a maximum of 10 concurrent requests per instance. You want to be alerted when the average number of concurrent requests per instance exceeds 8 for at least 1 minute. Which metric and condition type should you use?

A.Metric: run.googleapis.com/request_count, Condition type: Metric Threshold, Threshold: >8

B.Metric: run.googleapis.com/request_count, Condition type: Metric Absence, Duration: 1 min

C.Metric: resource/container/cpu/utilization, Condition type: Metric Threshold, Threshold: >80%

D.Metric: run.googleapis.com/request_count, Condition type: Change Rate, Threshold: >0.5

AnswerA

Of the options, only this one pairs a request metric with a threshold condition evaluated over a duration.

Why this answer

Option A is correct because it is the only option that pairs a request metric with a Metric Threshold condition, which fires when the value stays above 8 for the configured 1-minute duration. Note that in Cloud Run's published metric list `run.googleapis.com/request_count` counts requests, while per-instance concurrency is reported by `run.googleapis.com/container/max_request_concurrencies`; the question treats `request_count` as the concurrency signal.

Exam trap

Google Cloud often tests the distinction between metric types and condition types, where candidates confuse Metric Threshold (for sustained high values) with Change Rate (for sudden spikes) or Metric Absence (for missing data), leading to incorrect selections.

How to eliminate wrong answers

Option B is wrong because Metric Absence triggers when data is missing, not when a value exceeds a threshold; it would alert if the metric stops reporting, not when concurrent requests are high. Option C is wrong because `resource/container/cpu/utilization` measures CPU usage, not concurrent requests, and the threshold of 80% is unrelated to the request count. Option D is wrong because Change Rate detects sudden increases or decreases in the metric value, not a sustained high level; it would alert on a spike of >0.5 requests per minute, not when the average exceeds 8.

64

MCQeasy

A developer needs to view detailed performance profiles of a Java application running on Compute Engine to identify CPU hotspots. Which Google Cloud service should they use?

A.Cloud Monitoring

B.Cloud Trace

C.Cloud Profiler

D.Cloud Logging

AnswerC

Correct: Cloud Profiler is designed to capture and analyze performance profiles.

Why this answer

Cloud Profiler is the correct service because it provides continuous, low-overhead CPU and heap profiling for Java applications running on Compute Engine. It uses statistical sampling to identify which methods consume the most CPU time, enabling developers to pinpoint hotspots without changing application code; the Java profiling agent is loaded when the JVM starts.

Exam trap

The trap here is that candidates confuse Cloud Trace (distributed tracing for latency) with Cloud Profiler (CPU/memory profiling), because both deal with 'performance' but at different granularities—Trace shows request paths, while Profiler shows method-level CPU consumption.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring collects metrics, uptime checks, and alerting policies but does not provide method-level CPU profiling or flame graphs to identify code hotspots. Option B is wrong because Cloud Trace focuses on latency analysis of request paths and distributed tracing, not on CPU usage per method or function. Option D is wrong because Cloud Logging aggregates and stores log entries for debugging and auditing, but it lacks the profiling agent and sampling engine needed to capture CPU call stacks.
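The idea behind statistical sampling can be sketched in a few lines of Python: capture the call stack at intervals, then count which function is on top most often. This is a toy model rather than Cloud Profiler's agent, and the stack samples and function names below are invented.

```python
from collections import Counter
from typing import List, Tuple


def hotspots(samples: List[List[str]], top: int = 3) -> List[Tuple[str, float]]:
    """Rank functions by the share of samples in which they were on top of the stack."""
    counts = Counter(stack[-1] for stack in samples if stack)
    total = sum(counts.values())
    return [(name, count / total) for name, count in counts.most_common(top)]


# Each inner list is one sampled call stack, outermost frame first (hypothetical data).
samples = [
    ["main", "handle", "parseJson"],
    ["main", "handle", "parseJson"],
    ["main", "handle", "computeRisk"],
    ["main", "handle", "parseJson"],
    ["main", "log"],
]

for name, share in hotspots(samples):
    print(f"{name}: {share:.0%} of samples")
```

A flame graph presents the same counts visually, with wider frames for functions that appear in more samples.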

65

MCQhard

A company uses Cloud Logging to centralize logs from multiple projects. They want to create a log-based metric for tracking 404 errors. However, the metric shows zero data even though 404 errors are occurring. What is the most likely reason?

A.The metric filter uses the wrong resource type

B.The metric is sampled and not all logs are considered

C.The logs are not being exported to Cloud Logging

D.The logs are being excluded by an exclusion filter before the metric is applied

AnswerD

Exclusion filters remove logs before metric ingestion.

Why this answer

Option D is correct because log-based metrics are computed from logs that have passed through all exclusion filters. If an exclusion filter is configured to discard logs matching certain criteria (e.g., all HTTP 4xx responses), those logs are never evaluated by the metric filter, causing the metric to show zero data even though 404 errors are occurring. Exclusion filters are applied before log-based metric evaluation in Cloud Logging.

Exam trap

Google Cloud often tests the order of operations in the Cloud Logging pipeline, specifically that exclusion filters are applied before log-based metrics, leading candidates to mistakenly blame export or sampling issues instead.

How to eliminate wrong answers

Option A is wrong because the resource type in the metric filter only affects how logs are grouped or labeled, not whether they are included in the metric; a mismatched resource type would not cause zero data if the logs themselves are present. Option B is wrong because Cloud Logging does not sample logs for log-based metrics; all ingested logs are evaluated against metric filters unless excluded. Option C is wrong because the question states the company uses Cloud Logging to centralize logs, meaning logs are already being sent to Cloud Logging; the issue is not about export but about processing within the logging pipeline.

66

MCQmedium

A Cloud Run service is experiencing intermittent high latency. The team has enabled Cloud Trace. They want to identify the root cause by analyzing traces. What should they look for in the Trace viewer?

A.High container CPU usage

B.Large number of concurrent requests

C.Frequent log entries with 'WARNING'

D.Spans with high latency and error status

AnswerD

High-latency spans pinpoint bottlenecks; errors indicate failures.

Why this answer

In Cloud Trace, the root cause of intermittent high latency is identified by examining spans—the fundamental units representing work in a distributed system. Spans with high latency directly indicate where time is being spent, and an error status (e.g., HTTP 5xx or gRPC error codes) pinpoints a failure that could be causing retries or blocking, leading to the observed latency. This combination is the most direct signal for root cause analysis in trace data.

Exam trap

Google Cloud often tests the distinction between metrics (like CPU usage) and trace data (like spans), leading candidates to confuse operational monitoring signals with the specific diagnostic tools available in Cloud Trace.

How to eliminate wrong answers

Option A is wrong because container CPU usage is a metric, not a trace attribute; Cloud Trace analyzes request-level spans, not resource utilization, which is monitored via Cloud Monitoring. Option B is wrong because a large number of concurrent requests is a symptom or contributing factor, not a root cause identifiable from a single trace; traces show individual request paths, not aggregate concurrency. Option C is wrong because frequent log entries with 'WARNING' are log-based signals, not trace data; Cloud Trace focuses on span timing and status, and warnings may correlate with but do not directly indicate the root cause of latency in a trace.
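What "look for spans with high latency and error status" means in practice can be sketched as a filter over the child spans of one trace. The `Span` class, the span names, and the 500 ms cutoff in this Python example are assumptions for illustration; they are not the Cloud Trace data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    name: str
    duration_ms: float
    is_error: bool = False


def suspects(spans: List[Span], slow_ms: float) -> List[Span]:
    """Spans that are slow or failed, slowest first."""
    flagged = [s for s in spans if s.duration_ms >= slow_ms or s.is_error]
    return sorted(flagged, key=lambda s: s.duration_ms, reverse=True)


# Hypothetical child spans of a single slow request.
trace = [
    Span("cart.GetCart", 40),
    Span("payment.Charge", 820, is_error=True),
    Span("inventory.Reserve", 35, is_error=True),
    Span("email.Send", 15),
]

for span in suspects(trace, slow_ms=500):
    print(span.name, span.duration_ms, "error" if span.is_error else "ok")
```

CPU usage, concurrency, and log warnings never enter the filter, because none of them are attributes of a span.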

67

Multi-Selecthard

Which THREE are valid methods to create custom metrics in Cloud Monitoring?

Select 3 answers

A.Using the Cloud Monitoring API to write metric points.

B.Using the OpenTelemetry Collector to export metrics.

C.Using Cloud Console's Metrics Explorer to manually enter data.

D.Creating a log-based metric from Cloud Logging.

E.Using Cloud Functions to emit metrics via Stackdriver Monitoring API.

AnswersA, B, D

The monitoring API allows programmatic ingestion of custom metric data points, a standard approach for custom metrics.

Why this answer

Option A is correct because the Cloud Monitoring API allows you to write custom metric points directly using the `projects.timeSeries.create` method. This enables you to define your own metric descriptors and send time-series data to Cloud Monitoring, which is a fundamental way to create custom metrics.

Exam trap

The trap here is that candidates may think Metrics Explorer (option C) can create custom metrics because it allows you to chart data, but it is purely a query interface and cannot ingest new data.
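For option A, the shape of a single entry in a `projects.timeSeries.create` request body looks roughly like the dictionary built below. This Python sketch only constructs the payload and does not send it. The helper name, metric type, project, instance, and zone are hypothetical; the field names follow the TimeSeries resource, and `custom.googleapis.com/` is the prefix for user-defined metric types.

```python
from datetime import datetime, timezone


def gauge_time_series(metric_type: str, value: float,
                      project_id: str, instance_id: str, zone: str) -> dict:
    """Build one entry for the `timeSeries` list of a timeSeries.create request."""
    end_time = datetime.now(timezone.utc).isoformat()
    return {
        "metric": {"type": metric_type},
        "resource": {
            "type": "gce_instance",
            "labels": {"project_id": project_id, "instance_id": instance_id, "zone": zone},
        },
        # A gauge point carries a single timestamp and value.
        "points": [{"interval": {"endTime": end_time}, "value": {"doubleValue": value}}],
    }


# Hypothetical identifiers, for illustration only.
series = gauge_time_series(
    "custom.googleapis.com/app/queue_depth", 17.0,
    project_id="my-project", instance_id="1234567890", zone="us-central1-a",
)
request_body = {"timeSeries": [series]}
print(request_body["timeSeries"][0]["metric"]["type"])
```

Metrics Explorer can chart this series once it has been written, but it offers no way to produce the points themselves.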

68

MCQhard

A company has a multi-region deployment of their application on GKE. They need to monitor service-level indicators (SLIs) like availability and latency across regions. They want a single pane of glass to view SLO compliance. What should they use?

A.Cloud Logging with log-based metrics

B.Cloud Profiler cross-region profiles

C.Cloud Monitoring SLO monitoring

D.Cloud Trace multi-region traces

AnswerC

SLO monitoring is specifically designed for tracking compliance with service-level objectives.

Why this answer

Cloud Monitoring SLO monitoring is the correct choice because it provides a unified dashboard (single pane of glass) to define, track, and visualize service-level indicators (SLIs) such as availability and latency across multiple GKE regions. It allows you to set SLO targets, monitor compliance over time, and receive alerts when the error budget is depleted, all within a single monitoring view.

Exam trap

Google Cloud often tests the distinction between monitoring tools (Cloud Monitoring) and debugging tools (Cloud Trace, Cloud Profiler), so the trap here is that candidates may confuse Cloud Trace's latency traces with the ability to monitor SLO compliance, or think Cloud Logging's log-based metrics can replace the dedicated SLO dashboard.

How to eliminate wrong answers

Option A is wrong because Cloud Logging with log-based metrics is used to extract metrics from log entries (e.g., count of errors), but it does not natively provide SLO compliance dashboards or a cross-region aggregated view of SLIs; it lacks the built-in SLO tracking and error budget management. Option B is wrong because Cloud Profiler is a continuous profiling tool that identifies performance bottlenecks (CPU, memory) in code, not a monitoring tool for SLIs like availability or latency across regions; it does not offer SLO compliance dashboards. Option D is wrong because Cloud Trace is a distributed tracing system that captures latency data for individual requests, but it does not aggregate SLIs or provide SLO compliance views; it focuses on request-level traces, not high-level SLO dashboards.

69

MCQmedium

A developer is using Cloud Logging and wants to export logs from a specific project to BigQuery for long-term analysis. They have created a log sink and given the appropriate permissions, but logs are not appearing in BigQuery. What is the most likely cause?

A.The sink's filter is too restrictive and no logs match.

B.The sink's destination BigQuery dataset is in a different region than the logs.

C.The log entries are not in JSON format.

D.The service account used for the sink does not have the 'bigquery.dataEditor' role.

AnswerD

Correct: the sink's writer identity must have write access to the BigQuery dataset.

Why this answer

Option D is correct because the log sink uses a service account to write logs to BigQuery. Even if the sink is configured correctly, the service account must have the 'bigquery.dataEditor' role on the destination dataset to insert log entries. Without this role, the sink will fail silently, and logs will not appear in BigQuery.

Exam trap

Google Cloud often tests the misconception that simply creating a sink and granting project-level permissions is sufficient, when in fact the service account needs explicit dataset-level 'bigquery.dataEditor' role.

How to eliminate wrong answers

Option A is wrong because a restrictive filter narrows what is exported rather than blocking the sink; any entry that does match would still be written, whereas a missing role blocks every write. Option B is wrong because BigQuery datasets can receive logs from any region; cross-region log exports are supported, though they may incur additional costs, but they do not prevent logs from appearing. Option C is wrong because Cloud Logging automatically converts log entries to JSON format when exporting to BigQuery; the original log format does not affect the export.

70

MCQeasy

A company uses Cloud Monitoring to set up an alerting policy for CPU utilization on Compute Engine instances. They want to be notified when average CPU usage exceeds 80% for 5 minutes. Which threshold type should they use?

A.Forecast

B.Change rate

C.Threshold

D.Metric absence

AnswerC

Threshold alert fires when metric crosses a set value for a duration.

Why this answer

Option C is correct because a Threshold alerting policy in Cloud Monitoring triggers when a metric's value crosses a defined static boundary. For this use case, setting a threshold of 80% with a duration of 5 minutes directly matches the requirement to alert when average CPU usage exceeds 80% for that period.

Exam trap

Google Cloud often tests the distinction between alerting on a sustained level (Threshold) versus alerting on a change (Change rate) or a prediction (Forecast), and candidates confuse 'average over time' with 'rate of change'.

How to eliminate wrong answers

Option A is wrong because Forecast alerting uses machine learning to predict future metric values and alert when the forecast crosses a threshold, not for monitoring current or historical average CPU usage. Option B is wrong because Change rate alerting detects sudden increases or decreases in a metric's value over a window, not a sustained level above a fixed percentage. Option D is wrong because Metric absence alerts fire when a metric stops reporting data, which is unrelated to monitoring CPU usage exceeding a threshold.

71

Multi-Selectmedium

Which THREE metrics are commonly used to create a Service Level Indicator (SLI) for availability of an HTTP-based service?

Select 3 answers

A.Uptime check success rate

B.CPU utilization

C.Error rate (5xx responses)

D.Request latency

E.Request count

AnswersA, C, D

Uptime checks measure whether the service is reachable and responding.

Why this answer

Uptime check success rate directly measures whether the service is reachable and responding, typically via periodic HTTP probes (e.g., GET /health). A successful response (e.g., HTTP 200) indicates availability, while failures (timeouts, connection errors) indicate unavailability. This is a standard SLI for availability in HTTP-based services.

Exam trap

Google Cloud often tests the distinction between signals that describe whether requests are served successfully (uptime, error rate, latency against a threshold) and resource or volume metrics (CPU utilization, request count), so candidates mistakenly include CPU utilization or request count as availability metrics.

72

Multi-Selectmedium

Which TWO are best practices for setting up Cloud Monitoring alerting policies to minimize alert fatigue? (Select exactly 2.)

Select 2 answers

A.Aggregate metrics across all projects before alerting.

B.Use condition thresholds with an 'AND' combination of multiple metrics.

C.Use log-based metrics for all alerts instead of metric-based alerts.

D.Create a separate alerting policy for each possible symptom.

E.Set the 'for' parameter to a duration longer than typical transient spikes.

AnswersB, E

Requiring both conditions to be true reduces noise.

Why this answer

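Combining multiple conditions with AND logic reduces false positives because the alert fires only when every condition is met. Setting the 'for' duration longer than typical transient spikes keeps short-lived blips from alerting. Aggregating across projects first is not a best practice; it is better to alert per project.

Using log-based metrics for everything is not always appropriate. Creating a separate alerting policy for each possible symptom multiplies notifications for a single incident and adds to alert fatigue.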

73

MCQeasy

You want to identify performance bottlenecks in your application's code, such as functions consuming excessive CPU. Which Google Cloud tool should you use?

A.Cloud Profiler

B.Cloud Monitoring

C.Cloud Trace

D.Cloud Logging

AnswerA

Cloud Profiler is designed to identify CPU and heap usage at the function level, pinpointing bottlenecks.

Why this answer

Cloud Profiler is the correct tool because it continuously gathers CPU and memory usage data from your application's functions and methods, presenting a flame graph or call graph that pinpoints which code paths consume the most resources. This allows you to identify performance bottlenecks like functions consuming excessive CPU without adding significant overhead to your production environment.

Exam trap

The trap here is that candidates confuse Cloud Monitoring's infrastructure-level CPU metrics with Cloud Profiler's application-level function profiling, leading them to choose Cloud Monitoring because it sounds like it monitors CPU usage.

How to eliminate wrong answers

Option B (Cloud Monitoring) is wrong because it provides metrics, dashboards, and alerts for infrastructure-level resources (e.g., CPU utilization of a VM or request latency), but it does not profile individual functions or code lines to identify CPU-intensive methods. Option C (Cloud Trace) is wrong because it focuses on latency analysis of request paths across distributed systems, showing how long each service or RPC call takes, not CPU consumption per function. Option D (Cloud Logging) is wrong because it collects and stores log entries from applications and services, enabling search and analysis of textual events, but it does not perform statistical sampling of CPU usage at the function level.

74

MCQeasy

A developer wants to ensure that error logs from their Java application are automatically captured and grouped in Cloud Error Reporting. What is the recommended approach?

A.Configure a log sink to Error Reporting

B.Export logs to BigQuery and then import to Error Reporting

C.Instrument the application with the Error Reporting client library

D.Use a custom log-based metric to count errors

AnswerC

The client library automatically captures and groups errors.

Why this answer

Option C is correct because the Error Reporting client library directly integrates with the application to automatically capture and group error logs, sending them to Cloud Error Reporting without requiring additional infrastructure. This is the recommended approach as it provides structured error reporting with automatic grouping, stack trace analysis, and real-time notifications.

Exam trap

Google Cloud often tests the misconception that log sinks can route directly to Error Reporting, but in reality, log sinks only support specific destinations like BigQuery, Pub/Sub, Cloud Storage, and Logging buckets, not Error Reporting.

How to eliminate wrong answers

Option A is wrong because configuring a log sink to Error Reporting is not a supported operation; log sinks route logs to destinations like BigQuery, Pub/Sub, or Cloud Storage, not directly to Error Reporting. Option B is wrong because exporting logs to BigQuery and then importing to Error Reporting introduces unnecessary complexity and latency, and Error Reporting does not have an import mechanism from BigQuery. Option D is wrong because a custom log-based metric to count errors only tracks the count of errors, not the actual error details, stack traces, or grouping required for Error Reporting.

75

Multi-Selecthard

A company's application on GKE is experiencing performance degradation. They want to use Google Cloud operations tools to identify the root cause. Which THREE tools should they use in combination?

Select 3 answers

A.Cloud Trace

B.Cloud Monitoring

C.Cloud Profiler

D.Cloud Debugger

E.Cloud Logging

AnswersA, B, E

Cloud Trace enables distributed tracing to identify latency bottlenecks.

Why this answer

Cloud Trace is correct because it provides distributed tracing capabilities that allow you to analyze latency across microservices in a GKE application. By collecting trace data from each request as it propagates through services, Cloud Trace helps identify performance bottlenecks, such as slow downstream calls or inefficient database queries, which are common causes of performance degradation.

Exam trap

Google Cloud often tests the distinction between tools that diagnose performance (Trace, Monitoring, Logging) versus tools that debug code (Debugger) or profile resource usage (Profiler), leading candidates to include Profiler or Debugger when only performance monitoring tools are needed.
