ACE | Chapter 91 of 101 | Objective 4.1

Cloud Debugger and Error Reporting

This chapter covers Google Cloud's Debugger and Error Reporting services, two critical tools for diagnosing and resolving application issues in production. For the ACE exam, these services fall under Objective 4.1, 'Troubleshooting and Monitoring,' and typically appear in 3-5% of questions. You'll need to understand their purpose, how to enable them, their limitations (e.g., unsupported languages for Debugger), and how they integrate with other operations tools like Cloud Logging and Cloud Monitoring. This chapter provides the depth required to answer scenario-based questions confidently.

25 min read
Intermediate
Updated May 31, 2026

Debugging with a Time-Traveling Detective

Imagine a detective investigating a crime in a busy office. The detective cannot be present when it happens, but the office has a special security system: the detective can mark any spot in the building, and the next time someone passes that spot, the system silently records exactly what that person was carrying (variable values) and how they got there (the call stack), without stopping them. This is Cloud Debugger. Traditional log-based debugging, by contrast, is like having only a few still photographs taken at fixed spots (log statements): if something went wrong between photos, it's guesswork. Cloud Debugger captures your application's state at a line you choose, the next time that line executes, even in production, without stopping the application or affecting other users. This 'snapshot' of the execution state, including the call stack and local variables, is like the detective pausing time at one spot, examining everything, and letting time resume without anyone noticing.

Error Reporting, on the other hand, is like a centralized tip line that automatically collects and categorizes all reported incidents (errors) from the office, grouping similar reports together, highlighting the most frequent ones, and pointing to the most likely culprit (the code line that caused the error). Together, they form a powerful pair: one for deep, targeted investigation of a specific issue, and one for monitoring overall error trends and prioritizing fixes.

How It Actually Works

What is Cloud Debugger?

Cloud Debugger is a production debugging service that lets you inspect the state of a running application without stopping it or adding log statements. It is part of Google Cloud's operations suite (formerly Stackdriver). The key value proposition is that you can capture a snapshot of your application's state, including the call stack, local variables, and heap references, at a specific line of code while the application continues to serve requests normally. This is fundamentally different from traditional debuggers, which pause the application at a breakpoint (impractical in production), and from temporary log statements, which require redeployment and may affect performance. Note that Google deprecated Cloud Debugger in May 2022 and shut it down on May 31, 2023, recommending the open-source Snapshot Debugger as its replacement.

How Cloud Debugger Works Internally

Cloud Debugger uses a snapshotting mechanism that is implemented differently for each supported language. The general flow is:

1. Snapshot Creation: You define a snapshot location (a source file and line number), an optional condition that must be true for the snapshot to fire (e.g., x > 5), and optional expressions to evaluate (e.g., x + y). This is done via the Cloud Console, the gcloud command-line tool, or the Cloud Debugger API.

2. Agent Injection: The Cloud Debugger agent runs alongside your application. It is included by default in the App Engine standard environment for supported runtimes; on Compute Engine and GKE you install it yourself. The agent is a language-specific component that hooks into the runtime (for Java, a native agent loaded with the JVM's -agentpath option) and monitors for snapshot requests from the Cloud Debugger backend.

3. Condition Evaluation: When a request executes the line of code where the snapshot is set, the agent evaluates the condition. If the condition is true (or if no condition is set), the agent captures the snapshot. The snapshot includes:

- The call stack (method/function names and line numbers)
- Local variables (values at the time of snapshot)
- The current expression values (if specified)
- The object references and their fields (limited depth)

4. Asynchronous Capture: The agent does NOT pause or stop the application. It copies the relevant state when the line executes, and the thread continues serving the request, so the latency impact is minimal (typically <5ms per snapshot). The captured data is a point-in-time copy: it reflects variable values at the moment the line executed, not changes the thread makes afterwards.

5. Upload and Display: The agent sends the snapshot data to the Cloud Debugger backend, which stores it and makes it available in the Cloud Console. You can view the snapshot details, including the call stack and variable values, as if you had paused the debugger.
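To make the flow above concrete, here is a minimal, hypothetical Python sketch of a conditional, one-shot snapshot. It is not how the Cloud Debugger agent is implemented: it uses the standard library's sys.settrace hook to record local variables, evaluated expressions, and the call stack the first time a chosen line runs with the condition true, then lets the function finish normally. The names capture_snapshot and apply_discount, and the line_offset parameter standing in for a FILE:LINE location, are illustrative assumptions.

```python
import sys
import traceback


def capture_snapshot(func, args, line_offset, condition=None, expressions=()):
    """Run func(*args), taking at most one snapshot at a chosen line.

    line_offset counts lines from func's `def` line; it is a hypothetical
    stand-in for the FILE:LINE location a real snapshot uses.
    """
    target_code = func.__code__
    target_line = target_code.co_firstlineno + line_offset
    snapshots = []

    def line_tracer(frame, event, arg):
        # A 'line' event fires just before the line runs, so locals reflect
        # everything executed up to (but not including) the target line.
        if event == "line" and frame.f_lineno == target_line and not snapshots:
            env = dict(frame.f_locals)
            if condition is None or eval(condition, frame.f_globals, env):
                snapshots.append({
                    "locals": env,
                    "expressions": {e: eval(e, frame.f_globals, env) for e in expressions},
                    "stack": [f"{fs.name}:{fs.lineno}" for fs in traceback.extract_stack(frame)],
                })
        return line_tracer

    def call_tracer(frame, event, arg):
        # Trace only frames that run the target function's code.
        return line_tracer if frame.f_code is target_code else None

    previous = sys.gettrace()
    sys.settrace(call_tracer)
    try:
        result = func(*args)  # runs to completion; nothing waits on a human
    finally:
        sys.settrace(previous)
    return result, snapshots


def apply_discount(price, pct):  # hypothetical function under inspection
    discount = price * pct / 100
    total = price - discount
    return total
```

For example, capturing at line_offset=2 with condition "pct > 5" while calling apply_discount(200, 10) returns 180.0 plus one snapshot whose locals are price, pct, and discount; with pct=5 the condition is false and no snapshot is taken. Because sys.settrace slows every traced line, treat this as a teaching sketch, not a production technique.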

Supported Languages and Limitations

For the ACE exam, know that Cloud Debugger supports:

Java (Java 8, 11, 17)

Python (2.7, 3.x)

Go

Node.js (10+)

Ruby (2.5+)

PHP (7.3+)

Important limitation: Cloud Debugger does NOT support .NET, C++, or any other languages outside this list. This is a frequent exam trap – candidates assume all languages are supported.

Cloud Debugger also has these limitations:

It cannot debug code that is not running with a connected agent (e.g., code you are only editing locally); it is not a replacement for a local IDE debugger.

It cannot debug code that uses native libraries or JNI (Java) that modify memory outside the managed runtime.

Snapshot data is retained for a limited time (default 24 hours, configurable up to 30 days).

There is a limit on the number of active snapshots per project (default 10, can be increased via quota request).

Debugging is only available for applications running on Google Cloud (Compute Engine, GKE, App Engine, Cloud Functions, etc.) or on-premises with the agent configured.

What is Error Reporting?

Error Reporting is a service that aggregates and analyzes errors from your application. It automatically groups similar errors (based on stack trace similarity) and provides a centralized view of error occurrences, including count, first/last seen, and affected users. It integrates with Cloud Logging – errors are automatically extracted from logs if they are in a supported format. Error Reporting also supports manual error reporting via the API or client libraries.

How Error Reporting Works

1. Error Ingestion: Errors can be reported in three ways:

Automatically via Cloud Logging: If your application logs errors using a supported logging framework (e.g., log package in Go, winston in Node.js, logback in Java) and the logs are sent to Cloud Logging, Error Reporting automatically parses them. The log entry must have a severity of ERROR or higher and contain a stack trace.

Via the Error Reporting API: You can programmatically report errors using the google.devtools.clouderrorreporting.v1beta1.ReportErrorsService API.

Via client libraries: Google provides client libraries for several languages that simplify reporting errors.

2. Error Grouping: Error Reporting uses a clustering algorithm to group similar errors. It analyzes the stack trace, error message, and other metadata. The grouping is based on the 'fingerprint' of the error. Once grouped, each group is assigned a unique ID. The grouping is not perfect: two different errors that produce similar stack traces might be merged, but this is rare.

3. Dashboard and Alerts: The Cloud Console provides an Error Reporting dashboard that shows:

- Total error count per group
- Number of affected users (if user data is provided)
- First and last occurrence timestamps
- Resolution status (open, acknowledged, resolved)
- A sample stack trace for each group

You can also set up alerts based on error count or rate using Cloud Monitoring (e.g., alert if error count exceeds 100 in 5 minutes).

4. Retention: Error data is retained for 30 days by default. For longer retention, you must export logs to Cloud Storage or BigQuery.
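The grouping step can be illustrated with a toy fingerprint. Google does not publish Error Reporting's exact algorithm, so the sketch below is an assumption for illustration only: it keys each error on its exception type plus the file and function names of its top few stack frames, ignoring messages and line numbers, then counts occurrences and keeps one sample per group. ErrorEvent, fingerprint, and group_errors are hypothetical names.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import Tuple

Frame = Tuple[str, str, int]  # (file, function, line)


@dataclass(frozen=True)
class ErrorEvent:
    exc_type: str
    message: str
    frames: Tuple[Frame, ...]  # innermost frame first


def fingerprint(event: ErrorEvent, depth: int = 3) -> str:
    """Hypothetical fingerprint: exception type + top frames, no messages or line numbers."""
    top = tuple((file, func) for file, func, _line in event.frames[:depth])
    return hashlib.sha256(repr((event.exc_type, top)).encode()).hexdigest()[:12]


def group_errors(events):
    """Count events per fingerprint and keep the first event of each group as its sample."""
    counts = Counter()
    samples = {}
    for event in events:
        key = fingerprint(event)
        counts[key] += 1
        samples.setdefault(key, event)
    return counts, samples
```

Because messages and line numbers are ignored, two KeyErrors raised from the same function with different keys land in one group, while a ZeroDivisionError elsewhere gets its own. The same simplification shows how distinct bugs with near-identical stacks could end up merged.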

Configuration and Verification Commands

Enabling Cloud Debugger:

- For App Engine: Automatically enabled for the standard environment with supported runtimes.
- For Compute Engine or GKE: You need to install the Cloud Debugger agent. For example, on a Compute Engine VM running Java:

# Load the agent through the JVM startup options (myapp.jar is a placeholder)
java -agentpath:/opt/cdbg/cdbg_java_agent.so -jar myapp.jar

Verify the agent is running: Check the application logs for a message like "Cloud Debugger agent initialized".
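For a Python service on Compute Engine or GKE, the equivalent step happens in application code. The following is a minimal sketch, assuming the Python agent package is installed and importable as googleclouddebugger; the guarded import keeps the app running if the package is absent. The enable_debugger wrapper is a hypothetical helper, not part of the agent.

```python
# Guarded import: the application still starts if the agent package is missing.
try:
    import googleclouddebugger
except ImportError:
    googleclouddebugger = None


def enable_debugger() -> bool:
    """Start the Cloud Debugger agent if it is installed; return whether it was started."""
    if googleclouddebugger is None:
        return False
    # Real deployments may pass extra options (e.g., module/version); see the agent's docs.
    googleclouddebugger.enable()
    return True
```

Call enable_debugger() once at startup, before the application begins serving requests.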

Using Cloud Debugger with gcloud:

gcloud debug snapshots create <file>:<line> --target=<target> --condition=<condition> --expression=<expression>

Example (Main.java:42 is a placeholder source location):

gcloud debug snapshots create Main.java:42 --target=default --condition="x > 10" --expression="x + y"

List snapshots:

gcloud debug snapshots list --target=default

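Enabling Error Reporting:

- Ensure your application logs to Cloud Logging with appropriate severity and stack traces. For example, in Python, call logging.exception inside an except block so the stack trace is included:

import logging

try:
    1 / 0
except ZeroDivisionError:
    logging.exception('An error occurred')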

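- Or use the Error Reporting client library (package google-cloud-error-reporting); report_exception() reports the exception currently being handled, so call it inside an except block:

from google.cloud import error_reporting

client = error_reporting.Client()
try:
    1 / 0
except ZeroDivisionError:
    client.report_exception()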

Verify errors appear in the Cloud Console under Error Reporting.
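If your application writes its own log lines instead of using a logging framework, Error Reporting also recognizes structured log entries that carry the ReportedErrorEvent type. The sketch below assumes a platform whose logging agent turns JSON lines on stdout into structured log entries (for example Cloud Run or GKE); the error_log_line helper and the service and version values are hypothetical.

```python
import json
import traceback

# @type value that marks a structured log entry as an Error Reporting event.
REPORTED_ERROR_EVENT = (
    "type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent"
)


def error_log_line(exc: BaseException, service: str, version: str) -> str:
    """Build one JSON log line containing the exception's full stack trace."""
    stack = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    entry = {
        "severity": "ERROR",
        "message": stack,
        "@type": REPORTED_ERROR_EVENT,
        "serviceContext": {"service": service, "version": version},
    }
    return json.dumps(entry)


if __name__ == "__main__":
    try:
        {}["sku-123"]  # hypothetical failing inventory lookup
    except KeyError as exc:
        print(error_log_line(exc, service="checkout", version="v1"))
```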

Interaction with Related Technologies

Cloud Logging: Error Reporting relies on Cloud Logging for automatic ingestion, extracting errors from log entries. Cloud Debugger sends snapshots to its own backend, but its log points (temporary log statements injected without redeploying) write to Cloud Logging as log entries, with severity INFO by default (the log level can be set when you create the log point).

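Cloud Monitoring: You can create alerting policies on error counts, for example by using a user-defined log-based metric that counts ERROR entries (such metrics are named logging.googleapis.com/user/METRIC_NAME, e.g., logging.googleapis.com/user/error_count). Cloud Debugger does not generate metrics directly, but you can monitor snapshot creation via audit logs.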

Cloud Trace: While not directly integrated, you can use Cloud Trace to correlate errors with latency issues.
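Log points, mentioned under Cloud Logging above, take a message template whose expressions are evaluated against the variables in scope when the line runs. The sketch below is a hypothetical local stand-in, assuming {expression} placeholders in the template; render_logpoint is an illustrative name, and unlike the real service it evaluates ordinary Python expressions with builtins removed, which is not a security sandbox.

```python
import re

# Matches one {expression} placeholder (nested braces are not supported).
_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def render_logpoint(template: str, variables: dict) -> str:
    """Fill each {expression} in the template using the given variables."""

    def substitute(match: re.Match) -> str:
        expression = match.group(1)
        try:
            # Builtins are removed to discourage side effects; this is not a sandbox.
            value = eval(expression, {"__builtins__": {}}, dict(variables))
        except Exception as exc:  # report the failure in the message instead of raising
            return f"<error: {type(exc).__name__}>"
        return repr(value)

    return _PLACEHOLDER.sub(substitute, template)
```

Note the key difference from a snapshot: the output is a single formatted message line, not a copy of the call stack and all local variables.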

Common Exam Scenarios

The ACE exam often tests:

Choosing the right tool for the job: Debugger for investigating a specific bug in production without redeploying; Error Reporting for monitoring overall error trends.

Knowing that Debugger does NOT require restarting the application or adding log statements.

Remembering that Debugger supports Java, Python, Go, Node.js, Ruby, PHP – not .NET or C++.

Understanding that Error Reporting groups errors automatically and you can set alerts.

Recognizing that both services are part of the operations suite and have free quotas (e.g., Debugger allows up to 10 active snapshots, Error Reporting has a free tier of 250,000 error events per month).

Walk-Through

1. Enable Cloud Debugger Agent

Ensure your application runtime includes the Cloud Debugger agent. For the App Engine standard environment, it's included by default for supported languages. For Compute Engine or GKE, you must install the agent. For Java, add the JVM argument `-agentpath:/opt/cdbg/cdbg_java_agent.so`. For Python, install the `google-python-cloud-debugger` package and call `import googleclouddebugger; googleclouddebugger.enable()` at startup. Verify the agent starts by checking logs for 'Cloud Debugger agent initialized'.

2. Set a Snapshot via Console or CLI

In the Cloud Console, navigate to Debugger, select your application target (e.g., 'default' for App Engine), and choose the source file and line number. Optionally, add a condition (e.g., `x > 5`) and expressions (e.g., `x + y`). Alternatively, use `gcloud debug snapshots create` with the `--target`, `--condition`, and `--expression` flags. The snapshot request is sent to the agent.

3. Agent Evaluates Snapshot

When a request hits the specified line, the agent evaluates the condition. If true (or no condition), the agent asynchronously captures the call stack, local variables, and expression values. The thread continues without pausing. The capture process is fast (<5ms) and does not block the request. The agent then uploads the snapshot data to the Cloud Debugger backend.

4. View Snapshot in Console

The snapshot appears in the Cloud Debugger Console. You can inspect the call stack (click on frames to see source code), local variables (expand to see values), and any expressions. The snapshot is retained for 24 hours by default. You can also add log points that inject log statements without redeploying – these appear in Cloud Logging.

5. Configure Error Reporting

Ensure your application logs errors with stack traces to Cloud Logging. For automatic ingestion, the log entry must have severity 'ERROR' and contain a stack trace. Alternatively, use the Error Reporting API or client library. In the Cloud Console, navigate to Error Reporting to see grouped errors. You can set up alerts via Cloud Monitoring based on error count.
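A threshold alert such as "more than 100 errors in 5 minutes" boils down to counting events in a time window. The sketch below is a simplified, hypothetical local model of that rule using a sliding window; Cloud Monitoring actually evaluates alerting policies over aligned time series, so this shows the idea rather than the service's behavior. ErrorCountAlert is an illustrative name.

```python
from collections import deque


class ErrorCountAlert:
    """Fire when more than `threshold` errors occur within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window = window_seconds
        self.timestamps = deque()  # error times, oldest first

    def record(self, ts: float) -> bool:
        """Record one error at time `ts` (seconds); return True if the alert fires."""
        self.timestamps.append(ts)
        # Drop errors that have fallen out of the window ending at `ts`.
        while self.timestamps and self.timestamps[0] <= ts - self.window:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold
```

With threshold=100 and window_seconds=300, the 101st error inside five minutes fires the alert, and an error long after the burst does not.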

What This Looks Like on the Job

Scenario 1: Debugging a Production Bug in a Java Backend

A fintech company runs a Java-based transaction processing system on Google Kubernetes Engine (GKE). Users report that some transactions fail with a cryptic error message only under high load. Traditional logging doesn't capture enough context. The team uses Cloud Debugger to set a snapshot at the line where the error is thrown, with the condition transactionAmount > 10000 && error != null. When the condition triggers, they capture the full call stack and local variables, revealing that a currency conversion rate was not initialized for certain currencies. The snapshot shows the conversion-rate variable was null for those currencies, allowing the team to fix the bug without reproducing the issue in staging. The agent runs alongside the application with minimal overhead (less than 1% CPU increase). The team also sets up a log point to log the conversion rate for future debugging.

Scenario 2: Monitoring Error Rates in a Python Web App

An e-commerce platform uses Python on App Engine. After a new deployment, the error rate spikes. The team uses Error Reporting to see that the errors are grouped into two main categories: 'ZeroDivisionError' in the discount calculation and 'KeyError' in the inventory lookup. The dashboard shows the error count per group, affected users, and a sample stack trace. The team sets up an alert in Cloud Monitoring to notify them if the error rate exceeds 10 per minute. They also use Cloud Debugger to set a snapshot on the discount calculation line to inspect the values causing the division by zero. The fix is deployed within hours. Without Error Reporting, they would have had to search through logs manually.

Common Pitfalls

Misconfiguring the Debugger agent: Forgetting to include the agent in the Docker image for GKE leads to snapshots never being captured. Always verify agent initialization logs.

Exceeding snapshot quota: The default limit of 10 active snapshots can be hit quickly in a team environment. Use gcloud debug snapshots delete to clean up old snapshots.

Ignoring Error Reporting grouping: Sometimes similar errors are grouped incorrectly. You can manually split groups or add custom grouping rules via the API.

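Not setting alerts: Error Reporting will not notify you until you configure notification channels, and its built-in notifications cover new (or reopened) error groups only. For count- or rate-based alerts, you must configure Cloud Monitoring alerting policies to be proactive.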

How ACE Actually Tests This

ACE Exam Focus: Objective 4.1

The ACE exam tests your ability to choose the appropriate tool for troubleshooting and monitoring. For Cloud Debugger and Error Reporting, the questions are typically scenario-based. Key points:

1. What the exam tests:

- Understanding the purpose of Cloud Debugger (snapshots, log points) vs. Error Reporting (aggregation, grouping).
- Knowing which languages are supported (Java, Python, Go, Node.js, Ruby, PHP) – .NET is NOT supported.
- Recognizing that Debugger does NOT require redeployment or application restart.
- Understanding that Error Reporting automatically extracts errors from Cloud Logging.
- Knowing the default retention (24 hours for snapshots, 30 days for error data).

2. Common wrong answers:

- Choosing Cloud Debugger for monitoring error trends – that's Error Reporting's job.
- Assuming Debugger supports all languages – it doesn't.
- Thinking Debugger pauses the application – it does not.
- Believing Error Reporting requires manual setup to parse logs – it auto-parses entries with severity ERROR or higher that contain a stack trace.
- Confusing log points with snapshots – log points inject log statements without capturing the full application state.

3. Specific numbers and terms:

- Maximum active snapshots: 10 (default).
- Snapshot retention: 24 hours (default).
- Error Reporting retention: 30 days.
- Supported languages: Java, Python, Go, Node.js, Ruby, PHP.
- Command: gcloud debug snapshots create.
- API: google.devtools.clouderrorreporting.v1beta1.

4. Edge cases:

- Debugger cannot debug code that uses native libraries (e.g., JNI in Java).
- Error Reporting requires stack traces in logs; if your log format is custom, you may need to use the API.
- Both services have free quotas but may incur charges beyond limits.

5. Elimination strategy:

- If the question asks for 'real-time debugging without redeployment' and the language is supported, choose Cloud Debugger.
- If the question asks for 'aggregated error view and alerts', choose Error Reporting.
- If the language is .NET or C++, eliminate Cloud Debugger as an option.
- If the question mentions 'pausing the application', that is incorrect for Cloud Debugger.

Key Takeaways

Cloud Debugger captures snapshots without pausing the application; it is not a traditional debugger.

Cloud Debugger supports only Java, Python, Go, Node.js, Ruby, and PHP.

Error Reporting automatically extracts errors from Cloud Logging logs with severity ERROR and stack traces.

Both services are part of Google Cloud's operations suite and have free quotas.

Default snapshot retention is 24 hours; error data retention is 30 days.

Use `gcloud debug snapshots create` to set snapshots from the CLI.

Error Reporting groups errors by stack trace similarity; you can set alerts via Cloud Monitoring.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Cloud Debugger

Purpose: Real-time debugging of production code without redeploying.

Mechanism: Captures snapshots of application state (call stack, variables).

Use case: Investigating a specific bug with detailed state inspection.

Data retention: Snapshots retained for 24 hours (default).

Supports: Java, Python, Go, Node.js, Ruby, PHP.

Error Reporting

Purpose: Aggregating and analyzing errors across an application.

Mechanism: Groups similar errors from logs or API reports.

Use case: Monitoring error trends, setting alerts, prioritizing fixes.

Data retention: Error data retained for 30 days.

Supports: Any language that logs to Cloud Logging or uses the API.

Watch Out for These

Mistake

Cloud Debugger pauses the application when capturing a snapshot.

Correct

Cloud Debugger captures snapshots asynchronously without pausing the application thread. The thread continues execution while the agent copies the state, ensuring minimal latency impact (typically <5ms).

Mistake

Cloud Debugger supports all programming languages.

Correct

Cloud Debugger only supports Java, Python, Go, Node.js, Ruby, and PHP. Languages like .NET, C++, and others are not supported.

Mistake

Error Reporting requires you to manually configure log parsing to extract errors.

Correct

Error Reporting automatically extracts errors from Cloud Logging entries with severity 'ERROR' or higher that contain stack traces. No manual configuration is needed for standard logging frameworks.

Mistake

Cloud Debugger snapshots are stored indefinitely.

Correct

Snapshots are retained for 24 hours by default. You can configure retention up to 30 days, but not indefinitely.

Mistake

Error Reporting can only be used with applications running on Google Cloud.

Correct

Error Reporting can ingest errors from any application that sends logs to Cloud Logging, including on-premises or other clouds, as long as the logs are forwarded to Cloud Logging.


Frequently Asked Questions

How do I enable Cloud Debugger for a Compute Engine VM?

You need to install the Cloud Debugger agent on the VM. For Java, add the JVM argument `-agentpath:/opt/cdbg/cdbg_java_agent.so` to your application startup. For Python, install the `google-python-cloud-debugger` package and call `googleclouddebugger.enable()` at startup. Ensure the VM has the necessary IAM permissions (roles/clouddebugger.agent). Verify by checking application logs for 'Cloud Debugger agent initialized'.

Can Cloud Debugger be used on .NET applications?

No. Cloud Debugger does not support .NET. The supported languages are Java, Python, Go, Node.js, Ruby, and PHP. For .NET applications, you must rely on traditional logging and Error Reporting.

Does Error Reporting require me to modify my code?

Not necessarily. If your application logs errors to Cloud Logging with severity 'ERROR' and includes a stack trace, Error Reporting will automatically parse and group them. However, for more control or to report errors without logging, you can use the Error Reporting API or client libraries.

How long are Cloud Debugger snapshots retained?

By default, snapshots are retained for 24 hours. You can configure retention up to 30 days via the Cloud Console or API. After the retention period, snapshots are automatically deleted.

Can I set alerts based on Error Reporting data?

Yes. You can create alerting policies in Cloud Monitoring using metrics derived from Error Reporting, such as the count of errors per group. For example, you can set an alert to notify you when a specific error group exceeds 100 occurrences in 5 minutes.

What is the difference between a snapshot and a log point in Cloud Debugger?

A snapshot captures the full state of the application (call stack, variables) at a specific line. A log point injects a log statement into the running code without redeploying; it only logs a message and does not capture state. Both are set via the Debugger interface.

Is Cloud Debugger available for Cloud Functions?

Yes, Cloud Debugger can be used with Cloud Functions for supported languages (Node.js, Python, Go, Java). However, due to the ephemeral nature of functions, snapshots may not always be captured if the function instance is short-lived. It is recommended for longer-running functions.
