ACE · Chapter 11 of 101 · Objective 4.1

Cloud Monitoring and Logging

This chapter covers Cloud Monitoring and Cloud Logging, the core observability tools in Google Cloud. For the ACE exam, these topics appear in roughly 10-15% of questions, testing your ability to set up monitoring, create alerts, use logs, and troubleshoot using these services. You will need to understand the key components, configuration options, and common use cases to pass the exam.

25 min read
Intermediate
Updated May 31, 2026

The Facility Management System for Cloud

Imagine you are the facility manager of a large corporate campus with 50 buildings, each with hundreds of rooms, and thousands of employees. Your job is to ensure everything runs smoothly, but you can't be everywhere at once. So you install a comprehensive facility management system. This system has sensors in every room that track temperature, humidity, occupancy, and power usage. It also logs every door opening, every HVAC cycle, and every security badge swipe. You have a central dashboard that shows real-time status: green for normal, yellow for warning, red for critical. You set up alerts: if a server room exceeds 80°F, you get a text message within 5 minutes. You also have a log search tool to investigate historical incidents, like who entered the data center at 3 AM last Tuesday. The system retains logs for 30 days by default, but you can configure longer retention for compliance. You can also create custom metrics, like average power usage per floor over the last hour. This facility management system is exactly how Google Cloud's operations suite works: Cloud Monitoring is the real-time dashboard and alerting system, Cloud Logging is the log storage and search engine, and together they give you observability into your cloud environment. The sensors are the Cloud Monitoring agents and API integrations, the logs are the Cloud Logging entries, and the alerts are the notification channels you configure.

How It Actually Works

What is Cloud Monitoring and Cloud Logging?

Cloud Monitoring (formerly Stackdriver Monitoring) is a fully managed service that provides visibility into the performance, uptime, and health of your Google Cloud resources. It collects metrics, events, and metadata from Google Cloud services, your applications, and third-party sources, and presents them in dashboards, charts, and alerts. Cloud Logging (formerly Stackdriver Logging) is a fully managed service for storing, searching, analyzing, and alerting on log data from your cloud resources and applications. Together, they form the observability foundation for Google Cloud.

Why they exist

In any complex distributed system, you need to know what is happening at all times. Without monitoring, you are blind to performance issues, outages, and security incidents. Without logging, you cannot debug failures or audit activity. These services provide a unified solution that scales automatically, integrates deeply with Google Cloud services, and offers powerful analysis tools.

How Cloud Monitoring works

Cloud Monitoring collects metrics from several sources:

Google Cloud services: Compute Engine, Google Kubernetes Engine, Cloud SQL, and other services emit metrics automatically (e.g., CPU utilization, disk I/O, network traffic).

Agent-based monitoring: An agent running on your VM instances sends system metrics that are not visible from outside the guest, such as memory usage, disk usage, and process information. The Ops Agent is the currently recommended agent; the legacy Cloud Monitoring agent is based on collectd.

Custom metrics: You can write custom metrics using the Monitoring API or client libraries (e.g., application-specific counters).

External sources: You can monitor resources outside Google Cloud using the monitoring agent or API.

Metrics are stored in a time-series database. You can query them using the Monitoring Query Language (MQL) or the Metrics Explorer. You can create dashboards to visualize metrics and set up alerting policies that trigger when conditions are met (e.g., CPU > 80% for 5 minutes). Alerts can be sent via email, SMS, PagerDuty, Slack, webhooks, or other notification channels.
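To make the "CPU > 80% for 5 minutes" condition concrete, here is a minimal Python sketch of how a threshold-with-duration check behaves on a time series. It is an illustration only, not how Cloud Monitoring is implemented; the function name `sustained_breach` and the sample data are hypothetical.

```python
from typing import List, Tuple

def sustained_breach(points: List[Tuple[int, float]], threshold: float, duration_s: int) -> bool:
    """Return True if consecutive samples stay above `threshold` for at least `duration_s` seconds.

    `points` holds (timestamp_in_seconds, value) pairs. Any sample at or below
    the threshold resets the clock, which is what filters out short spikes.
    """
    breach_start = None
    for ts, value in sorted(points):
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of a new breach
            if ts - breach_start >= duration_s:
                return True
        else:
            breach_start = None  # breach interrupted, start over
    return False

# Hypothetical samples, one per minute; utilization is a fraction (0.8 == 80%).
cpu = [(0, 0.50), (60, 0.85), (120, 0.90), (180, 0.88), (240, 0.91), (300, 0.87), (360, 0.93)]
spiky = [(0, 0.90), (60, 0.90), (120, 0.50), (180, 0.90), (240, 0.90)]

print(sustained_breach(cpu, threshold=0.8, duration_s=300))
print(sustained_breach(spiky, threshold=0.8, duration_s=300))
```

The second series crosses the threshold twice but never for five minutes in a row, so it would not fire. That is the reason duration conditions reduce alert noise.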

Key Cloud Monitoring concepts

Metric: A measurable value collected over time (e.g., compute.googleapis.com/instance/cpu/utilization). Each metric has a type, resource type, and labels.

Metric type: A unique identifier like compute.googleapis.com/instance/cpu/utilization. Google Cloud provides hundreds of predefined metric types.

Resource type: The monitored resource type (e.g., gce_instance, k8s_container, global).

Labels: Key-value pairs that refine the metric (e.g., instance_id, zone).

Time series: A set of data points for a metric, each with a timestamp and value.

Alerting policy: A set of conditions that, when met, trigger notifications. Conditions can be metric threshold, metric absence, or change rate.

Notification channel: Where alerts are sent (email, SMS, Slack, webhook, PagerDuty, Pub/Sub).

Uptime check: A test that checks if your service is reachable from various locations worldwide. You can configure HTTP, HTTPS, or TCP checks.

Dashboard: A customizable view of charts and widgets that display metrics.

Service Level Indicator (SLI): A metric that measures a specific aspect of service performance (e.g., latency, error rate).

Service Level Objective (SLO): A target value for an SLI (e.g., 99.9% availability over 30 days).

Service Level Agreement (SLA): A contract that specifies consequences if SLOs are not met.
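The SLO example above (99.9% availability over 30 days) implies an error budget, which is simple arithmetic. The sketch below works it out; the function names are hypothetical and the calculation assumes availability is measured in wall-clock minutes.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Minutes of allowed unavailability for an SLO target over a rolling window."""
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes

def budget_remaining(slo_target: float, window_days: int, downtime_minutes: float) -> float:
    """Error budget left after subtracting downtime already observed in the window."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes

# 99.9% over 30 days: 30 * 24 * 60 = 43,200 minutes, of which 0.1% may be "bad".
print(f"{error_budget_minutes(0.999, 30):.1f} minutes of budget")
print(f"{budget_remaining(0.999, 30, downtime_minutes=20):.1f} minutes left after a 20-minute outage")
```

A 99.9% target leaves about 43.2 minutes per 30 days; tightening to 99.99% shrinks that to about 4.3 minutes, which is why the choice of SLO drives how aggressive alerting needs to be.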

How Cloud Logging works

Cloud Logging ingests log entries from several sources:

Google Cloud services: Most services send logs automatically (e.g., Cloud Audit Logs).

Agent-based logging: An agent running on your VMs forwards system and application logs to Cloud Logging. The Ops Agent is the currently recommended agent; the legacy Cloud Logging agent is based on fluentd.

Application logs: You can write logs using the Cloud Logging client libraries, or write directly to stdout/stderr on Google Kubernetes Engine (GKE) or Cloud Run.

Network logs: VPC flow logs and firewall rules logs are sent to Cloud Logging once you enable them (per subnet and per rule, respectively).

In the other direction, you can route logs out of Cloud Logging to Cloud Storage, BigQuery, or Pub/Sub for long-term retention, analysis, or streaming.

Log entries are stored in log buckets. Each project has a default bucket named _Default with a retention of 30 days. You can change that retention or create custom buckets with their own retention periods, anywhere from 1 day to 3650 days. Logs are indexed and searchable using the Logs Explorer.

Key Cloud Logging concepts

Log entry: A single log record with a timestamp, severity, resource type, and payload.

Log name: The identifier for a log stream (e.g., projects/my-project/logs/compute.googleapis.com%2Factivity).

Log bucket: A storage container for log entries. Default retention is 30 days for _Default bucket.

Log sink: A configuration that routes log entries to a destination (Cloud Storage, BigQuery, Pub/Sub). You can filter logs using inclusion/exclusion filters.

Logs-based metric: A metric derived from log entries using a filter. For example, count the number of ERROR logs per minute.

Log view: A subset of log entries in a bucket, controlled by IAM permissions.

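Log Analytics: Lets you run SQL queries on logs stored in a log bucket that has been upgraded to use Log Analytics. The bucket can optionally be linked to a BigQuery dataset so the same data can be queried from BigQuery.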

Logs Router: The component that routes log entries to sinks and destinations based on rules.
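The relationship between the Logs Router, sinks, and inclusion/exclusion filters can be modeled in a few lines. This is a simplified mental model, not the real service: the `Sink` class, the `route` function, the destination strings, and the dictionary-based log entries are all hypothetical, and real filters are written in the Logging query language rather than as Python functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

LogEntry = Dict[str, str]
Predicate = Callable[[LogEntry], bool]

# Severity levels in increasing order of importance.
SEVERITIES = ["DEFAULT", "DEBUG", "INFO", "NOTICE", "WARNING",
              "ERROR", "CRITICAL", "ALERT", "EMERGENCY"]

def at_least(level: str) -> Predicate:
    """Build a predicate equivalent in spirit to the filter `severity>=LEVEL`."""
    floor = SEVERITIES.index(level)
    return lambda entry: SEVERITIES.index(entry["severity"]) >= floor

@dataclass
class Sink:
    name: str
    destination: str
    inclusion: Predicate
    exclusions: List[Predicate] = field(default_factory=list)

    def accepts(self, entry: LogEntry) -> bool:
        # An entry must match the inclusion filter and none of the exclusion filters.
        if not self.inclusion(entry):
            return False
        return not any(excluded(entry) for excluded in self.exclusions)

def route(entry: LogEntry, sinks: List[Sink]) -> List[str]:
    """Each sink is evaluated independently, so one entry can reach several destinations."""
    return [s.destination for s in sinks if s.accepts(entry)]

sinks = [
    Sink("errors-to-bq", "bigquery:my_dataset", inclusion=at_least("ERROR")),
    Sink("_Default", "bucket:_Default", inclusion=lambda e: True,
         exclusions=[lambda e: e["severity"] == "DEBUG"]),
]

print(route({"severity": "ERROR", "resource_type": "gce_instance"}, sinks))
print(route({"severity": "DEBUG", "resource_type": "k8s_container"}, sinks))
```

The point to take away is that an exclusion filter belongs to one sink: dropping DEBUG entries from the default bucket does not stop another sink from exporting them.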

Default retention periods

Cloud Logging _Default bucket: 30 days

Cloud Logging _Required bucket (contains Admin Activity and System Event audit logs): 400 days (cannot be modified)

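Cloud Storage export: objects persist until you delete them or a lifecycle rule removes them. Storage classes have minimum storage durations (Standard: none, Nearline: 30 days, Coldline: 90 days, Archive: 365 days); these affect early-deletion charges, not how long data is retained.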

BigQuery: no default retention; data persists until table is deleted or expiration is set

Pub/Sub: messages can be retained from 10 minutes to 7 days

Cloud Audit Logs

Cloud Audit Logs are generated automatically and come in four types:

Admin Activity logs: Record operations that modify configuration or metadata (e.g., creating a VM, changing IAM permissions). Retained for 400 days in the _Required bucket. Cannot be disabled.

Data Access logs: Record operations that read or modify user-provided data (e.g., reading from Cloud Storage, querying BigQuery). Disabled by default for most services (BigQuery is the exception); you must enable them. Retained for 30 days in the _Default bucket (can be configured).

System Event logs: Record non-human actions like automatic scaling or maintenance. Retained for 400 days in the _Required bucket.

Policy Denied logs: Record when a user or service account is denied access because of a security policy violation. Generated by default and cannot be disabled, although you can use exclusion filters to keep them from being stored.
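Each audit log type is written to its own log name, which is what you filter on in the Logs Explorer or with `gcloud logging read`. The sketch below builds those filters; the helper name `audit_log_filter` and the project ID are hypothetical, and the log IDs are the ones I understand Cloud Audit Logs to use, so verify them against a real entry in your project.

```python
from urllib.parse import quote

# Log IDs for the audit log types; the "/" is URL-encoded as %2F in a log name.
AUDIT_LOG_IDS = {
    "admin_activity": "cloudaudit.googleapis.com/activity",
    "data_access": "cloudaudit.googleapis.com/data_access",
    "system_event": "cloudaudit.googleapis.com/system_event",
    "policy_denied": "cloudaudit.googleapis.com/policy",
}

def audit_log_filter(project_id: str, kind: str) -> str:
    """Return a Logging filter that selects one audit log type in one project."""
    log_id = quote(AUDIT_LOG_IDS[kind], safe="")  # encode "/" as %2F
    return f'logName="projects/{project_id}/logs/{log_id}"'

# "my-project" is a placeholder project ID.
for kind in AUDIT_LOG_IDS:
    print(audit_log_filter("my-project", kind))
```

For the "who deleted a VM" style of question, the Admin Activity filter is the one to reach for.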

Configuration commands

Here are key gcloud commands for monitoring and logging:

Create a monitoring dashboard:

gcloud monitoring dashboards create --config-from-file=dashboard.yaml

Create an alerting policy:

gcloud alpha monitoring policies create --policy-from-file=policy.yaml

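List logs-based metrics (Cloud Monitoring metric types are browsed in Metrics Explorer or listed through the Monitoring API's projects.metricDescriptors.list method):

gcloud logging metrics list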

Create a log sink:

gcloud logging sinks create my-sink bigquery.googleapis.com/projects/my-project/datasets/my_dataset --log-filter='severity>=ERROR'

View recent log entries:

gcloud logging read 'severity>=ERROR' --limit=10 --freshness=1h

Create a logs-based metric:

gcloud logging metrics create my-metric --description='Count of errors' --log-filter='severity=ERROR'

Interaction with other services

Cloud Monitoring and Cloud Logging: Logs-based metrics allow you to create monitoring metrics from log data. Alerts can be triggered by logs-based metrics.

Error Reporting: Uses Cloud Logging to aggregate and display application errors.

Cloud Debugger: Worked with Cloud Logging to capture logpoints and snapshots. The service was deprecated and shut down in May 2023, so treat it as legacy material.

Cloud Trace: Provides latency data that can be viewed in Monitoring dashboards.

Cloud Profiler: Shows CPU and memory usage profiles.

Security Command Center: Uses Cloud Audit Logs for security analysis.

VPC Service Controls: Can restrict access to Monitoring and Logging APIs.

Walk-Through

1

Create a Monitoring Dashboard

First, define what metrics you need to visualize. Use the Metrics Explorer to find the metric type (e.g., `compute.googleapis.com/instance/cpu/utilization`). Then create a dashboard using the Cloud Console or the `gcloud monitoring dashboards create` command with a YAML file. The YAML file specifies widgets, each with a title, chart type (line, bar, stacked area), and filter (e.g., `metric.type="compute.googleapis.com/instance/cpu/utilization"`). You can add multiple charts per dashboard. Dashboards are global resources and can be shared with other users via IAM.
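As a rough illustration of what the dashboard definition file contains, the Python sketch below generates a one-chart configuration and serializes it to JSON, which `--config-from-file` accepts alongside YAML. The field names follow the Dashboard resource of the Monitoring API as I understand it; treat them as assumptions and check them against the API reference. The function name and display names are hypothetical.

```python
import json

def cpu_dashboard(display_name: str) -> dict:
    """Build a dashboard definition with a single line chart of VM CPU utilization."""
    chart = {
        "title": "VM CPU utilization",
        "xyChart": {
            "dataSets": [{
                "plotType": "LINE",
                "timeSeriesQuery": {
                    "timeSeriesFilter": {
                        # Same filter syntax you would find via Metrics Explorer.
                        "filter": ('metric.type="compute.googleapis.com/instance/cpu/utilization" '
                                   'AND resource.type="gce_instance"'),
                        "aggregation": {
                            "alignmentPeriod": "60s",
                            "perSeriesAligner": "ALIGN_MEAN",
                        },
                    }
                },
            }]
        },
    }
    return {
        "displayName": display_name,
        "gridLayout": {"columns": "2", "widgets": [chart]},
    }

dashboard_json = json.dumps(cpu_dashboard("Example VM dashboard"), indent=2)
print(dashboard_json)
```

Saving the printed output to a file and passing it to `gcloud monitoring dashboards create --config-from-file=...` would be the next step. Adding more charts means appending more widget dictionaries to the `widgets` list.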

2

Set Up an Alerting Policy

Navigate to Cloud Monitoring > Alerting > Create Policy. Define a condition: choose a metric (e.g., CPU utilization), an aggregation (e.g., mean for 5 min), a comparison (e.g., > 0.8 for 80%), and a duration (e.g., 5 minutes). You can also use MQL for complex conditions. Then configure notification channels: email, SMS, webhook, etc. Finally, set documentation and severity. The policy will evaluate every 60 seconds by default. Alerts have states: open, acknowledged, closed.
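The same policy can be expressed as a file for the `--policy-from-file` flag. The sketch below builds such a definition in Python and prints it as JSON. The field names follow the AlertPolicy resource of the Monitoring API as I understand it and should be verified before use; the function name, display names, and the notification channel ID are hypothetical placeholders.

```python
import json
from typing import List

def cpu_alert_policy(threshold: float, duration_s: int, channels: List[str]) -> dict:
    """Build an alerting policy that fires when mean CPU stays above `threshold`."""
    condition = {
        "displayName": f"CPU above {threshold:.0%} for {duration_s // 60} min",
        "conditionThreshold": {
            "filter": ('metric.type="compute.googleapis.com/instance/cpu/utilization" '
                       'AND resource.type="gce_instance"'),
            "comparison": "COMPARISON_GT",
            "thresholdValue": threshold,      # utilization is a fraction, so 0.8 means 80%
            "duration": f"{duration_s}s",     # how long the breach must last
            "aggregations": [{
                "alignmentPeriod": "60s",
                "perSeriesAligner": "ALIGN_MEAN",
            }],
        },
    }
    return {
        "displayName": "High CPU (example)",
        "combiner": "OR",
        "conditions": [condition],
        "notificationChannels": channels,
    }

# Placeholder channel name; real ones look like projects/PROJECT/notificationChannels/ID.
policy = cpu_alert_policy(0.8, 300, ["projects/my-project/notificationChannels/1234567890"])
print(json.dumps(policy, indent=2))
```

Note that the threshold is 0.8, not 80: CPU utilization is reported as a fraction, which is an easy detail to get wrong when translating a requirement into a policy.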

3

Create a Log Sink for Export

To retain logs longer than 30 days, create a log sink (or raise the retention on a log bucket). In Cloud Logging > Logs Router, click Create Sink. Give it a name and select a destination (Cloud Storage bucket, BigQuery dataset, or Pub/Sub topic). Optionally add an inclusion filter (e.g., `resource.type="gce_instance"`) and exclusion filters to reduce volume. The sink routes matching log entries as they arrive; it does not backfill entries that were ingested before it was created. For Cloud Storage, log entries are written as JSON files, batched roughly hourly, in folders organized by log name and date. For BigQuery, entries are streamed into tables named after the log they belong to, either date-sharded or partitioned depending on the sink configuration.
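Each destination type uses a different URI format in `gcloud logging sinks create`. The sketch below assembles those URIs and the full command line; the helper names, project ID, and resource names are hypothetical, and the command is only printed, not executed.

```python
import shlex

def sink_destination(kind: str, project: str, name: str) -> str:
    """Return the destination URI for a log sink of the given kind."""
    if kind == "storage":
        return f"storage.googleapis.com/{name}"
    if kind == "bigquery":
        return f"bigquery.googleapis.com/projects/{project}/datasets/{name}"
    if kind == "pubsub":
        return f"pubsub.googleapis.com/projects/{project}/topics/{name}"
    raise ValueError(f"unsupported destination kind: {kind}")

def sink_command(sink_name: str, destination: str, log_filter: str) -> str:
    """Assemble a shell-safe `gcloud logging sinks create` command line."""
    args = ["gcloud", "logging", "sinks", "create",
            sink_name, destination, f"--log-filter={log_filter}"]
    # shlex.quote protects characters such as ">" that the shell would interpret.
    return " ".join(shlex.quote(a) for a in args)

dest = sink_destination("bigquery", "my-project", "my_dataset")
print(sink_command("my-sink", dest, "severity>=ERROR"))
```

After the sink is created, its writer identity (a service account) still needs permission to write to the destination, otherwise no entries will arrive.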

4

Enable Data Access Audit Logs

By default, Data Access audit logs are disabled for most services. To enable them, go to IAM & Admin > Audit Logs in the Cloud Console. For each service (e.g., Cloud Storage), check the boxes for Admin Read, Data Read, and Data Write. You can also enable them from the command line by adding an `auditConfigs` section to the project's IAM policy: fetch the policy with `gcloud projects get-iam-policy`, edit it, and apply it with `gcloud projects set-iam-policy`. Using `allServices` as the service name enables them for every service. Once enabled, logs will appear in Cloud Logging within minutes. Be aware of costs: Data Access logs can generate significant volume.
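Outside the Console, Data Access logging is controlled by the `auditConfigs` section of the IAM policy. The sketch below shows the shape of that section by adding it to a policy dictionary, such as one loaded from the JSON output of `gcloud projects get-iam-policy`. The helper name and the sample policy are hypothetical; the sketch edits data in memory only and does not call any API.

```python
import copy

DATA_ACCESS_LOG_TYPES = ("ADMIN_READ", "DATA_READ", "DATA_WRITE")

def enable_data_access(policy: dict, service: str = "allServices") -> dict:
    """Return a copy of `policy` with all three Data Access log types enabled for `service`."""
    updated = copy.deepcopy(policy)  # never mutate the caller's policy
    configs = updated.setdefault("auditConfigs", [])
    entry = next((c for c in configs if c.get("service") == service), None)
    if entry is None:
        entry = {"service": service, "auditLogConfigs": []}
        configs.append(entry)
    existing = {c["logType"] for c in entry["auditLogConfigs"]}
    for log_type in DATA_ACCESS_LOG_TYPES:
        if log_type not in existing:  # keeps the operation idempotent
            entry["auditLogConfigs"].append({"logType": log_type})
    return updated

# Hypothetical policy with no audit configuration yet.
original = {"bindings": [], "etag": "abc"}
updated = enable_data_access(original, "storage.googleapis.com")
print(updated["auditConfigs"])
```

The etag from the fetched policy should be kept when writing the policy back, since it is how IAM detects concurrent modifications.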

5

Create a Logs-Based Metric

Logs-based metrics allow you to turn log content into a monitoring metric. In Cloud Logging > Logs-based Metrics, click Create Metric. Define a filter (e.g., `severity=ERROR AND resource.type="k8s_container"`). Choose a metric type: counter (count occurrences) or distribution (track values like latency). The metric will appear in Cloud Monitoring as a custom metric. You can then create alerts on it. For example, alert if error count exceeds 100 in 5 minutes.
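Conceptually, a counter-type logs-based metric is a filter plus a count per time bucket, and the alert is a threshold on that count. The Python sketch below imitates this on a handful of in-memory entries. It is only a mental model: the entry fields, function names, and sample data are hypothetical, and the real service evaluates filters written in the Logging query language.

```python
from collections import Counter
from typing import Dict, List

def errors_per_minute(entries: List[dict]) -> Dict[int, int]:
    """Count entries matching severity=ERROR AND resource.type="k8s_container", per minute."""
    counts: Counter = Counter()
    for e in entries:
        if e["severity"] == "ERROR" and e["resource_type"] == "k8s_container":
            counts[e["timestamp_s"] // 60] += 1  # bucket by whole minute
    return dict(counts)

def should_alert(counts: Dict[int, int], now_minute: int, window_minutes: int, limit: int) -> bool:
    """True if the total over the trailing window exceeds `limit`."""
    first = now_minute - window_minutes + 1
    recent = sum(counts.get(m, 0) for m in range(first, now_minute + 1))
    return recent > limit

# Hypothetical log entries.
entries = [
    {"timestamp_s": 5, "severity": "ERROR", "resource_type": "k8s_container"},
    {"timestamp_s": 42, "severity": "ERROR", "resource_type": "k8s_container"},
    {"timestamp_s": 61, "severity": "INFO", "resource_type": "k8s_container"},
    {"timestamp_s": 70, "severity": "ERROR", "resource_type": "gce_instance"},
    {"timestamp_s": 130, "severity": "ERROR", "resource_type": "k8s_container"},
]

counts = errors_per_minute(entries)
print(counts)
print(should_alert(counts, now_minute=2, window_minutes=5, limit=2))
```

Only three of the five entries match both parts of the filter, which mirrors how a tighter filter keeps a logs-based metric focused on the signal you want to alert on.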

What This Looks Like on the Job

Enterprise Scenario 1: E-commerce Platform Monitoring

A large e-commerce company runs its production workloads on Google Kubernetes Engine (GKE). They need to ensure 99.99% uptime during peak shopping seasons. They use Cloud Monitoring to track CPU, memory, and request latency at the pod level. They set up alerting policies: if p99 latency exceeds 500ms for 5 minutes, they get paged via PagerDuty. They also use uptime checks to verify the homepage loads from multiple global locations. For logging, they export all application logs to BigQuery for long-term analysis. They use logs-based metrics to count HTTP 500 errors and create a dashboard showing error rate over time. During Black Friday, they scale up their GKE cluster dynamically, and monitoring helps them identify a memory leak in a microservice within minutes.

Enterprise Scenario 2: Financial Services Compliance

A bank must comply with PCI DSS and SOX regulations, requiring audit logs to be retained for at least 7 years. They enable Cloud Audit Logs for all services and create log sinks that export Admin Activity and Data Access logs to Cloud Storage with archive class. They set up BigQuery for analyzing access patterns. They also create a log sink to Pub/Sub for real-time security event streaming to a SIEM. They use Cloud Monitoring to track the volume of denied access attempts and alert on unusual spikes. Misconfiguration: Initially, they forgot to enable Data Access logs, so they had no visibility into who read sensitive data. After enabling, they discovered a former employee had accessed customer records post-termination.

Common Pitfalls

Not setting up budget alerts for logging costs: Logging volume can be huge, especially Data Access logs. Always set budget alerts and use exclusion filters to drop noisy logs.

Using default retention for compliance: The default 30-day retention is insufficient for most compliance needs. Create custom buckets or export to longer-term storage.

Over-alerting: Setting alert thresholds too low causes alert fatigue. Use duration conditions (e.g., sustained for 5 minutes) to reduce noise.

Not using logs-based metrics: Many teams manually count logs instead of creating metrics, missing the ability to alert on log patterns.

How ACE Actually Tests This

ACE Exam Focus: Cloud Monitoring and Logging (Objective 4.1)

The ACE exam tests your ability to:

Configure monitoring for Google Cloud resources (Compute Engine, GKE, Cloud SQL, etc.)

Create and manage alerting policies and notification channels

Use Cloud Logging to view, filter, and export logs

Understand Cloud Audit Logs types and retention

Create logs-based metrics and use them in monitoring

Common Wrong Answers and Traps

1.

Confusing Cloud Monitoring with Cloud Logging: Some questions ask where to find a specific metric. Remember: metrics (like CPU) are in Monitoring, logs (like error messages) are in Logging.

2.

Data Access logs are enabled by default: FALSE. They are disabled by default. The exam loves to test this. Admin Activity logs are always enabled and cannot be disabled.

3.

Retention of _Default bucket is 400 days: FALSE. _Default is 30 days. _Required is 400 days. Many candidates mix them up.

4.

Logs can only be exported to Cloud Storage: FALSE. They can also go to BigQuery and Pub/Sub.

5.

Uptime checks only work for HTTP: FALSE. They support HTTP, HTTPS, and TCP.

Key Numbers to Memorize

Default log retention: 30 days (_Default), 400 days (_Required)

Data Access logs: disabled by default

Admin Activity logs: always enabled, cannot be disabled

Log sink destinations: Cloud Storage, BigQuery, Pub/Sub

Notification channels: email, SMS, PagerDuty, Slack, webhook, Pub/Sub

Uptime check locations: multiple global regions (USA, Europe, South America, Asia Pacific)

Alerting evaluation period: every 60 seconds by default

How to Eliminate Wrong Answers

If a question asks about 'real-time monitoring of CPU', the answer is Cloud Monitoring, not Cloud Logging.

If a question asks about 'audit trail for who deleted a VM', the answer is Cloud Audit Logs (Admin Activity).

If a question asks about 'storing logs for 7 years', the answer is export to Cloud Storage or BigQuery, not the default bucket.

If a question asks about 'alerting when error logs appear', the answer is logs-based metric with alerting policy.

Edge Cases

VPC flow logs are not enabled by default; they must be enabled per subnet.

Firewall rules logs can be enabled per rule.

Cloud NAT logs can be enabled for NAT gateways.

You can create uptime checks for internal resources only if you use a private uptime check, which reaches the resource over your VPC network (via Service Directory) rather than over the public internet.

Logs-based metrics can be used in alerting policies, but there is a delay of up to 5 minutes.

Key Takeaways

Cloud Monitoring collects metrics and provides alerting; Cloud Logging stores logs and enables search/export.

Admin Activity audit logs are always enabled and retained for 400 days; Data Access logs are disabled by default and retained for 30 days.

Default log retention in the _Default bucket is 30 days; you can create custom buckets with longer retention or export to Cloud Storage or BigQuery. Pub/Sub sinks are for streaming logs to other systems, not for long-term storage.

Logs-based metrics allow you to create monitoring metrics from log filters, enabling alerts on log patterns.

Uptime checks can monitor external and internal endpoints (HTTP, HTTPS, TCP) from multiple global locations.

Notification channels include email, SMS, PagerDuty, Slack, webhook, and Pub/Sub.

To reduce logging costs, use exclusion filters to drop noisy logs and set budget alerts.

The ACE exam often tests the difference between Admin Activity and Data Access logs, and default retention values.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Cloud Monitoring

Collects and visualizes metrics (time-series data).

Provides dashboards, charts, and alerting.

Metrics are numerical (e.g., CPU, latency).

Metric data retention is fixed by Google and varies by metric type (commonly around six weeks); unlike log buckets, you do not configure it.

Uses Metrics Explorer, MQL, and alerting policies.

Cloud Logging

Collects, stores, and searches log entries (text/structured data).

Provides Logs Explorer, log sinks, and logs-based metrics.

Logs are textual or structured records (e.g., error messages, JSON).

Default retention is 30 days (_Default) and 400 days (_Required).

Uses log filters, inclusion/exclusion rules, and export destinations.

Watch Out for These

Mistake

Cloud Monitoring and Cloud Logging are the same service.

Correct

They are separate services: Monitoring handles metrics and alerts; Logging handles log storage and search. They integrate but are distinct.

Mistake

Data Access audit logs are enabled by default for all services.

Correct

Data Access logs are disabled by default. You must enable them per service or globally. Admin Activity logs are always enabled.

Mistake

All logs are retained for 400 days.

Correct

Only logs in the _Required bucket (Admin Activity and System Event) are retained for 400 days. The _Default bucket retains logs for 30 days.

Mistake

You cannot create alerts based on log content.

Correct

You can create logs-based metrics from log filters, then create alerting policies on those metrics. This is a common exam scenario.

Mistake

Logs can only be exported to Cloud Storage.

Correct

Logs can be exported to Cloud Storage, BigQuery, and Pub/Sub via log sinks. Each has different use cases.

Frequently Asked Questions

How do I enable Data Access audit logs?

Go to IAM & Admin > Audit Logs in the Cloud Console. For each service (e.g., Cloud Storage), check the boxes for Admin Read, Data Read, and Data Write. Alternatively, add an `auditConfigs` section to the project's IAM policy: fetch it with `gcloud projects get-iam-policy`, edit it, and apply it with `gcloud projects set-iam-policy`. Remember that Data Access logs are disabled by default for most services and can generate significant volume and cost.

What is the difference between Cloud Monitoring and Cloud Logging?

Cloud Monitoring focuses on metrics (numerical data like CPU usage) and alerting. Cloud Logging focuses on log entries (text records like error messages). They integrate: logs-based metrics allow you to turn log data into monitoring metrics. Both are part of the Google Cloud operations suite.

How long are logs retained by default?

The _Default log bucket retains logs for 30 days. The _Required bucket (Admin Activity and System Event audit logs) retains logs for 400 days, and that period cannot be changed. You can configure log buckets with retention periods from 1 day to 3650 days, or route logs to Cloud Storage or BigQuery for longer retention. Pub/Sub is a streaming destination rather than long-term storage.

Can I create alerts based on log content?

Yes. Create a logs-based metric in Cloud Logging with a filter that matches the log content (e.g., `severity=ERROR`). This creates a custom metric in Cloud Monitoring. Then create an alerting policy on that metric (e.g., alert if count > 10 in 5 minutes).

What are the supported destinations for log sinks?

Log sinks can export logs to Cloud Storage (for long-term archival), BigQuery (for analysis), and Pub/Sub (for real-time streaming to third-party tools). You can also have multiple sinks with different filters.

How do I monitor a Compute Engine VM's disk usage?

Install the Ops Agent (or the legacy Cloud Monitoring agent) on the VM. The agent sends disk metrics (e.g., `agent.googleapis.com/disk/percent_used`) to Cloud Monitoring. You can then view these metrics in the Metrics Explorer or create a dashboard. For basic metrics like CPU, the agent is not required as they are collected automatically.

What are the common exam traps for Cloud Audit Logs?

Common traps: (1) Thinking Data Access logs are enabled by default – they are not. (2) Confusing retention: _Default is 30 days, _Required is 400 days. (3) Believing you can disable Admin Activity logs – you cannot. (4) Assuming all audit logs go to the same bucket – Admin Activity goes to _Required, Data Access goes to _Default (by default).
