An application running on Google Kubernetes Engine (GKE) emits structured logs in JSON format. The DevOps team wants to count the number of log entries that contain a specific error code (e.g., 'error_code': 500) in the last hour and use that count to trigger an alert if it exceeds a threshold. What is the most efficient way to achieve this?
Log-based metrics automatically count matching log entries and export them as a metric to Cloud Monitoring, enabling alerting and dashboards with minimal overhead.
Why this answer
Creating a log-based metric from the logs is the most efficient approach. You can define a counter metric that increments each time a log entry matches the filter (e.g., jsonPayload.error_code=500). Then you can set up an alerting policy on that metric.
Cloud Logging evaluates the filter as entries are ingested, so nothing has to repeatedly query stored logs to get the count, and the resulting metric can back both dashboards and alerts.
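As a rough illustration of the two pieces described above, the sketch below builds the request bodies for a counter log-based metric and a threshold alerting policy as plain Python dictionaries. It only constructs and prints JSON and does not call any Google Cloud API. The metric name `error_code_500_count`, the threshold of 10, and the helper function names are hypothetical. The `resource.type="k8s_container"` clause assumes the logs come from GKE containers, and the one-hour alignment window is taken from the "last hour" requirement in the question.

```python
import json


def build_log_metric(name: str, error_code: int) -> dict:
    """Return a body shaped like a Cloud Logging LogMetric for a counter metric.

    The metric name is a hypothetical placeholder chosen for this example.
    """
    return {
        "name": name,
        "description": f"Count of log entries with error_code {error_code}",
        # Match structured JSON logs from GKE containers carrying the error code.
        "filter": (
            'resource.type="k8s_container" '
            f"AND jsonPayload.error_code={error_code}"
        ),
        # A counter metric: one increment per matching log entry.
        "metricDescriptor": {"metricKind": "DELTA", "valueType": "INT64"},
    }


def build_alert_policy(metric_name: str, threshold: int) -> dict:
    """Return a body shaped like a Cloud Monitoring AlertPolicy on that metric.

    The threshold is an assumed value; the source does not specify one.
    """
    return {
        "displayName": f"{metric_name} above {threshold} in the last hour",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": f"{metric_name} hourly count",
                "conditionThreshold": {
                    # User-defined log-based metrics live under this prefix.
                    "filter": (
                        f'metric.type="logging.googleapis.com/user/{metric_name}" '
                        'AND resource.type="k8s_container"'
                    ),
                    # Sum the counter over a one-hour window.
                    "aggregations": [
                        {
                            "alignmentPeriod": "3600s",
                            "perSeriesAligner": "ALIGN_SUM",
                        }
                    ],
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    "duration": "0s",
                },
            }
        ],
    }


if __name__ == "__main__":
    metric = build_log_metric("error_code_500_count", 500)
    policy = build_alert_policy(metric["name"], threshold=10)
    print(json.dumps(metric, indent=2))
    print(json.dumps(policy, indent=2))
```

Keeping the definitions as data makes them easy to review and version before they are submitted through whatever tooling the team already uses for Cloud Logging and Cloud Monitoring.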