A company uses Cloud Monitoring with custom metrics. They have a custom metric called 'requests_total' with labels 'endpoint', 'status_code'. They want to create an alert that fires if the error rate (status_code >=500) for any endpoint exceeds 5% over a 5-minute window. Which MQL query should they use?
Correct: Option A groups errors and totals by endpoint, divides the two, and applies the condition.
Why this answer
Option A is correct because it computes the error rate separately for each endpoint. It filters for error responses (status_code >= 500), groups by endpoint and sums the error count, then divides that by the total request count, which is also grouped by endpoint and summed. The condition fires when the resulting rate exceeds 0.05 (5%) over the 5-minute window. Running two separate group_by operations and joining their results (the `{ ... } / { ... }` syntax) is the MQL pattern for calculating a ratio per label.
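The arithmetic behind this answer can be sketched in plain Python. This is not MQL and not the Cloud Monitoring API; it only mirrors the group, sum, divide, and compare steps on made-up data. The names `error_rate_by_endpoint`, `firing_endpoints`, and the sample `window` are hypothetical, and status codes are treated as integers for simplicity.

```python
from collections import defaultdict

THRESHOLD = 0.05  # 5% error rate, as stated in the question


def error_rate_by_endpoint(samples):
    """Return {endpoint: error_rate} for one 5-minute window.

    samples: iterable of (endpoint, status_code, count) tuples (hypothetical shape).
    """
    errors = defaultdict(int)  # numerator: requests with status_code >= 500
    totals = defaultdict(int)  # denominator: all requests
    for endpoint, status_code, count in samples:
        totals[endpoint] += count
        if status_code >= 500:
            errors[endpoint] += count
    # Divide per endpoint; skip endpoints with no traffic to avoid dividing by zero.
    return {ep: errors[ep] / totals[ep] for ep in totals if totals[ep] > 0}


def firing_endpoints(samples, threshold=THRESHOLD):
    """Return the sorted endpoints whose error rate exceeds the threshold."""
    rates = error_rate_by_endpoint(samples)
    return sorted(ep for ep, rate in rates.items() if rate > threshold)


# Made-up request counts for a single window.
window = [
    ("/checkout", 200, 90),
    ("/checkout", 500, 10),
    ("/search", 200, 990),
    ("/search", 503, 10),
]

rates = error_rate_by_endpoint(window)
print(rates)
print(firing_endpoints(window))
```

With this sample data, `/checkout` has 10 errors out of 100 requests and `/search` has 10 out of 1000, so only `/checkout` crosses the 5% threshold.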
Exam trap
Google Cloud exams often test the distinction between a `ratio` applied without per-endpoint grouping (which yields a single overall rate) and explicit division of group_by results (which yields a rate per label value). Candidates who miss this distinction incorrectly choose a `ratio`-based query that ignores per-endpoint grouping.
How to eliminate wrong answers
Option B is wrong because it filters for status_code < 500 (successes) instead of errors, and it uses `ratio` without the group_by needed to compute per-endpoint rates, so it would produce one overall ratio across all endpoints.
Option C is wrong because it applies `group_by [endpoint], sum()` before filtering for errors. The grouping sums all requests per endpoint and discards the status_code label, so the later filter can no longer isolate errors and no per-endpoint error rate can be computed.
Option D is wrong because it uses `ratio` without any group_by, which computes the overall error rate across all endpoints combined, not per endpoint as required.
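To see why an overall ratio (as in Options B and D) does not meet the "any endpoint" requirement, the following Python sketch compares the two calculations on made-up data. It is an illustration of the arithmetic only, not MQL; `overall_error_rate`, `per_endpoint_error_rates`, and `traffic` are hypothetical names.

```python
def overall_error_rate(samples):
    """Error rate across all endpoints combined (no grouping)."""
    errors = sum(count for _, status, count in samples if status >= 500)
    total = sum(count for _, _, count in samples)
    return errors / total if total else 0.0


def per_endpoint_error_rates(samples):
    """Error rate computed separately for each endpoint."""
    errors, totals = {}, {}
    for endpoint, status, count in samples:
        totals[endpoint] = totals.get(endpoint, 0) + count
        if status >= 500:
            errors[endpoint] = errors.get(endpoint, 0) + count
    return {ep: errors.get(ep, 0) / n for ep, n in totals.items() if n > 0}


# Made-up window: a busy healthy endpoint hides a failing low-traffic one.
traffic = [
    ("/checkout", 200, 90),
    ("/checkout", 500, 10),
    ("/search", 200, 1890),
    ("/search", 500, 10),
]

overall = overall_error_rate(traffic)
by_endpoint = per_endpoint_error_rates(traffic)

print(overall > 0.05)                   # overall rate stays under the threshold
print(by_endpoint["/checkout"] > 0.05)  # but one endpoint is over it
```

Here the combined rate is 20 errors out of 2000 requests, while `/checkout` alone is at 10 out of 100, so an alert on the combined rate would stay silent even though one endpoint is failing at twice the threshold.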