An SRE team finds that their service had 47 minutes of downtime in the past 30 days. Their SLO is 99.9% monthly availability. How should the team characterize their performance relative to the SLO?
The math: 30 days × 24 hours × 60 minutes = 43,200 minutes. 0.1% × 43,200 = 43.2 minutes allowed downtime. 47 minutes actual > 43.2 minutes allowed → SLO missed by ~3.8 minutes. The error budget is exhausted and the team should prioritize reliability work.
Why this answer
The SLO of 99.9% monthly availability allows a maximum downtime of 43.2 minutes in a 30-day month (30 days × 24 hours × 60 minutes × 0.001 = 43.2 minutes). Since the actual downtime was 47 minutes, the error budget was exceeded by 3.8 minutes, meaning the SLO was missed. This calculation is standard for Google Cloud SRE practices, where error budgets are derived directly from the SLO percentage.
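The arithmetic above can be checked with a minimal sketch. The function name `allowed_downtime_minutes` and the variable names are hypothetical, chosen for illustration only. Exact fractions are used so that 43.2 and 3.8 come out without floating-point noise.

```python
from fractions import Fraction


def allowed_downtime_minutes(slo_percent: str, window_days: int) -> Fraction:
    """Return the error budget, in minutes, for an availability SLO.

    slo_percent is passed as a string (e.g. "99.9") so it converts to an
    exact fraction instead of a rounded float.
    """
    total_minutes = window_days * 24 * 60             # 30 days -> 43,200 minutes
    unavailability = 1 - Fraction(slo_percent) / 100  # 99.9% -> 1/1000
    return total_minutes * unavailability


budget = allowed_downtime_minutes("99.9", 30)  # 43.2 minutes allowed
actual = Fraction(47)                          # observed downtime in minutes
overrun = actual - budget                      # 3.8 minutes over budget
budget_consumed = actual / budget              # fraction of the budget used
slo_met = actual <= budget                     # False: the SLO was missed

print(f"budget={float(budget)} min, overrun={float(overrun)} min, "
      f"consumed={float(budget_consumed):.1%}, met={slo_met}")
```

Changing the window to 31 days, or the SLO to another percentage, shows how the allowed minutes shift, which is why a rounded rule of thumb such as "about an hour per month" is unsafe here.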
Exam trap
Google Cloud often tests the precise calculation of error budgets from SLO percentages, trapping candidates who round or assume common approximations (like 1 hour per month) instead of computing the exact allowed downtime.
How to eliminate wrong answers
Option A is wrong because it incorrectly assumes a fixed 1-hour threshold; the correct error budget for 99.9% availability over 30 days is 43.2 minutes, not 60 minutes. Option C is wrong because downtime minutes are the correct unit for measuring availability when the SLO is expressed as a percentage of uptime over a defined period. Option D is wrong because 47 minutes represents approximately 0.11% downtime (47 / 43,200), which is above the 0.1% that a 99.9% SLO allows; the relevant threshold is 0.1%, not 0.5%, so the SLO was missed, not met with margin.