Your team receives an alert that the Error Reporting count for a critical service has increased tenfold in the last 10 minutes. You suspect a recent code deployment is the cause. What is the first action you should take?
Rolling back quickly mitigates user impact.
Why this answer
A tenfold increase in error reporting within 10 minutes strongly indicates a recent code deployment introduced a critical bug. The immediate priority is to restore service stability by rolling back the deployment to the previous known-good version, as this directly mitigates the root cause. Delaying the rollback risks further degradation of the service and potential data loss or corruption.
Exam trap
Google Cloud often tests the candidate's ability to prioritize immediate incident response over long-term analysis, trapping those who confuse 'post-mortem' (a retrospective activity) with 'first action' (a corrective action).
How to eliminate wrong answers
Option A is wrong because disabling the alert ignores the underlying problem and violates the principle of observability; alerts are meant to signal real issues, and suppressing them does not reduce errors. Option C is wrong because increasing the instance count does not fix the root cause—if the new code has a bug (e.g., a memory leak or an unhandled exception), adding more instances will only replicate the failure across more nodes, potentially amplifying the impact. Option D is wrong because opening a post-mortem is a reactive step that should occur after the immediate incident is resolved; performing it first wastes critical time and delays the necessary rollback.