SOA-C02 · Chapter 89 of 104 · Objective 1.2

AWS Systems Manager Incident Manager

AWS Systems Manager Incident Manager helps you prepare for, respond to, and analyze incidents affecting applications on AWS. On the SOA-C02 exam it falls under Domain 1 (Monitoring, Logging, and Remediation), Objective 1.2. Exam questions on Incident Manager tend to focus on its integration with CloudWatch alarms, the incident lifecycle, and response plans. You will learn the core components, the step-by-step incident flow, and common pitfalls to avoid.

25 min read

Intermediate

Updated Jul 20, 2026

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security


Incident Manager as a Fire Station Dispatch

Picture a city fire station with its dispatch center at the heart of operations. The dispatch center monitors alarms from buildings across the city (your AWS resources). When a fire alarm goes off (an incident), the dispatcher (Incident Manager) immediately opens an incident record, assigns a severity (e.g., 1 for a full blaze, 2 for a small fire), and alerts the right fire crew (on-call responders) by SMS, voice call, email, or chat. A pre-defined runbook (a checklist of actions: which hoses to use, which exits to block) guides the crew. As the crew fights the fire, they post progress to the dispatch board (the incident timeline). If no one answers the call, the dispatcher brings in the next crew (escalation). After the fire is out, the dispatch center keeps the record for post-incident analysis. Without this dispatch system, each building would have to call fire stations itself, causing delays and confusion. Incident Manager automates the same lifecycle (detection, notification, response, and resolution) so the right people are alerted with the right context at the right time.

How It Actually Works

What is AWS Systems Manager Incident Manager?

AWS Systems Manager Incident Manager is a fully managed service that helps you prepare for, detect, respond to, and resolve incidents. It centralizes incident management by automating response actions, notifying the right people, and providing a structured process to minimize downtime. Incident Manager is part of AWS Systems Manager, a suite of tools for operational management.

Why Incident Manager Exists

Before Incident Manager, teams had to manually handle incidents using custom scripts, third-party tools, or spreadsheets. This led to inconsistent responses, delayed notifications, and difficulty in tracking resolution progress. Incident Manager provides a standardized, automated approach that integrates with AWS services like CloudWatch, EventBridge, and Systems Manager Automation runbooks. It ensures that incidents are handled consistently, with a clear audit trail.

How Incident Manager Works Internally

Incident Manager is built around a few core resources: response plans, incidents, and the contacts, engagement plans, and escalation plans used to reach responders.

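Response Plan: Defines what happens when an incident starts: the incident template (title and impact), the contacts and escalation plans to engage, an optional chat channel, an optional Systems Manager Automation runbook, and optional integrations such as PagerDuty. CloudWatch alarms and EventBridge rules start incidents by targeting a response plan, and several alarms or rules can target the same plan.

Incident: A record of an event that requires attention. It has an impact (severity) level from 1 to 5, a status (OPEN or RESOLVED), and a timeline of events. Incidents are created automatically when an alarm or EventBridge rule invokes a response plan, or manually from the console or API.

Engagement Plan: Determines how and when responders are notified. Each contact has contact channels (SMS, voice, email) and an engagement plan that says which channels to use and when; an escalation plan adds escalation rules by engaging contacts in stages until someone acknowledges. Contacts and escalation plans can be reused across multiple response plans.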
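To make the relationships concrete, here is a minimal Python sketch of how the pieces reference each other. The class and field names are hypothetical and only loosely mirror the Incident Manager API; the point is that a response plan carries the incident template and points at reusable contacts and escalation plans, rather than owning the notification rules itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Impact(IntEnum):
    """Incident Manager impact levels: a lower number means more severe."""
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4
    NO_IMPACT = 5


@dataclass(frozen=True)
class Contact:
    name: str
    # Channel types a contact can have; chat is configured on the response plan instead.
    channels: tuple[str, ...]


@dataclass(frozen=True)
class EscalationPlan:
    name: str
    # Each stage: (contacts engaged in that stage, minutes before moving on).
    stages: tuple[tuple[tuple[Contact, ...], int], ...]


@dataclass
class ResponsePlan:
    name: str
    title: str
    impact: Impact
    engagements: list = field(default_factory=list)  # Contacts and/or EscalationPlans
    runbook: str | None = None       # hypothetical Automation document name
    chat_channel: str | None = None  # e.g. an SNS topic used by AWS Chatbot


alice = Contact("alice", ("SMS", "VOICE"))
bob = Contact("bob", ("EMAIL",))
oncall = EscalationPlan("oncall", (((alice,), 5), ((bob,), 10)))

# The same escalation plan is reused by two response plans.
cpu_plan = ResponsePlan("HighCpu", "CPU above 80%", Impact.HIGH, [oncall])
latency_plan = ResponsePlan("ApiLatency", "API latency above 2 s", Impact.CRITICAL, [oncall])
```

Reusing one `oncall` object in both plans is the design choice worth noticing: changing who is on call happens in one place, not in every response plan.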

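When a CloudWatch alarm enters the ALARM state, it can start an incident either through an alarm action that targets the response plan directly or through an EventBridge rule. The response plan then: 1. Creates an incident with the impact from its incident template. 2. Starts an automation runbook, if one is configured (e.g., to take a snapshot or restart an instance). 3. Engages the contacts and escalation plans listed in the response plan. 4. Records these events on the incident timeline, where responder updates are added as the incident progresses.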

Key Components, Values, Defaults, and Timers

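Impact (Severity) Levels: 1 (Critical), 2 (High), 3 (Medium), 4 (Low), 5 (No Impact). The response plan's incident template must specify one, and you can override it when starting or updating an incident.

Incident Status: OPEN or RESOLVED. There is no separate "In Progress" status; ongoing work is captured on the timeline.

Contact Channels: SMS, voice, and email. Chat notifications (via AWS Chatbot) are configured as a chat channel on the response plan rather than on a contact.

Engagement and Escalation: Engagement plans and escalation plans are built from stages with durations in minutes. Escalation is time-based (e.g., if no one acknowledges within 5 minutes, engage the next stage).

Automation Runbooks: AWS-provided or custom Systems Manager Automation documents that the response plan starts when an incident is created.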

Timeline: Automatically records events like incident creation, responder acknowledgments, runbook execution, and resolution. You can add manual entries.

Integration with CloudWatch: An alarm starts an incident when it enters the ALARM state, either by listing the response plan's ARN as an alarm action or through an EventBridge rule that targets the response plan.

Integration with EventBridge: EventBridge rules can match events, including CloudWatch alarm state changes, and use a response plan as the rule's target.
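The event pattern below is the shape an EventBridge rule typically uses to catch a CloudWatch alarm entering ALARM: `source`, `detail-type`, and `detail.state.value` follow the CloudWatch alarm state-change event, and the alarm name is a placeholder. The `matches` function is a deliberately simplified sketch of EventBridge matching (exact values listed in arrays, nested objects recursed into), not the full pattern language, but it shows why a wrong `detail-type` or state value makes a rule never fire.

```python
# Event pattern for an EventBridge rule; "HighCpuAlarm" is a placeholder name.
PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {
        "alarmName": ["HighCpuAlarm"],
        "state": {"value": ["ALARM"]},
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matching: every pattern key must be present in the event;
    list values are the allowed exact values; dict values recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Trimmed example of a CloudWatch alarm state-change event.
event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {
        "alarmName": "HighCpuAlarm",
        "state": {"value": "ALARM"},
        "previousState": {"value": "OK"},
    },
}

print(matches(PATTERN, event))  # True
event["detail"]["state"]["value"] = "OK"
print(matches(PATTERN, event))  # False: the rule ignores the return to OK
```

Because the pattern only lists ALARM, the same rule does nothing when the alarm clears, which is one reason incidents are not resolved by the alarm returning to OK.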

Configuration and Verification Commands

To create a response plan using the AWS CLI (the incident template requires a title and an integer impact):

aws ssm-incidents create-response-plan \
    --name "MyResponsePlan" \
    --incident-template '{"title": "Example incident", "impact": 3}' \
    --integrations '[{"pagerDutyConfiguration": {"name": "PagerDuty", "pagerDutyIncidentConfiguration": {"serviceId": "P12345"}, "secretId": "arn:aws:secretsmanager:us-east-1:123456789012:secret:MyPagerDutyKey"}}]'
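Hand-escaping JSON inside shell arguments is an easy place to slip up (a wrong key, or a quoted number where the API expects an integer). A minimal Python sketch, assuming you then pass the printed command to the AWS CLI, builds the `--incident-template` value with the standard `json` module and quotes it with `shlex`. The incident template's `title` and `impact` keys are the API's; the 1-to-5 range check is our own guard, not something the CLI performs for you here.

```python
import json
import shlex


def incident_template_arg(title: str, impact: int) -> str:
    """Return a shell-quoted JSON value for --incident-template."""
    # Impact is an integer from 1 (Critical) to 5 (No Impact), not a string.
    if not 1 <= impact <= 5:
        raise ValueError("impact must be between 1 and 5")
    return shlex.quote(json.dumps({"title": title, "impact": impact}))


cmd = " ".join([
    "aws ssm-incidents create-response-plan",
    "--name MyResponsePlan",
    "--incident-template",
    incident_template_arg("Example incident", 3),
])
print(cmd)
```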

To list incident records:

aws ssm-incidents list-incident-records
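`aws ssm-incidents list-incident-records` returns JSON, so it is easy to post-process. The sketch below assumes you saved the output to a string and that each entry in `incidentRecordSummaries` carries `arn`, `title`, `status`, and `impact` fields (a trimmed, illustrative shape); it keeps only open incidents at impact 1 or 2, most severe first. The ARNs and titles are placeholders.

```python
import json

# Trimmed, illustrative output of `aws ssm-incidents list-incident-records`.
raw = """
{
  "incidentRecordSummaries": [
    {"arn": "arn:aws:ssm-incidents::123456789012:incident-record/MyResponsePlan/a1",
     "title": "API latency above 2 s", "status": "OPEN", "impact": 1},
    {"arn": "arn:aws:ssm-incidents::123456789012:incident-record/MyResponsePlan/b2",
     "title": "Disk almost full", "status": "OPEN", "impact": 4},
    {"arn": "arn:aws:ssm-incidents::123456789012:incident-record/MyResponsePlan/c3",
     "title": "CPU above 80%", "status": "RESOLVED", "impact": 2}
  ]
}
"""


def urgent_open_incidents(output: str, max_impact: int = 2) -> list[dict]:
    """Open incidents at or above the given severity, most severe first."""
    summaries = json.loads(output)["incidentRecordSummaries"]
    urgent = [s for s in summaries if s["status"] == "OPEN" and s["impact"] <= max_impact]
    return sorted(urgent, key=lambda s: s["impact"])


for s in urgent_open_incidents(raw):
    print(s["impact"], s["title"])  # 1 API latency above 2 s
```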

To resolve an incident (valid status values are OPEN and RESOLVED):

aws ssm-incidents update-incident-record \
    --arn "arn:aws:ssm-incidents::123456789012:incident-record/MyResponsePlan/a1b2c3d4-5678-90ab-cdef-EXAMPLE11111" \
    --status "RESOLVED"

How It Interacts with Related Technologies

AWS Systems Manager Automation: Runbooks can execute automated actions like restarting EC2 instances, taking EBS snapshots, or patching.

AWS Chatbot: Sends notifications to Slack or Amazon Chime channels.

Amazon CloudWatch: Alarms start incidents, either directly through an alarm action or via EventBridge.

Amazon EventBridge: Routes events, such as alarm state changes, to response plans.

AWS Lambda: Custom actions can be triggered via runbooks or EventBridge targets.

AWS Secrets Manager: Stores credentials for third-party integrations like PagerDuty.

AWS CloudTrail: Logs all Incident Manager API calls for auditing.

Exam Tips

Remember that Incident Manager is part of AWS Systems Manager, not a standalone service.

Know the difference between a response plan (defines actions) and an engagement plan (defines notification rules).

Understand that incidents can be created manually or automatically from CloudWatch alarms.

Be aware that you can integrate with third-party tools like PagerDuty and Slack.

Impact 1 (Critical) is the highest severity and 5 (No Impact) the lowest; the value comes from the response plan's incident template.

Escalation rules are time-based; if a responder does not acknowledge within a set time, the incident escalates.

Walk-Through

CloudWatch Alarm Triggers

A CloudWatch alarm enters the ALARM state based on a metric threshold (e.g., CPU > 80% for 5 minutes). The alarm starts an incident in one of two ways: its ALARM action lists the response plan's ARN, or an EventBridge rule matches the alarm's state-change event and targets the response plan. Either way, details of the triggering alarm are passed to Incident Manager and recorded with the incident.
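The "CPU > 80% for 5 minutes" condition is a threshold evaluated over several periods. A rough Python sketch, assuming 1-minute periods and an alarm that needs all of the last 5 datapoints to breach (5 out of 5), shows when the alarm would flip to ALARM. Real alarms also support M-out-of-N evaluation settings and treat-missing-data options; only the former is modeled here, via `datapoints_to_alarm`.

```python
def alarm_state(datapoints: list[float], threshold: float = 80.0,
                evaluation_periods: int = 5, datapoints_to_alarm: int = 5) -> str:
    """Evaluate the most recent `evaluation_periods` datapoints (one per period).
    Treat-missing-data options are not modeled; too few points returns INSUFFICIENT_DATA."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


cpu = [55, 82, 85, 90, 88, 91]  # one average CPU value per minute
print(alarm_state(cpu))      # ALARM: the last 5 minutes are all above 80%
print(alarm_state(cpu[:5]))  # OK: 55 is still inside the window
```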

Response Plan Execution

The response plan creates an incident record with the title and impact from its incident template (e.g., impact 2) and starts its Systems Manager Automation runbook, if one is configured. The runbook can perform automated actions like stopping an EC2 instance or taking a snapshot. The incident starts in the OPEN status.

Engage On-Call Team

Incident Manager engages the contacts and escalation plans listed in the response plan through their configured channels (SMS, voice, email) and posts to the response plan's chat channel if one is set. An escalation plan moves through stages: if the primary responder does not acknowledge within 5 minutes, the next stage (e.g., the secondary responder) is engaged. Responders acknowledge from the Incident Manager console or by responding to the notification.
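The escalation behavior can be sketched as a small simulation. Under the simplifying assumptions that each stage engages its contacts the moment it starts, runs for a fixed number of minutes, and that any acknowledgment stops further escalation, the function below returns the stages that get engaged. The stage names and durations are hypothetical.

```python
from __future__ import annotations


def engaged_stages(stages: list[tuple[str, int]], ack_minute: float | None) -> list[str]:
    """stages: (stage name, duration in minutes), in escalation order.
    ack_minute: minutes after the incident started when someone acknowledged,
    or None if nobody did. Returns the names of the stages that were engaged."""
    engaged = []
    start = 0
    for name, duration in stages:
        if ack_minute is not None and ack_minute < start:
            break  # acknowledged before this stage began: escalation stops
        engaged.append(name)
        start += duration
    return engaged


plan = [("primary", 5), ("secondary", 5), ("team-lead", 10)]
print(engaged_stages(plan, ack_minute=3))     # ['primary']
print(engaged_stages(plan, ack_minute=7))     # ['primary', 'secondary']
print(engaged_stages(plan, ack_minute=None))  # ['primary', 'secondary', 'team-lead']
```

Summing the durations of all but the last stage gives the worst-case delay before the final tier hears about the incident, which is a quick way to sanity-check an acknowledgment window.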

Responder Investigates and Updates

The responder reviews the incident details, including the alarm context and runbook output. Because the status stays OPEN until resolution, progress is recorded on the timeline instead: the responder adds entries (e.g., 'Investigating root cause') and can adjust the incident's impact as they learn more. They can also run additional runbooks from the incident console. The timeline provides a chronological log of all actions.
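The timeline behaves like a time-ordered log. A minimal sketch with a hypothetical in-memory `Timeline` class (the real service exposes this through calls such as `aws ssm-incidents create-timeline-event`) shows that an entry recorded late, with an earlier timestamp, still lands in chronological order. The event kinds and texts are illustrative.

```python
import bisect
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(order=True)
class TimelineEvent:
    time: datetime                    # the only field used for ordering
    kind: str = field(compare=False)
    text: str = field(compare=False)


class Timeline:
    """Hypothetical in-memory incident timeline kept sorted by event time."""

    def __init__(self) -> None:
        self.events: list[TimelineEvent] = []

    def add(self, time: datetime, kind: str, text: str) -> None:
        # insort keeps the list ordered even if an event is recorded late.
        bisect.insort(self.events, TimelineEvent(time, kind, text))


t0 = datetime(2026, 7, 20, 9, 0, tzinfo=timezone.utc)
tl = Timeline()
tl.add(t0, "automatic", "Incident created from alarm")
tl.add(t0 + timedelta(minutes=4), "custom", "Investigating root cause")
# A responder back-fills an earlier observation; it still sorts into place.
tl.add(t0 + timedelta(minutes=2), "custom", "Runbook output: thread dump saved")
for e in tl.events:
    print(e.time.strftime("%H:%M"), e.kind, e.text)
```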

Incident Resolution and Closure

Once the issue is fixed, the responder sets the incident status to RESOLVED and can add a closing note to the timeline. The resolved incident record, including its full timeline, remains available for post-incident analysis. Resolution is independent of the alarm: the alarm returning to OK does not resolve the incident, and resolving the incident does not change the alarm's state.
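The key point, that resolution is a deliberate action and not a side effect of the alarm, can be sketched as a tiny state model. The class is hypothetical: it models only the two statuses, OPEN and RESOLVED, and, purely for illustration, logs alarm state changes without acting on them.

```python
class Incident:
    """Hypothetical model of an incident record's status (OPEN or RESOLVED)."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.status = "OPEN"  # every incident starts open
        self.timeline: list[str] = ["Incident created"]

    def on_alarm_state(self, state: str) -> None:
        # In this sketch an alarm change is only logged; it never touches the status.
        self.timeline.append(f"Alarm is now {state}")

    def resolve(self, note: str) -> None:
        # Resolution is an explicit action by a responder (or a runbook).
        self.status = "RESOLVED"
        self.timeline.append(f"Resolved: {note}")


inc = Incident("API latency above 2 s")
inc.on_alarm_state("OK")
print(inc.status)  # OPEN: the alarm clearing did not resolve it
inc.resolve("Scaled the database")
print(inc.status)  # RESOLVED
```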

What This Looks Like on the Job

Enterprise Scenario 1: E-Commerce Platform with Critical Latency Spikes

A large e-commerce company uses Incident Manager to respond to latency spikes in their payment processing service. They have a CloudWatch alarm on the API Gateway latency metric. When latency exceeds 2 seconds for 3 consecutive periods, the alarm triggers Incident Manager. The response plan runs a Systems Manager Automation runbook that captures a thread dump from the affected EC2 instances and stores it in S3. It also creates an incident with severity 1. The engagement plan sends SMS and Slack messages to the on-call team. The team uses the thread dump to identify a database bottleneck. They update the incident with findings and resolve it after scaling the database. The incident timeline provides a full audit trail for compliance.

Enterprise Scenario 2: Financial Services with Compliance Requirements

A bank uses Incident Manager to handle security incidents. They have a CloudWatch alarm on a metric filter that counts unauthorized API calls in their CloudTrail logs. When it fires, Incident Manager creates a severity 2 incident and runs a runbook that isolates the compromised IAM user by attaching a deny-all policy (such as the AWS managed AWSDenyAll policy). The security team is engaged by voice call and the manager by email. The team investigates and updates the incident. After resolution, they export the incident timeline for compliance reporting. They use separate response plans for different severity levels, each with its own escalation plan: severity 1 escalates to the CISO after 2 minutes, and severity 2 escalates to the team lead after 5 minutes.
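The bank's split by severity amounts to a routing table from impact level to escalation path. A minimal sketch using the timings from this scenario (the stage names, the final-stage durations, and the fallback route for other levels are hypothetical) computes who is engaged and when, assuming nobody acknowledges.

```python
# Hypothetical routing: impact level -> escalation stages (who, minutes before escalating).
ROUTES = {
    1: [("security-oncall", 2), ("ciso", 15)],
    2: [("security-oncall", 5), ("team-lead", 15)],
}
DEFAULT_ROUTE = [("security-oncall", 30)]  # assumed fallback for impact 3-5


def escalation_schedule(impact: int) -> list[tuple[int, str]]:
    """(minute engaged, who) for each stage, assuming nobody acknowledges."""
    schedule, minute = [], 0
    for who, duration in ROUTES.get(impact, DEFAULT_ROUTE):
        schedule.append((minute, who))
        minute += duration
    return schedule


print(escalation_schedule(1))  # [(0, 'security-oncall'), (2, 'ciso')]
print(escalation_schedule(2))  # [(0, 'security-oncall'), (5, 'team-lead')]
print(escalation_schedule(4))  # [(0, 'security-oncall')]
```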

Common Misconfiguration and Issues

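Missing IAM Permissions: The IAM role the response plan uses to start runbooks must be allowed to run them and to act on the target resources. If the role is missing or too narrow, the runbook fails while the incident still opens, which is easy to overlook unless you check the incident timeline and the Automation execution history.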

Incorrect EventBridge Rule: The rule must match the exact alarm state change. A common mistake is using the wrong event pattern.

Engagement Plan Timeouts: If responders do not acknowledge, the escalation may be too slow. Set realistic acknowledgment windows.

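Third-Party Integration Failures: If the PagerDuty credentials stored in Secrets Manager become invalid, or a Slack channel's AWS Chatbot configuration is removed, notifications fail. Keep the secret up to date (Secrets Manager rotation can help) and test notifications after changes.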

How SOA-C02 Actually Tests This

SOA-C02 Objective Coverage

Incident Manager falls under Domain 1: Monitoring, Logging, and Remediation, Objective 1.2: Remediate issues based on monitoring and availability metrics. The exam tests your ability to:

Configure response plans and engagement plans.

Integrate with CloudWatch alarms and EventBridge.

Understand the incident lifecycle (an incident is OPEN until it is set to RESOLVED).

Identify the correct order of steps when an alarm triggers.

Differentiate between Incident Manager and other Systems Manager capabilities.

Common Wrong Answers and Why Candidates Choose Them

1. 'Incident Manager can directly restart EC2 instances without a runbook.' - Wrong because Incident Manager itself does not execute actions; it uses Systems Manager Automation runbooks to perform actions. Candidates often think Incident Manager has built-in remediation.

2. 'Incident Manager requires a third-party tool like PagerDuty to send notifications.' - Wrong because Incident Manager has built-in notification channels (SMS, email, voice, chat). PagerDuty is an optional integration. Candidates assume third-party is mandatory.

3. 'Incident Manager incidents are automatically resolved when the CloudWatch alarm returns to OK.' - Wrong because incidents must be manually resolved or via a runbook. The alarm state change does not automatically resolve the incident. Candidates confuse alarm lifecycle with incident lifecycle.

4. 'You can create an incident only from a CloudWatch alarm.' - Wrong because you can manually create incidents from the console or API. Candidates overlook the manual creation option.

Specific Numbers and Values to Memorize

Impact (severity) levels: 1 (Critical) to 5 (No Impact), set in the response plan's incident template.

Engagement plan escalation timers: can be set in minutes (e.g., 5 minutes).

Supported contact channels: SMS, email, voice, chat (via AWS Chatbot).

Incident statuses: OPEN, RESOLVED.

Integration with Systems Manager Automation runbooks.

Edge Cases and Exceptions

If a response plan is deleted, existing incidents are not affected.

Engagement plans can be shared across multiple response plans.

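Incidents are not limited to one Region: Incident Manager uses a replication set, created during onboarding, to replicate incident data to the additional Regions you add.

Response plans can pass dynamic values, such as the incident record ARN or the involved resources, into runbook parameters.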
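Response plans can pass values from the new incident, such as its record ARN or its involved resources, into the runbook's parameters when the runbook starts. A hedged Python sketch, loosely modeled on the response plan's static `parameters` and `dynamicParameters` settings with the variables `INCIDENT_RECORD_ARN` and `INVOLVED_RESOURCES`: the runbook parameter names and the incident dictionary shape are hypothetical, and the real service resolves these values for you.

```python
from typing import Any


def resolve_runbook_parameters(static: dict[str, list[str]],
                               dynamic: dict[str, str],
                               incident: dict[str, Any]) -> dict[str, list[str]]:
    """Merge static parameters with values filled from the incident at start time.
    `dynamic` maps a runbook parameter name to a variable name such as
    "INCIDENT_RECORD_ARN" or "INVOLVED_RESOURCES"."""
    resolved = {name: list(values) for name, values in static.items()}
    for name, variable in dynamic.items():
        if variable == "INCIDENT_RECORD_ARN":
            resolved[name] = [str(incident["arn"])]
        elif variable == "INVOLVED_RESOURCES":
            resolved[name] = [str(r) for r in incident["resources"]]
        else:
            raise ValueError(f"unknown variable: {variable}")
    return resolved


incident = {
    "arn": "arn:aws:ssm-incidents::123456789012:incident-record/MyResponsePlan/a1",
    "resources": ["arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"],
}
params = resolve_runbook_parameters(
    static={"SnapshotDescription": ["taken by incident runbook"]},
    dynamic={"IncidentArn": "INCIDENT_RECORD_ARN", "Targets": "INVOLVED_RESOURCES"},
    incident=incident,
)
print(params["IncidentArn"])
```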

How to Eliminate Wrong Answers

If an answer mentions direct remediation without runbooks, it is wrong.

If an answer says incidents auto-resolve when alarm OKs, it is wrong.

If an answer says you must use a third-party for notifications, it is wrong.

Look for keywords: 'response plan', 'engagement plan', 'runbook', 'severity'.

Key Takeaways

Incident Manager is part of AWS Systems Manager, used for automated incident response.

Response plans define actions; engagement plans define notifications and escalations.

Incidents can be created automatically from CloudWatch alarms or manually.

Impact (severity) levels range from 1 (Critical) to 5 (No Impact); the response plan's incident template sets the value.

Incident statuses: OPEN and RESOLVED.

Automation runbooks execute remediation steps (e.g., restart instances).

Notifications can be sent via SMS, email, voice, or chat (Slack/Chime).

Escalation rules are time-based; if not acknowledged, incident escalates to next responder.

Incidents must be resolved by a responder or a runbook; they do not auto-resolve when the alarm clears.

Incident Manager integrates with CloudWatch, EventBridge, Systems Manager Automation, and third-party tools like PagerDuty.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Incident Manager

Automated incident creation from CloudWatch alarms

Built-in notification channels (SMS, email, voice, chat)

Automated runbook execution for remediation

Centralized incident timeline and audit trail

Escalation rules with time-based triggers

Manual Incident Handling

Requires manual creation of incident tickets

Notifications sent via separate tools or scripts

Remediation done manually by responders

No standardized timeline; relies on emails or notes

Escalation depends on human intervention

Watch Out for These

Mistake

Incident Manager is a standalone service separate from Systems Manager.

Correct

Incident Manager is a capability of AWS Systems Manager, not a standalone service. It is accessed via the Systems Manager console.

Mistake

Incidents can only be created automatically from CloudWatch alarms.

Correct

Incidents can be created manually via the console or API, not just from alarms. Manual creation is useful for testing or external events.

Mistake

Incident Manager can automatically resolve incidents when the underlying alarm returns to OK.

Correct

Incidents must be resolved manually or via a runbook. There is no automatic resolution based on alarm state.

Mistake

You must use PagerDuty or another third-party tool to send notifications.

Correct

Incident Manager has built-in notification channels: SMS, email, voice, and chat (via AWS Chatbot). Third-party tools are optional.

Mistake

Engagement plans are the same as response plans.

Correct

Response plans define the actions (runbooks, incident template) and reference an engagement plan. Engagement plans define notification and escalation rules.


Frequently Asked Questions

How do I create an incident from a CloudWatch alarm using Incident Manager?

First, create a response plan in Incident Manager that specifies the impact and any automation runbook. Then connect the alarm to it in one of two ways: add the response plan as an action for the alarm's ALARM state, or create an EventBridge rule that matches the alarm's state change to ALARM and targets the response plan. When the alarm fires, the response plan creates an incident and engages the listed contacts.

Can Incident Manager automatically resolve incidents?

No, incidents must be resolved manually by a responder or via a Systems Manager Automation runbook. There is no automatic resolution when the triggering alarm returns to OK. You can create a runbook that sets the incident status to Resolved, but it must be triggered manually or by another event.

What notification channels does Incident Manager support?

Incident Manager engages contacts by SMS, voice call, and email, and sends chat notifications through AWS Chatbot (which integrates with Slack and Amazon Chime). It also offers a native integration with PagerDuty.

What is the difference between a response plan and an engagement plan?

A response plan defines the incident template (title, severity) and the automation runbooks to run when an incident is created. It also references an engagement plan. An engagement plan defines the contacts to notify, their notification channels, and escalation rules (e.g., escalate after 5 minutes if not acknowledged).

How do I set up escalation in Incident Manager?

Create an escalation plan made of stages. Each stage engages one or more contacts or on-call schedules and has a duration in minutes; if no one acknowledges before the stage ends, the next stage is engaged. Then add the escalation plan to the response plan's engagements. Acknowledgment can be done via the Incident Manager console or by responding to a notification.

Can I use Incident Manager without CloudWatch alarms?

Yes, you can manually create incidents from the Incident Manager console or API. This is useful for incidents reported by users or from external monitoring tools. Manual incidents follow the same lifecycle as automatically created ones.

Does Incident Manager support cross-region incidents?

Yes. During onboarding you create a replication set, and Incident Manager replicates incident data to every Region in that set. This lets you keep working on incidents from another Region if one Region is impaired.

Terms Worth Knowing

Incident response, SIEM, SOAR
