This chapter covers Amazon EventBridge for automated remediation, a core topic for the SOA-C02 exam's Monitoring domain (Objective 1.2). EventBridge is a serverless event bus service that enables you to build event-driven architectures by connecting application data from AWS services, custom applications, and SaaS partners. On the exam, approximately 5-8% of questions touch on EventBridge, often focusing on its role in automating responses to AWS Health events, CloudTrail API calls, and resource state changes. You will need to understand how to create rules, configure targets, and use event patterns to trigger remediation actions such as invoking Lambda functions or sending notifications.
Imagine a large office building with multiple floors, each containing different departments. The building has a smart fire alarm system that continuously monitors various sensors: smoke detectors, heat sensors, manual pull stations, and sprinkler flow switches. Each sensor is an event source. When a smoke detector on the 3rd floor activates, it sends a signal to the central fire alarm panel. The panel acts as an event bus: it receives the signal and then, based on pre-configured rules, decides what actions to take. For example, if the smoke detector on the 3rd floor activates, the panel might (1) sound alarms on that floor only, (2) automatically unlock the stairwell doors, (3) notify the fire department via a dedicated phone line, and (4) send a page to the building manager. If a sprinkler flow switch activates on the 2nd floor, the panel might sound alarms on that floor and shut down the HVAC system to prevent smoke spread.

The rules are independent: the same event can trigger multiple actions, and different events can trigger the same action. The panel does not store events long-term; it processes them in near-real-time and then forgets them unless an action logs them. The key is that the panel decouples the sensors from the actions: sensors don't know what actions will be taken, and actions don't know which sensor triggered them.

This is exactly how Amazon EventBridge works: it ingests events from various sources (AWS services, custom applications, SaaS partners), applies rules to filter and route events, and then delivers them to targets like Lambda functions, Step Functions, SQS queues, or HTTP endpoints. Just as the fire panel can handle multiple sensors and multiple actions, EventBridge handles many sources and targets at high event volumes with low latency.
What is Amazon EventBridge and Why It Exists
Amazon EventBridge is a serverless event bus that ingests, filters, transforms, and routes events from various sources to targets. It evolved from Amazon CloudWatch Events, offering enhanced capabilities like schema registry, event archiving, and replay. EventBridge enables decoupled, event-driven architectures where producers and consumers are independent. For automated remediation, EventBridge can monitor events from AWS services (e.g., EC2 instance state changes, CloudTrail API calls, AWS Health events) and trigger automated responses such as running a Lambda function to stop an unhealthy instance or sending a notification to an SNS topic.
How EventBridge Works Internally
EventBridge operates on a publish-subscribe model. Events are JSON objects that describe a change in state. The flow is:
Event Producers emit events to an event bus. These can be AWS services (e.g., EC2, Config, Health), custom applications, or SaaS partners.
Rules are associated with an event bus. Each rule contains an event pattern (a filter that matches specific events) and one or more targets (destinations for matching events).
When an event arrives, EventBridge evaluates it against all rules on the bus. If the event matches a rule's pattern, the rule routes the event to its targets.
Targets can be Lambda functions, Step Functions state machines, SQS queues, SNS topics, Kinesis streams, API Gateway endpoints, or other event buses. EventBridge can also transform the event before delivery using input transformers.
If a target is unavailable (e.g., Lambda throttles), EventBridge retries delivery for up to 24 hours with exponential backoff (configurable per target).
Key Components, Values, Defaults, and Timers
- Event Buses: Default bus for AWS services, custom bus for your applications, and partner buses for SaaS events. You can create up to 100 custom event buses per account (soft limit).
- Rules: Up to 300 rules per event bus. Each rule can have up to 5 targets.
- Event Patterns: A pattern can filter on any event field, most commonly source, detail-type, resources, and fields under detail. Fields left out of the pattern are not checked. Supported comparisons include exact matching, prefix matching, suffix matching, numeric matching, and existence matching. Example pattern:

      {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {
          "state": ["stopped"]
        }
      }

- Targets: Usually in the same Region as the event bus. The main exception is an event bus in another Region or account, which a rule can use as its target. Lambda functions are the most common target for remediation.
- Retry Policy: By default, EventBridge retries for up to 24 hours and up to 185 attempts, using exponential backoff with jitter. Per target, you can lower the maximum number of retry attempts (0 to 185) and the maximum event age (60 seconds to 24 hours).
- Event Size: Maximum 256 KB per event. Larger events are rejected.
- Schema Registry: Automatically discovers and stores event schemas, enabling code generation for Lambda.
- Archive and Replay: An archive keeps events indefinitely by default, or for a retention period you set in days. Archived events can be replayed to the event bus they were archived from.
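The retry and dead-letter settings above are configured per target. The following is a minimal sketch of how such a target entry might be assembled in Python before being passed to boto3's `put_targets`; the helper name `build_target` and the ARNs used with it are illustrative assumptions, not part of the original guide.

```python
def build_target(target_id, target_arn, dlq_arn,
                 max_attempts=185, max_age_seconds=86400):
    """Build one entry for the Targets list of an EventBridge put_targets call.

    build_target is a hypothetical helper; the dictionary keys are the
    field names used by the PutTargets API.
    """
    # Reject values outside the ranges the service accepts.
    if not 0 <= max_attempts <= 185:
        raise ValueError("MaximumRetryAttempts must be between 0 and 185")
    if not 60 <= max_age_seconds <= 86400:
        raise ValueError("MaximumEventAgeInSeconds must be between 60 and 86400")
    return {
        "Id": target_id,
        "Arn": target_arn,
        "RetryPolicy": {
            "MaximumRetryAttempts": max_attempts,
            "MaximumEventAgeInSeconds": max_age_seconds,
        },
        # The dead-letter queue is an SQS queue ARN.
        "DeadLetterConfig": {"Arn": dlq_arn},
    }

# Typical use (requires AWS credentials, so it is left as a comment):
#   import boto3
#   boto3.client("events").put_targets(
#       Rule="UnhealthyInstanceRemediation",
#       Targets=[build_target("1", lambda_arn, dlq_arn, max_attempts=10)],
#   )
```

Validating the ranges locally is a design choice in this sketch: it surfaces a bad retry setting before the API call rather than after.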
Configuration and Verification
To create a rule for automated remediation via AWS CLI:

    aws events put-rule \
      --name "UnhealthyInstanceRemediation" \
      --event-pattern '{"source":["aws.health"],"detail-type":["AWS Health Event"],"detail":{"service":["EC2"],"eventTypeCategory":["issue"]}}' \
      --state ENABLED

Then add a target (Lambda function):

    aws events put-targets \
      --rule "UnhealthyInstanceRemediation" \
      --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:123456789012:function:myRemediationFunction"

To verify the rule:

    aws events list-rules --name-prefix "Unhealthy"
    aws events list-targets-by-rule --rule "UnhealthyInstanceRemediation"

Interaction with Related Technologies
AWS Lambda: Most common target for remediation. The Lambda function receives the event and can perform actions like stopping/terminating instances, modifying security groups, or calling AWS APIs.
AWS Step Functions: For complex workflows with multiple steps, error handling, and human approval.
AWS Systems Manager: Use EventBridge to trigger SSM Automation documents for runbooks.
AWS Config: EventBridge can react to Config rule compliance changes (e.g., non-compliant resource) to trigger remediation.
AWS Health: EventBridge receives AWS Health events (e.g., EC2 instance scheduled retirement) to automatically launch replacement instances.
CloudTrail: EventBridge can capture API calls (e.g., CreateSecurityGroup) and trigger security responses.
SNS/SQS: For notifications or decoupled processing.
Automated Remediation Patterns
EC2 Instance Health Remediation: Monitor AWS Health events for EC2 instance issues (e.g., impaired volume). Lambda function can snapshot volumes, launch replacement instance, and update Route 53.
Security Group Change Remediation: Detect unauthorized security group rule additions via CloudTrail and invoke Lambda to revert the change.
Auto Scaling Group Launch Failure: On ASG launch failure events, trigger Lambda to analyze error and retry with different configuration.
S3 Bucket Public Access: On S3 Bucket Policy change events, invoke Lambda to revert public access.
Performance and Limits
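EventBridge is built for high event volumes. Actual throughput is governed by service quotas (for example, on PutEvents requests and target invocations per second) that vary by Region.
Most of these quotas can be raised on request, so check Service Quotas for the current values rather than memorizing a fixed number.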
Event delivery latency is typically under 1 second.
Most targets must be in the same Region as the rule; for cross-Region delivery, make an event bus in the destination Region the rule's target.
Events are not stored unless archived. Archive retention is configurable in days and defaults to indefinite.
Best Practices
Use input transformers to reshape the event before passing to the target, reducing Lambda complexity.
Enable dead-letter queues (DLQ) for failed deliveries, especially for critical remediation.
Use event archival for auditing and replay during testing.
Set permissions correctly: EventBridge needs permission to invoke the target (e.g., Lambda resource policy).
Use idempotent Lambda functions to avoid side effects from retries.
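The permission point above is the one most often missed, so here is a small sketch of the resource-policy statement a Lambda target needs. It builds the keyword arguments for boto3's Lambda `add_permission` call; the helper name `invoke_permission` and the default statement ID are assumptions made for illustration.

```python
def invoke_permission(function_name, rule_arn,
                      statement_id="AllowEventBridgeInvoke"):
    """Return kwargs for lambda_client.add_permission(**kwargs).

    invoke_permission is a hypothetical helper. Scoping the grant with
    SourceArn limits it to one rule instead of every rule in the account.
    """
    return {
        "FunctionName": function_name,
        "StatementId": statement_id,
        "Action": "lambda:InvokeFunction",
        "Principal": "events.amazonaws.com",
        "SourceArn": rule_arn,
    }

# Typical use (requires AWS credentials, so it is left as a comment):
#   import boto3
#   boto3.client("lambda").add_permission(**invoke_permission(
#       "myRemediationFunction",
#       "arn:aws:events:us-east-1:123456789012:rule/UnhealthyInstanceRemediation",
#   ))
```

Rules created in the console add this statement automatically; rules created through the CLI, SDK, or infrastructure templates generally need it added explicitly.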
Exam-Relevant Details
EventBridge is the preferred service for event-driven automation. It is the successor to CloudWatch Events and uses the same underlying API, so existing CloudWatch Events rules continue to work.
Rules are evaluated in near real-time; there is no batching delay.
Event patterns are case-sensitive.
To match multiple values in a field, use an array: "state": ["running", "stopped"].
The detail-type field is always a string; you can use prefix matching with "detail-type": [{"prefix": "EC2"}].
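An EC2 instance cannot itself be a rule target. Apart from a few built-in EC2 API-call targets (StopInstances, RebootInstances, TerminateInstances, CreateSnapshot), remediation on an instance goes through an intermediary such as a Lambda function or SSM Automation.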
1. Event Emission
An event is emitted by an event source. For AWS services, events are automatically sent to the default event bus. For example, when an EC2 instance state changes to 'stopped', the EC2 service publishes an event with source 'aws.ec2', detail-type 'EC2 Instance State-change Notification', and detail containing the instance ID and new state. The event is a JSON object up to 256 KB. The event is sent to the default event bus in the same region and account. The event producer does not need to know about any rules; it simply publishes to the bus.
2. Event Ingestion by Bus
The event bus receives the event and immediately begins evaluating it against all active rules associated with that bus. EventBridge uses a publish-subscribe model: the bus does not store the event long-term unless archiving is enabled. The event is processed in-memory for rule matching. If no rules match, the event is discarded (unless archived). The bus can handle high throughput; each event is processed independently.
3. Rule Matching
EventBridge compares the incoming event against the event pattern of each rule. The pattern is a JSON object that specifies conditions on fields like source, detail-type, resources, and detail. Matching can be exact, prefix, suffix, numeric, or existence. For example, a rule with pattern `{"source":["aws.ec2"],"detail-type":["EC2 Instance State-change Notification"],"detail":{"state":["stopped"]}}` will match only EC2 state-change events where the new state is 'stopped'. If multiple rules match, each rule independently routes the event to its targets. There is no priority or ordering; all matching rules execute concurrently.
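To make the matching rules concrete, the following is a deliberately simplified Python sketch of how a pattern is compared with an event. It handles only exact and prefix matching and is an illustration of the semantics described above, not EventBridge's actual implementation; the function names are invented for this example.

```python
def _value_matches(alternative, actual):
    """Compare one pattern alternative with one event value."""
    if isinstance(alternative, dict) and "prefix" in alternative:
        return isinstance(actual, str) and actual.startswith(alternative["prefix"])
    return alternative == actual  # exact, case-sensitive match

def matches(pattern, event):
    """Return True if every field in the pattern is satisfied by the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested object such as "detail": recurse into it.
            if not (isinstance(actual, dict) and matches(expected, actual)):
                return False
            continue
        # A pattern array is an OR list; an event array matches if any element does.
        candidates = actual if isinstance(actual, list) else [actual]
        if not any(_value_matches(alt, c) for alt in expected for c in candidates):
            return False
    return True

stopped_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
print(matches(stopped_pattern, sample_event))  # True
```

Fields present in the event but absent from the pattern (here, instance-id) are ignored, which is why a narrow pattern can match events that carry much more data.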
4. Event Transformation (Optional)
Before delivering the event to a target, EventBridge can apply an input transformer. The transformer has two parts: an input paths map, which uses JSONPath expressions to pull values out of the event into named variables, and an input template, which places those variables (written as <name>) into the output sent to the target. For example, you can extract only the instance ID and state, or add static values. This reduces the need for Lambda to parse the entire event. If no transformer is configured, the entire event is sent to the target.
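As an illustration, the sketch below shows an input transformer definition in the shape the PutTargets API expects, plus a small local simulation of what it produces. The simulation supports only simple `$.a.b` paths and is an approximation written for this example; the function names are assumptions.

```python
import json

# Field names follow the PutTargets InputTransformer structure.
INPUT_TRANSFORMER = {
    "InputPathsMap": {
        "instance": "$.detail.instance-id",
        "state": "$.detail.state",
    },
    "InputTemplate": '{"instance": <instance>, "state": <state>}',
}

def resolve_path(event, path):
    """Resolve a simple '$.a.b' path (no wildcards or array indexes)."""
    if not path.startswith("$."):
        raise ValueError("only simple '$.' paths are supported in this sketch")
    node = event
    for part in path[2:].split("."):
        node = node[part]
    return node

def render(event, transformer):
    """Approximate the text a target would receive after transformation."""
    rendered = transformer["InputTemplate"]
    for name, path in transformer["InputPathsMap"].items():
        value = json.dumps(resolve_path(event, path))
        rendered = rendered.replace("<" + name + ">", value)
    return rendered

sample = {
    "source": "aws.ec2",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
print(render(sample, INPUT_TRANSFORMER))
```

With a transformer like this, the Lambda function receives a two-field payload instead of the full event envelope.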
5. Event Delivery to Target
EventBridge delivers the (possibly transformed) event to the target. For a Lambda target, EventBridge invokes the function asynchronously. For SQS, it sends a message to the queue. For API destinations (HTTP targets), it makes an HTTP request to the configured endpoint. EventBridge treats delivery as successful once the target service accepts the event; for Lambda, that means the asynchronous invocation was accepted, not that the function has finished. If the delivery attempt itself fails with a retriable error (e.g., Lambda throttles the invoke call), EventBridge retries according to the retry policy. The default retry policy retries for up to 24 hours with exponential backoff. You can configure a dead-letter queue to capture events that could not be delivered after all retries.
6. Remediation Action Execution
The target executes the remediation logic. For example, a Lambda function receives the event, extracts the instance ID, and calls the AWS API to stop or terminate the instance, or to launch a new one. The function should be idempotent because the same event can be delivered more than once. It can also log the action to CloudWatch Logs. Because the invocation is asynchronous, the function's result is not returned to EventBridge. If the function itself raises an error, Lambda's own asynchronous retry behavior applies (two further attempts by default), and a Lambda on-failure destination or dead-letter queue can capture the event. The entire process from event emission to remediation typically completes within a few seconds.
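A minimal sketch of such an idempotent handler follows. It assumes a hypothetical policy in which stopped instances should be started again, and it takes the EC2 client as a parameter so the logic can be exercised without AWS access; `remediate` is an invented name, while `describe_instances` and `start_instances` are standard boto3 EC2 client calls.

```python
def remediate(event, ec2):
    """Start an instance reported as stopped, unless that already happened.

    `ec2` is a boto3 EC2 client (or a stand-in with the same methods).
    The restart policy itself is an assumption made for this example.
    """
    detail = event["detail"]
    instance_id = detail["instance-id"]
    if detail["state"] != "stopped":
        return {"instance": instance_id, "action": "ignored"}

    # Idempotency check: look at the current state, not just the event,
    # because the same event may be delivered more than once.
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    current = reservations[0]["Instances"][0]["State"]["Name"]
    if current != "stopped":
        return {"instance": instance_id, "action": "skipped"}

    ec2.start_instances(InstanceIds=[instance_id])
    return {"instance": instance_id, "action": "started"}

def lambda_handler(event, context):
    # boto3 is available in the Lambda Python runtime; importing it here
    # keeps remediate() usable in local tests without it.
    import boto3
    return remediate(event, boto3.client("ec2"))
```

Passing the client in keeps the decision logic separate from the AWS call, which makes the retry behavior easy to test with a fake client.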
Enterprise Scenario 1: Automated Response to EC2 Instance Health Events
A large e-commerce company runs thousands of EC2 instances across multiple Availability Zones. They receive AWS Health events when an instance is scheduled for retirement or has an underlying hardware issue. They use EventBridge to automatically launch a replacement instance and attach the existing EBS volumes. The rule is configured on the default event bus with a pattern matching source: aws.health and detail-type: AWS Health Event. The target is a Lambda function that checks the event details, identifies the affected instance, creates an AMI if needed, launches a new instance in the same subnet and security group, and updates the Route 53 DNS record. The Lambda function is idempotent—if retried, it checks if the replacement already exists. The company also configures a dead-letter queue to capture any events that fail after retries, and a CloudWatch alarm on the DLQ for monitoring. Performance considerations: at peak, they might receive dozens of health events per hour, well within EventBridge limits. A common misconfiguration is forgetting to grant EventBridge permission to invoke the Lambda function, causing silent failures.
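The first step of a handler like the one in this scenario is working out which instances the Health event refers to. The sketch below assumes the event carries them under `detail.affectedEntities[].entityValue`, which is how AWS Health events delivered through EventBridge list affected resources; the function name and the sample event values are made up for illustration.

```python
def affected_instance_ids(event):
    """Return EC2 instance IDs named in an AWS Health event, without duplicates."""
    detail = event.get("detail", {})
    if event.get("source") != "aws.health" or detail.get("service") != "EC2":
        return []
    ids = []
    for entity in detail.get("affectedEntities", []):
        value = entity.get("entityValue", "")
        # Keep only instance IDs; Health events can also name other resources.
        if value.startswith("i-") and value not in ids:
            ids.append(value)
    return ids

# Made-up sample event for illustration only.
sample_health_event = {
    "source": "aws.health",
    "detail-type": "AWS Health Event",
    "detail": {
        "service": "EC2",
        "eventTypeCategory": "scheduledChange",
        "affectedEntities": [
            {"entityValue": "i-0aaa1111bbbb2222c"},
            {"entityValue": "i-0aaa1111bbbb2222c"},
            {"entityValue": "vol-0123456789abcdef0"},
        ],
    },
}
print(affected_instance_ids(sample_health_event))  # ['i-0aaa1111bbbb2222c']
```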
Enterprise Scenario 2: Security Compliance Remediation for Security Groups
A financial services company must ensure that no unauthorized security group rules are created. They use AWS CloudTrail to log API calls, and EventBridge to react to AuthorizeSecurityGroupIngress events. The rule pattern matches source: aws.ec2 and detail-type: AWS API Call via CloudTrail. The target is a Lambda function that inspects the new rule; if it violates company policy (e.g., allows SSH from 0.0.0.0/0), the function immediately revokes the rule using revoke_security_group_ingress. The Lambda function also sends an alert to the security team via SNS. The company archives all events for 90 days for audit purposes. A challenge is that CloudTrail-sourced events arrive some time after the API call, so remediation is fast but not instantaneous. They also need to handle cases where the rule was created by a privileged user; the Lambda function checks the calling IAM identity and only reverts the change if the caller is not authorized. Misconfiguration: if the event pattern does not match the exact CloudTrail event format, the rule never triggers.
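The core of that Lambda function might look like the sketch below. It assumes the CloudTrail record nests the new rules under `requestParameters.ipPermissions.items` with lower-camel-case keys, which is how EC2 security group calls are usually logged, and it leaves out the SNS alert and the caller authorization check described above. The function names are invented; `revoke_security_group_ingress` is the boto3 EC2 client call named in the scenario.

```python
OPEN_CIDR = "0.0.0.0/0"
SSH_PORT = 22

def offending_permissions(detail):
    """Return the rules in the request that open SSH to the whole internet,
    converted to the IpPermissions shape boto3 expects."""
    items = detail["requestParameters"].get("ipPermissions", {}).get("items", [])
    bad = []
    for perm in items:
        cidrs = [r.get("cidrIp") for r in perm.get("ipRanges", {}).get("items", [])]
        low, high = perm.get("fromPort"), perm.get("toPort")
        covers_ssh = low is not None and high is not None and low <= SSH_PORT <= high
        if covers_ssh and OPEN_CIDR in cidrs:
            bad.append({
                "IpProtocol": perm["ipProtocol"],
                "FromPort": low,
                "ToPort": high,
                "IpRanges": [{"CidrIp": OPEN_CIDR}],
            })
    return bad

def revert_open_ssh(event, ec2):
    """Revoke offending rules; `ec2` is a boto3 EC2 client or a stand-in."""
    detail = event["detail"]
    if detail.get("eventName") != "AuthorizeSecurityGroupIngress":
        return []
    bad = offending_permissions(detail)
    if bad:
        ec2.revoke_security_group_ingress(
            GroupId=detail["requestParameters"]["groupId"],
            IpPermissions=bad,
        )
    return bad
```

This sketch only checks explicit port ranges; a rule that allows all protocols carries no port range in this shape and would need separate handling.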
Enterprise Scenario 3: Auto Scaling Group Launch Failure Remediation
A gaming company uses Auto Scaling Groups (ASGs) to handle variable traffic. Occasionally, an ASG fails to launch an instance due to insufficient capacity or a bad AMI. They use EventBridge to detect failed launches, which Auto Scaling emits with the detail-type EC2 Instance Launch Unsuccessful. The rule triggers a Step Functions state machine that analyzes the failure reason, adjusts the launch configuration (e.g., different instance type), and retries the launch. The state machine includes a wait step to avoid immediate retry, and a human approval step for certain failure types. They also archive events for replay during post-mortems. Because EventBridge only starts the execution and does not wait for it to finish, the wait and approval steps do not hold up event delivery; failures inside the workflow must be handled by the state machine itself. Performance: they handle hundreds of events per day during peak launches.
What SOA-C02 Tests on EventBridge
The exam tests your ability to configure EventBridge rules for automated remediation, especially in response to AWS Health events, CloudTrail API calls, and resource state changes. Objective 1.2 specifically includes 'Implement automated remediation based on monitoring events'. You must know how to create rules, set event patterns, choose appropriate targets (Lambda, Step Functions, SQS, SNS), and configure permissions. Expect scenario-based questions where you need to select the correct event pattern or target.
Common Wrong Answers and Why
Using CloudWatch Events instead of EventBridge: The exam focuses on EventBridge as the modern service. CloudWatch Events is still supported but not the recommended answer for new solutions. Candidates often choose CloudWatch Events out of habit.
Choosing SNS as a target for direct remediation: SNS is for notifications, not remediation. Unless the question asks for alerting, Lambda or Step Functions is correct for taking action.
Forgetting to configure resource-based policy for Lambda: EventBridge needs permission to invoke the Lambda function. Candidates may forget this step and wonder why the rule doesn't work.
Misunderstanding event pattern syntax: Common mistake: using "state": "stopped" instead of "state": ["stopped"]. Patterns require arrays for values.
Choosing an EC2 instance as a direct target: An EC2 instance ARN is not a valid rule target. Candidates might think they can point a rule at the instance itself, but remediation logic belongs in Lambda, Step Functions, or SSM Automation.
Specific Numbers and Terms on the Exam
Maximum event size: 256 KB
Archive retention: indefinite by default; configurable in days
Retry defaults: up to 24 hours of event age and up to 185 attempts
Maximum rules per bus: 300
Maximum targets per rule: 5
EventBridge is the preferred service over CloudWatch Events.
Input transformers can reshape events.
Dead-letter queues are used for failed deliveries.
Edge Cases and Exceptions
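Cross-account events: A rule in the sending account targets the event bus in the receiving account, whose resource-based policy must allow the sender. Rules that act on the delivered events are created on the receiving bus.
Cross-region events: Supported by making an event bus in another Region the target of a rule. Rules on the destination bus then route the events to targets in that Region.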
Event replay: Can only replay events that were archived. Archiving must be enabled before the events occur.
EventBridge Pipes: A feature of EventBridge for point-to-point integrations between one source and one target, with optional filtering and enrichment. Not commonly tested.
How to Eliminate Wrong Answers
If the question involves reacting to an AWS service event (e.g., EC2 state change, Health event), EventBridge is the correct service.
If the action requires custom logic (e.g., checking and then stopping an instance), the target should be Lambda, Step Functions, or SSM Automation, not SNS or SQS.
If the question mentions 'near real-time', EventBridge is appropriate; for batch processing, consider S3 Events or SQS.
Pay attention to the event pattern syntax: values must be in arrays, and detail fields must match exactly.
EventBridge is the primary service for event-driven automation on AWS, replacing CloudWatch Events.
Rules use event patterns to filter events; patterns require arrays for values.
Common targets for remediation: Lambda functions, Step Functions state machines, SSM Automation documents.
An EC2 instance cannot be a rule target; run remediation logic through Lambda, Step Functions, or SSM Automation.
Default retry policy retries for up to 24 hours with exponential backoff.
Maximum event size is 256 KB; larger events are rejected.
Archiving must be enabled to replay events; an archive keeps events indefinitely unless you set a retention period.
Cross-region event delivery works by targeting an event bus in the destination Region.
Dead-letter queues capture events that fail after all retries.
EventBridge supports input transformers to reshape events before delivery.
These come up on the exam all the time. Here's how to tell them apart.
Amazon EventBridge
Event-driven, decoupled architecture
Supports multiple event sources (AWS, custom, SaaS)
Built-in filtering with event patterns
Automatic retry and DLQ support
No code needed for routing
AWS Lambda (Direct Invocation)
Direct invocation from applications or services
Requires code to poll or receive events
No built-in filtering; must be done in code
Manual retry logic needed
Tight coupling between producer and consumer
Mistake
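An EC2 instance can be specified directly as an EventBridge target.
Correct
An EC2 instance is not a valid target. EventBridge offers a few built-in EC2 API-call targets (StopInstances, RebootInstances, TerminateInstances, CreateSnapshot), but remediation that needs any decision logic should invoke a Lambda function, Step Functions state machine, or SSM Automation runbook.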
Mistake
EventBridge rules are evaluated in order, and only the first matching rule delivers the event.
Correct
All matching rules are evaluated independently and concurrently. Each matching rule delivers the event to its targets; there is no priority or exclusivity.
Mistake
EventBridge archives all events by default.
Correct
Archiving must be explicitly enabled on an event bus. Without archiving, events are not stored and cannot be replayed.
Mistake
EventBridge can deliver events to targets in any region.
Correct
Most targets must be in the same Region as the event bus. To deliver across Regions, make an event bus in the destination Region the rule's target and attach rules to that bus.
Mistake
EventBridge patterns can use regular expressions.
Correct
EventBridge patterns do not support regular expressions. They support exact, prefix, suffix, numeric, exists, anything-but, and wildcard matching, among other comparison operators. For more complex matching, filter further inside a Lambda function.
First, create a Lambda function with appropriate permissions. Then create an EventBridge rule with an event pattern that matches EC2 instance state-change notifications where the state is 'stopped'. For example: `{"source":["aws.ec2"],"detail-type":["EC2 Instance State-change Notification"],"detail":{"state":["stopped"]}}`. Add the Lambda function as a target. Ensure the Lambda resource policy allows EventBridge to invoke it.
Yes, but the SQS queue must have a resource-based policy that allows EventBridge from the source account to send messages. The event bus and rule are in the source account; the target is the SQS queue ARN in the target account.
EventBridge is the evolution of CloudWatch Events. It offers additional features like schema registry, event archiving and replay, and integration with SaaS partners. CloudWatch Events is still available but EventBridge is the recommended service for new implementations.
Configure a dead-letter queue (DLQ) on the target. If EventBridge exhausts its retry policy, it sends the event to the DLQ, which must be a standard SQS queue. You can then process failed events from the DLQ.
Yes. Use CloudTrail to log S3 bucket policy changes, then create an EventBridge rule that matches the PutBucketPolicy API call. The target Lambda function can analyze the new policy and revert if it grants public access.
300 rules per event bus. This is a soft limit that can be increased by requesting a quota increase.
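You can use the AWS CLI or SDK to put a test event to the event bus. For example: `aws events put-events --entries file://test-event.json`. The event must match the rule pattern to trigger the target. Custom events cannot use a source that begins with `aws.`, so to exercise a rule that matches an AWS service source, test against a copy of the rule whose pattern accepts a custom source.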
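A test entry for that command, or for boto3's `put_events`, could be assembled as in the sketch below. The source name `custom.remediation.test` and the helper name are assumptions chosen for the example; the dictionary keys are the PutEvents entry fields, and Detail must be a JSON string rather than an object.

```python
import json

def build_test_entry(instance_id, state, bus_name="default"):
    """Build one PutEvents entry that imitates an EC2 state-change event.

    The custom source is hypothetical; the rule under test must accept it.
    """
    return {
        "Source": "custom.remediation.test",
        "DetailType": "EC2 Instance State-change Notification",
        "Detail": json.dumps({"instance-id": instance_id, "state": state}),
        "EventBusName": bus_name,
    }

entry = build_test_entry("i-0123456789abcdef0", "stopped")

# Typical use (requires AWS credentials, so it is left as a comment):
#   import boto3
#   boto3.client("events").put_events(Entries=[entry])
# For the CLI form, write json.dumps([entry]) to test-event.json.
```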