This chapter covers CloudWatch Composite Alarms and Anomaly Detection, two advanced alarm features that reduce noise and catch complex issues. For the SOA-C02 exam, these topics appear in roughly 5-8% of questions within Domain 1: Monitoring, Logging, and Remediation. Understanding how to create composite alarms that combine multiple conditions, and how to configure anomaly detection bands, matters both for the exam and for real-world cloud monitoring. This chapter explains the underlying mechanisms, configuration steps, common pitfalls, and exam-specific traps.
Imagine a large office building with multiple fire alarms. Each alarm is triggered by a specific sensor: smoke detector, heat detector, or manual pull station. A simple alarm sounds if any single sensor triggers. But this leads to false alarms—someone burning toast sets off the smoke detector, or a heat wave triggers the heat detector. To reduce false alarms, the building installs a "composite" fire alarm system. This system requires multiple sensors to trigger within a certain time window before sounding the building-wide alarm. For example, if the smoke detector and heat detector both trigger within 5 minutes, the composite alarm sounds. If only one triggers, it logs the event but does not alert everyone. Additionally, the system uses anomaly detection: it learns the typical temperature and smoke levels over time. If a sensor reading deviates significantly from the learned baseline—say, a sudden 20-degree temperature spike—it flags that as anomalous and may trigger a pre-alarm. This mirrors CloudWatch Composite Alarms, which combine multiple alarms into a single alarm using AND/OR logic. Anomaly Detection creates a model of expected metric behavior based on historical data, and generates alarms when metrics deviate beyond a threshold (like 2 standard deviations). Just as the smart fire alarm reduces false positives, Composite Alarms reduce noise by requiring multiple conditions, and Anomaly Detection catches subtle issues that static thresholds would miss.
What Are CloudWatch Composite Alarms?
A standard CloudWatch alarm monitors a single metric and transitions to ALARM state when a threshold is breached. However, many operational scenarios require multiple conditions to be true before taking action. For example, you might want to alert only if both CPU utilization is high AND error rate is high, to avoid paging on transient CPU spikes. Composite alarms allow you to define a rule expression that evaluates the state of multiple child alarms (which can be metric alarms or other composite alarms) using boolean operators. The composite alarm itself transitions to ALARM only when the rule evaluates to true.
How Composite Alarms Work Internally
Each child alarm has its own state (OK, ALARM, INSUFFICIENT_DATA). CloudWatch re-evaluates the composite alarm's rule expression each time any child alarm changes state. The rule is a plain-text boolean expression, for example:
ALARM("child-alarm-name") OR OK("other-alarm") AND NOT INSUFFICIENT_DATA("third-alarm")
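Available functions: ALARM(), OK(), INSUFFICIENT_DATA(), plus the constants TRUE and FALSE. Each function takes a child alarm name or ARN in quotes. Boolean operators: AND, OR, NOT. Parentheses control precedence, so use them whenever a rule mixes AND and OR. The composite alarm's state is determined by the rule: if the rule evaluates to true, the composite alarm is in ALARM state; otherwise it is OK. A child alarm in INSUFFICIENT_DATA satisfies neither ALARM() nor OK(), so handle that case explicitly with INSUFFICIENT_DATA() if it matters. The composite alarm itself may also report INSUFFICIENT_DATA, for example right after creation, before its rule has been evaluated.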
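To make the evaluation rules concrete, here is a minimal Python sketch of a rule evaluator. It is not CloudWatch's implementation: the names tokenize and evaluate_rule are hypothetical, it handles only ALARM(), OK(), and INSUFFICIENT_DATA() with quoted alarm names, and it assumes the conventional precedence of NOT, then AND, then OR. Treat that precedence as an assumption and use parentheses in real rules.

```python
import re

# One token: a state function call, a boolean operator, or a parenthesis.
TOKEN = re.compile(
    r'\s*(?:(ALARM|OK|INSUFFICIENT_DATA)\s*\(\s*"([^"]+)"\s*\)'
    r'|(AND|OR|NOT)\b'
    r'|([()]))'
)


def tokenize(rule):
    """Split a rule string into (kind, ...) tuples."""
    rule = rule.strip()
    tokens, pos = [], 0
    while pos < len(rule):
        match = TOKEN.match(rule, pos)
        if match is None:
            raise ValueError(f"cannot parse rule at offset {pos}: {rule[pos:]!r}")
        func, name, operator, paren = match.groups()
        if func:
            tokens.append(("STATE", func, name))
        else:
            tokens.append((operator or paren,))
        pos = match.end()
    return tokens


def evaluate_rule(rule, child_states):
    """Return True when the rule holds for the given child alarm states."""
    tokens = tokenize(rule)
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("rule ended unexpectedly")
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():  # lowest precedence (assumed)
        value = parse_and()
        while peek() == "OR":
            take()
            right = parse_and()
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "AND":
            take()
            right = parse_not()
            value = value and right
        return value

    def parse_not():  # highest precedence (assumed)
        if peek() == "NOT":
            take()
            return not parse_not()
        token = take()
        if token[0] == "(":
            value = parse_or()
            if take()[0] != ")":
                raise ValueError("expected closing parenthesis")
            return value
        if token[0] == "STATE":
            _, func, name = token
            # ALARM("x") is true only when child x is currently in ALARM, etc.
            return child_states[name] == func
        raise ValueError(f"unexpected token {token[0]!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

For example, evaluate_rule('ALARM("HighCPU") AND ALARM("HighErrors")', {"HighCPU": "ALARM", "HighErrors": "OK"}) is false, which corresponds to the composite alarm staying in OK.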
Key Components and Limits
Child alarms must be in the same AWS account and Region.
A composite alarm rule can reference up to 100 child alarms, and the rule expression can contain up to 500 elements (child alarms, TRUE/FALSE statements, and parentheses).
Composite alarms cannot reference alarms that have a period shorter than 60 seconds (high-resolution alarms are allowed but with minimum 60-second period).
Composite alarms do not have their own metric; they only aggregate child alarm states.
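Composite alarms can take actions when they change state, but the set is narrower than for metric alarms: they can notify SNS topics, invoke Lambda functions, and create Systems Manager OpsItems or incidents, but they cannot perform EC2 actions or Auto Scaling actions.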
Anomaly Detection Overview
Anomaly Detection uses machine learning to model the expected range of a metric based on historical data. CloudWatch continuously learns patterns (daily, weekly, seasonal) and creates a band of expected values. You can then create an alarm that triggers when the metric is outside the band (either above the upper threshold or below the lower threshold). This is especially useful for metrics with dynamic baselines, like request latency or traffic volume, where static thresholds are impractical.
How Anomaly Detection Works Internally
CloudWatch applies statistical and machine learning algorithms to the metric's history to forecast expected values. The model considers:
Time of day
Day of week
Trend
Seasonal patterns (e.g., hourly, daily, weekly)
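The model outputs an expected value and a band around it. By default, the band width is 2 standard deviations (roughly 95% of values if the metric were normally distributed). You can adjust the band width (number of standard deviations): a larger value widens the band and produces fewer alarms. The model trains on up to 2 weeks of historical data. You can enable anomaly detection on a metric with less history, but the band is less reliable until more data accumulates. After that, it updates continuously. The anomaly detection model is specific to a metric and its dimensions. You can create multiple models for different dimensions.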
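The effect of the band width can be illustrated with a deliberately simplified Python sketch. It is not CloudWatch's model, which also accounts for trend and seasonality: it just places a band of k standard deviations around the mean of a few made-up samples. The function names and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def simple_band(samples, band_width=2.0):
    """Return (lower, upper) as mean +/- band_width * standard deviation."""
    center = mean(samples)
    spread = pstdev(samples)
    return center - band_width * spread, center + band_width * spread


def is_outside(value, band):
    """True when the value falls below the lower or above the upper bound."""
    lower, upper = band
    return value < lower or value > upper


# Made-up CPU utilization samples, for illustration only.
history = [50, 52, 48, 51, 49]
narrow = simple_band(history, band_width=2)
wide = simple_band(history, band_width=3)
```

With these samples, a reading of 53.5 falls outside the 2-deviation band but inside the 3-deviation band, which is why a larger band width means fewer alarms.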
Configuration Steps
To create a composite alarm:
1. Create the child alarms first.
2. In CloudWatch Console, go to Alarms > All alarms > Create composite alarm.
3. Enter a name and description.
4. Write the rule expression using the child alarm names (e.g., ALARM("HighCPU") AND ALARM("HighErrorRate")).
5. Configure actions (SNS topic, Lambda function, Systems Manager OpsItem, etc.).
6. Create.
To create an anomaly detection alarm:
1. In CloudWatch Console, go to Alarms > All alarms > Create alarm.
2. Select a metric.
3. In the "Conditions" section, choose "Anomaly detection".
4. Adjust the band width (e.g., 2 or 3 standard deviations).
5. Choose whether to alarm when the metric is above, below, or outside the band.
6. Configure actions and create.
Interaction with Other Services
AWS Lambda: Composite alarms can trigger Lambda functions via SNS for complex remediation.
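AWS Auto Scaling: Composite alarms cannot invoke scaling policies directly. To scale out only when both CPU and memory are high, route the composite alarm's notification through SNS or Lambda and adjust capacity from there, or attach the scaling policy to a metric alarm instead.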
Amazon EventBridge: Composite alarms generate events when they change state, which can be used to trigger workflows.
AWS Systems Manager OpsCenter: Composite alarms can create OpsItems for multi-condition issues.
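As a sketch of the EventBridge integration, the Python below builds an event pattern for CloudWatch alarm state changes and checks it against a trimmed sample event. The pattern fields (source aws.cloudwatch, detail-type "CloudWatch Alarm State Change", detail.alarmName, detail.state.value) follow the alarm state change event format. The helper names, the alarm name, and the matcher, which only handles exact-value lists and is far simpler than EventBridge's real matching, are assumptions for illustration.

```python
import json


def alarm_state_change_pattern(alarm_names, states=("ALARM",)):
    """Build an EventBridge event pattern for the given alarms and target states."""
    return {
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "detail": {
            "alarmName": list(alarm_names),
            "state": {"value": list(states)},
        },
    }


def matches(pattern, event):
    """Simplified matcher: every pattern key must be present, and the event's
    value must appear in the pattern's list of accepted values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


pattern = alarm_state_change_pattern(["HighCPUAndErrors"])

# Trimmed sample event; real events carry more fields.
sample_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {
        "alarmName": "HighCPUAndErrors",
        "state": {"value": "ALARM"},
        "previousState": {"value": "OK"},
    },
}

pattern_json = json.dumps(pattern)  # the string form a rule definition expects
```

The JSON string in pattern_json is what you would supply as the rule's event pattern. The matcher is only there to show which events the pattern is meant to select.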
Best Practices
Use composite alarms to reduce alert fatigue: combine multiple low-severity alarms into a single high-severity alarm.
For anomaly detection, give the model as close to 2 weeks of data as you can before relying on the alarm; with less history the band is less accurate.
Test composite alarm expressions by manually changing child alarm states (you can use the set-alarm-state CLI command).
Monitor composite alarm state changes via Amazon EventBridge (formerly CloudWatch Events) so you can see which child alarm transitions led to each composite state change.
CLI and SDK Examples
Create a composite alarm using AWS CLI:
aws cloudwatch put-composite-alarm \
--alarm-name "HighCPUAndErrors" \
--alarm-rule "ALARM(\"HighCPU\") AND ALARM(\"HighErrors\")" \
--actions-enabled \
--alarm-actions arn:aws:sns:us-east-1:123456789012:MyTopic
Create an anomaly detection alarm:
aws cloudwatch put-metric-alarm \
--alarm-name "AnomalyCPU" \
--alarm-description "Alarm when CPU outside 2 stddev band" \
--comparison-operator LessThanLowerOrGreaterThanUpperThreshold \
--evaluation-periods 2 \
--threshold-metric-id m1 \
--metrics "[{\"Id\":\"m1\",\"ReturnData\":true,\"Expression\":\"ANOMALY_DETECTION_BAND(m2, 2)\"},{\"Id\":\"m2\",\"ReturnData\":true,\"MetricStat\":{\"Metric\":{\"Namespace\":\"AWS/EC2\",\"MetricName\":\"CPUUtilization\"},\"Period\":300,\"Stat\":\"Average\"}}]"
Note: The ANOMALY_DETECTION_BAND function is used in math expressions. The second parameter is the number of standard deviations. Because the alarm is defined through --metrics, the namespace, metric name, statistic, and period live inside the m2 MetricStat entry rather than in separate top-level options, and the required --comparison-operator selects which side of the band triggers the alarm.
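For the SDK side, the following Python sketch builds the request parameters that correspond to the two CLI commands above. It only constructs dictionaries; the helper names are hypothetical, and the alarm names, topic ARN, and thresholds are the same placeholder values used in the CLI examples. It assumes you would pass the results to boto3's put_composite_alarm and put_metric_alarm, which is not done here so the block runs without AWS credentials.

```python
def composite_alarm_request(name, rule, topic_arn):
    """Parameters for a composite alarm that notifies an SNS topic."""
    return {
        "AlarmName": name,
        "AlarmRule": rule,
        "ActionsEnabled": True,
        "AlarmActions": [topic_arn],
    }


def anomaly_alarm_request(name, namespace, metric_name, band_width=2,
                          period=300, evaluation_periods=2):
    """Parameters for a metric alarm that compares a metric to its anomaly band."""
    return {
        "AlarmName": name,
        "ComparisonOperator": "LessThanLowerOrGreaterThanUpperThreshold",
        "EvaluationPeriods": evaluation_periods,
        "ThresholdMetricId": "band",  # points at the band expression below
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {"Namespace": namespace, "MetricName": metric_name},
                    "Period": period,
                    "Stat": "Average",
                },
            },
            {
                "Id": "band",
                "ReturnData": True,
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
            },
        ],
    }


composite = composite_alarm_request(
    "HighCPUAndErrors",
    'ALARM("HighCPU") AND ALARM("HighErrors")',
    "arn:aws:sns:us-east-1:123456789012:MyTopic",
)
anomaly = anomaly_alarm_request("AnomalyCPU", "AWS/EC2", "CPUUtilization")
```

With boto3 installed and credentials configured, the calls would look like boto3.client("cloudwatch").put_composite_alarm(**composite) and boto3.client("cloudwatch").put_metric_alarm(**anomaly).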
Common Mistakes
Combining ALARM() and OK() checks on the same child alarm with AND (for example, ALARM("HighCPU") AND OK("HighCPU")) produces a rule that can never be true.
Forgetting that composite alarms do not have their own period; they rely on child alarm evaluation frequencies.
Anomaly detection models can be skewed if the historical data includes anomalous events (e.g., a past outage). You can exclude specific time ranges from training.
Exam Relevance
SOA-C02 tests your ability to:
Differentiate between composite and metric alarms.
Write correct composite alarm expressions.
Understand anomaly detection band width and training data requirements.
Know that composite alarms can reference other composite alarms (up to 100 child alarms per rule).
Identify when to use composite vs. metric alarms.
Create Child Metric Alarms
First, define the individual metric alarms that will be combined. For example, create an alarm for high CPU utilization (e.g., CPU > 80% for 5 minutes) and another for high error count. Ensure each alarm has appropriate period, evaluation periods, and thresholds. These alarms will be the building blocks for the composite alarm. Use the CloudWatch console, CLI, or SDK. Each child alarm must be in the same account and region as the composite alarm.
Define Composite Alarm Rule
Write a boolean expression using ALARM(), OK(), and INSUFFICIENT_DATA() functions. For example: ALARM("HighCPU") AND ALARM("HighErrors"). The expression determines when the composite alarm enters ALARM state. You can use parentheses for grouping. The composite alarm evaluates the rule each time any child alarm changes state. A rule can reference up to 100 child alarms.
Configure Actions and Notifications
Specify actions for the composite alarm, such as sending a notification to an SNS topic, invoking a Lambda function, or creating an OpsItem in Systems Manager. Composite alarms cannot perform EC2 or Auto Scaling actions directly. Actions run when the composite alarm transitions into the state they are attached to (ALARM, OK, or INSUFFICIENT_DATA). Note that actions are not inherited from child alarms; you must set them on the composite alarm.
Test Composite Alarm Logic
Manually set the state of child alarms using the AWS CLI command aws cloudwatch set-alarm-state to verify the composite alarm transitions correctly. For example, set HighCPU to ALARM and HighErrors to OK, and confirm the composite alarm remains OK. Then set both to ALARM and verify the composite alarm goes to ALARM. This step is crucial to avoid false positives or missed alerts.
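The test procedure can be laid out as a truth table before you run it. This Python sketch assumes a composite rule that ANDs every child's ALARM() check. It lists, for each combination of child states, the parameters you would pass to set_alarm_state (AlarmName, StateValue, StateReason) and the composite state you should then observe. The helper name and the reason text are hypothetical. Keep in mind that a manually set state is temporary: the child alarm returns to its real state at its next evaluation.

```python
from itertools import product


def and_rule_test_plan(children):
    """Enumerate child state combinations for a rule that ANDs ALARM(child)."""
    plan = []
    for combo in product(("OK", "ALARM"), repeat=len(children)):
        calls = [
            {
                "AlarmName": name,
                "StateValue": state,
                "StateReason": "Testing composite alarm logic",
            }
            for name, state in zip(children, combo)
        ]
        # An AND rule is true only when every child is in ALARM.
        expected = "ALARM" if all(state == "ALARM" for state in combo) else "OK"
        plan.append({"set_alarm_state_calls": calls,
                     "expected_composite_state": expected})
    return plan


plan = and_rule_test_plan(["HighCPU", "HighErrors"])
```

Each entry in plan is one test case: apply its calls, then check the composite alarm's state against expected_composite_state.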
Monitor and Refine Anomaly Detection
After creating an anomaly detection alarm, monitor its performance. The model adapts over time; you may need to widen the band (increase the number of standard deviations) if too many false alarms occur. You can also exclude specific time periods from training if they contain anomalous data. Use the alarm's history in the CloudWatch console (or describe-alarm-history) to track state changes and fine-tune.
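Excluding a time range from training can be sketched as a request for the PutAnomalyDetector API. The Python below only builds the parameters. The helper name, the instance ID, and the dates are made-up placeholders, and it assumes the request shape with SingleMetricAnomalyDetector and Configuration.ExcludedTimeRanges that boto3's put_anomaly_detector accepts.

```python
from datetime import datetime, timezone


def anomaly_detector_request(namespace, metric_name, stat, dimensions, excluded_ranges):
    """Parameters for an anomaly detector that skips the given training windows."""
    return {
        "SingleMetricAnomalyDetector": {
            "Namespace": namespace,
            "MetricName": metric_name,
            "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
            "Stat": stat,
        },
        "Configuration": {
            "ExcludedTimeRanges": [
                {"StartTime": start, "EndTime": end} for start, end in excluded_ranges
            ],
        },
    }


# Hypothetical deployment window to keep out of the training data.
deploy_start = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
deploy_end = datetime(2024, 3, 1, 10, 30, tzinfo=timezone.utc)

request = anomaly_detector_request(
    "AWS/EC2",
    "CPUUtilization",
    "Average",
    {"InstanceId": "i-0123456789abcdef0"},  # placeholder instance ID
    [(deploy_start, deploy_end)],
)
```

With boto3 available, the request would be sent as boto3.client("cloudwatch").put_anomaly_detector(**request).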
In a production e-commerce platform, the operations team faced alert fatigue due to static threshold alarms. For example, high CPU on a web server triggered alerts even during routine traffic spikes. They implemented a composite alarm: ALARM("HighCPU") AND ALARM("HighLatency") AND NOT ALARM("DeployInProgress"). This reduced false alarms by 80%. The composite alarm triggered only when both CPU and latency were high, and ignored deployments.
They also used anomaly detection on request count: the metric normally fluctuates with time of day, so a static threshold would be either too sensitive or too loose. Anomaly detection with 2 standard deviations caught a sudden drop in traffic that indicated a routing issue. The team configured the anomaly detection alarm to send a notification to an SNS topic that triggered a Lambda function to restart the web servers. One common misconfiguration was forgetting to exclude deployment periods from the anomaly detection model, so the model learned the deployment spikes as normal behavior and then missed real anomalies. The team learned to use the 'Exclude' feature in the anomaly detection model configuration to mark time ranges during deployments as training exclusions.
Another scenario: a financial services company used composite alarms to combine multiple microservice health checks. They wanted to page the on-call engineer only when three out of five health check alarms were in ALARM. Because rules are built from AND, OR, and NOT, a three-of-five condition has to be spelled out as the OR of every three-alarm combination, starting with (ALARM("svc1") AND ALARM("svc2") AND ALARM("svc3")) OR (ALARM("svc1") AND ALARM("svc2") AND ALARM("svc4")) and continuing through all ten combinations. A shorter rule such as (ALARM("svc1") AND ALARM("svc2")) OR (ALARM("svc3") AND ALARM("svc4")) OR ALARM("svc5") does not express three-of-five, since svc5 alone would trigger it. The full rule prevented single-service blips from causing unnecessary escalations.
The key performance consideration is that composite alarms add little latency of their own: the rule is re-evaluated when a child alarm changes state, so detection time is governed by the children's periods and evaluation periods. If you manage many composite alarms through automation, API calls such as PutCompositeAlarm and DescribeAlarms are subject to CloudWatch API rate limits. In practice, keep the number of composite alarms moderate and use descriptive naming conventions.
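Composite rules are built from AND, OR, and NOT, so an "at least three of five" condition has to be written as the OR of every three-alarm combination. Writing ten clauses by hand is error-prone. The hypothetical helper below (at_least_rule is not an AWS function) generates the rule string with itertools.combinations.

```python
from itertools import combinations


def at_least_rule(alarm_names, minimum):
    """Build a rule that is true when at least `minimum` of the alarms are in ALARM."""
    if not 1 <= minimum <= len(alarm_names):
        raise ValueError("minimum must be between 1 and the number of alarms")
    clauses = []
    for group in combinations(alarm_names, minimum):
        # One clause per combination: all alarms in the group must be in ALARM.
        clause = " AND ".join(f'ALARM("{name}")' for name in group)
        clauses.append(f"({clause})")
    return " OR ".join(clauses)


rule = at_least_rule(["svc1", "svc2", "svc3", "svc4", "svc5"], 3)
```

The number of clauses grows quickly with the number of alarms, so this approach suits small groups. The resulting string is what you would pass as the composite alarm's rule.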
The SOA-C02 exam tests Composite Alarms and Anomaly Detection under Domain 1, task statement 1.1 (implementing metrics, alarms, and filters). Expect 2-3 questions on these topics. The most common wrong answers involve:
1. Confusing composite alarms with metric math: Candidates think composite alarms can directly combine metric values using math expressions. They cannot. Composite alarms only combine alarm states using boolean logic. Metric math can be used in metric alarms but not composite alarms.
2. Assuming composite alarms have their own period: They do not. The composite alarm's state depends solely on child alarm states; it has no period or evaluation period of its own.
3. Misunderstanding anomaly detection band width: The default is 2 standard deviations, but the exam may ask about adjusting it. Remember that a larger band width (more standard deviations) results in fewer alarms (less sensitive), and vice versa.
4. Thinking anomaly detection needs a month of history: The model trains on up to 2 weeks of data. The exam may test this number.
5. Believing composite alarms cannot be nested: They can. A composite alarm can be a child of another composite alarm, subject to the per-rule limit on child alarms.
Specific values and terms:
Default anomaly detection band: 2 standard deviations.
Training data: up to 2 weeks of history.
Maximum child alarms per composite rule: 100.
Functions: ALARM(), OK(), INSUFFICIENT_DATA(), plus TRUE and FALSE.
Composite alarm rule syntax: plain-text boolean expression.
Edge cases:
If a child alarm is deleted, the composite alarm enters INSUFFICIENT_DATA.
If a child alarm is in INSUFFICIENT_DATA, the composite alarm's rule may evaluate to INSUFFICIENT_DATA if the condition depends on that alarm.
Anomaly detection models can be created for metrics with any period, but the model uses the metric's resolution (standard or high-resolution).
How to eliminate wrong answers:
If the question mentions combining metric values, eliminate composite alarm options.
If the question mentions "period" or "evaluation periods" for a composite alarm, that answer is wrong.
For anomaly detection, if the question says "requires 1 month of data", it's wrong.
Always check whether the scenario requires boolean logic (composite) or dynamic thresholds (anomaly detection).
Composite alarms use boolean logic (AND, OR, NOT over ALARM(), OK(), and INSUFFICIENT_DATA()) to combine up to 100 child alarms per rule.
Composite alarms do not have their own period or evaluation periods; they react to child alarm state changes.
Anomaly Detection uses a machine learning model trained on up to 2 weeks of historical data.
Default anomaly detection band width is 2 standard deviations.
Anomaly detection models continuously learn and adapt to metric patterns.
Composite alarms can be nested (a composite alarm can be a child of another composite), with each rule limited to 100 child alarms.
To test composite alarms, use the set-alarm-state CLI command to manually change child alarm states.
Anomaly detection models can exclude specific time ranges from training to avoid skewing from known anomalies.
These come up on the exam all the time. Here's how to tell them apart.
Composite Alarm
Combines states of multiple alarms using boolean logic (AND/OR/NOT).
Cannot directly compute metric values; only uses alarm states.
No period or evaluation periods; the rule is re-evaluated when a child alarm changes state.
Ideal for multi-condition alerting (e.g., high CPU AND high memory).
Can reference up to 100 child alarms per rule.
Metric Alarm (with Metric Math)
Combines multiple metrics using math expressions (e.g., SUM, AVG, IF).
Can compute ratios, sums, or custom formulas from raw metrics.
Has its own period, evaluation periods, and datapoints.
Ideal for calculated metrics (e.g., error rate = errors / total requests).
Can include up to 15 metric math expressions per alarm.
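To contrast with a composite rule, here is a sketch of the metric math side: request parameters for a single metric alarm on a computed error rate. The helper name, alarm name, load balancer value, and 5% threshold are hypothetical; the namespace and metric names are the standard Application Load Balancer ones. The block only builds the dictionary you would pass to boto3's put_metric_alarm.

```python
def error_rate_alarm_request(name, load_balancer, threshold_percent=5.0):
    """Parameters for a metric alarm on errors / requests, computed with metric math."""
    def alb_metric(metric_id, metric_name):
        # Raw input metric; ReturnData is False so only the expression is alarmed on.
        return {
            "Id": metric_id,
            "ReturnData": False,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
                },
                "Period": 300,
                "Stat": "Sum",
            },
        }

    return {
        "AlarmName": name,
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_percent,
        "EvaluationPeriods": 2,
        "Metrics": [
            alb_metric("errors", "HTTPCode_Target_5XX_Count"),
            alb_metric("requests", "RequestCount"),
            {
                "Id": "error_rate",
                "Expression": "100 * errors / requests",
                "Label": "Error rate (%)",
                "ReturnData": True,
            },
        ],
    }


request = error_rate_alarm_request("HighErrorRate", "app/my-alb/0123456789abcdef")
```

Unlike a composite alarm, this alarm has a threshold, a period, and evaluation periods, and it works on metric values rather than on other alarms' states. The resulting alarm could in turn be referenced as a child in a composite rule.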
Mistake
Composite alarms can use metric math expressions like SUM or AVG.
Correct
Composite alarms only support boolean logic using ALARM(), OK(), INSUFFICIENT_DATA() functions. To combine metrics mathematically, use metric math in a standard metric alarm.
Mistake
Anomaly detection requires at least 30 days of historical data.
Correct
The model trains on up to 2 weeks (14 days) of the metric's history. You can enable anomaly detection with less data, but the bands may be inaccurate.
Mistake
Composite alarms have their own evaluation period and datapoints.
Correct
Composite alarms do not have a period, evaluation periods, or datapoints. They evaluate their rule whenever any child alarm changes state.
Mistake
Anomaly detection bands are static once created.
Correct
The model continuously learns and updates the band as new data arrives. The band adapts to changes in the metric's pattern.
Mistake
You can use composite alarms to trigger actions on any child alarm state change.
Correct
Composite alarms only trigger actions when the composite alarm itself changes state. To trigger actions on individual child alarms, configure actions directly on those alarms.
A metric alarm monitors a single metric (or a math expression) and transitions based on a threshold. A composite alarm monitors the states of other alarms (metric or composite) using a boolean rule. Composite alarms do not have their own period or threshold; they aggregate child alarm states. Use composite alarms when you need multiple conditions to be true before alerting.
A composite alarm rule can reference up to 100 child alarms. This includes both metric alarms and other composite alarms.
If a child alarm is deleted, the composite alarm cannot evaluate the rule and enters the INSUFFICIENT_DATA state. You must update the composite alarm to remove the reference to the deleted alarm.
The model trains on up to 2 weeks of data. With less data, CloudWatch still creates the model, but the bands may be wide or inaccurate until sufficient data is available.
Yes, composite alarms can be nested: a composite alarm can be a child of another composite alarm. Each rule is limited to 100 child alarms, and you should avoid cycles in which two composite alarms depend on each other.
When creating or editing an anomaly detection model, you can specify time ranges to exclude from training. For example, you can exclude a deployment window or a known outage period. This prevents the model from learning those events as normal behavior.
Composite alarms support a narrower set of actions than metric alarms: SNS notifications, Lambda functions, and Systems Manager OpsItems or incidents. They cannot perform EC2 actions (stop, terminate, reboot) or Auto Scaling actions. Actions are configured on the composite alarm itself, not inherited from child alarms.