SAA-C03 | Chapter 17 of 189 | Objective 2.1

Auto Scaling Groups

This chapter covers Amazon EC2 Auto Scaling groups (ASGs), a core component of resilient and scalable architectures on AWS. On the SAA-C03 exam, ASGs come up often, usually alongside Elastic Load Balancers, launch templates, and CloudWatch alarms. Mastering ASGs is essential for designing cost-optimized, highly available applications that automatically adjust capacity to demand.

25 min read
Intermediate
Updated May 31, 2026

Auto Scaling as a Self-Adjusting Fleet of Delivery Vans

Imagine you run a package delivery company. A dispatcher (the Auto Scaling Group) monitors incoming orders (traffic) and manages your fleet of delivery vans (EC2 instances). You set a minimum fleet size of 5 vans to handle baseline daily orders, a maximum of 50 vans for the peak holiday season, and a desired capacity of 10 vans for typical operation. The dispatcher follows a scaling policy: if the average package queue exceeds 20 per van for 5 consecutive minutes (a CloudWatch alarm on CPU utilization or request count), the dispatcher orders a new van from the garage (the launch template) and assigns it a route (registers it with the load balancer). The new van takes about 2 minutes to get ready (warm-up time) before it can pick up packages. Conversely, if the queue drops below 5 per van for 10 minutes, the dispatcher sends one van back to the garage (terminates an instance), but only after the van has completed its current deliveries (connection draining on the load balancer) and handed off any critical packages (lifecycle hooks for graceful shutdown). Vans you have marked as off-limits are never recalled (instance scale-in protection). The dispatcher also performs health checks: if a van reports a flat tire (an instance status check fails), the dispatcher sends it to the garage and orders a replacement. This fleet management keeps you from paying for idle vans or running too few vans and disappointing customers.

How It Actually Works

What is an Auto Scaling Group?

An Auto Scaling Group (ASG) is a logical grouping of Amazon EC2 instances that automatically adjusts the number of instances in response to changing demand, health status, or a schedule. It is a fundamental building block for scalable, fault-tolerant applications. The ASG manages the lifecycle of instances: it launches new instances when demand increases or when an existing instance becomes unhealthy, and it terminates instances when demand decreases. This ensures you maintain a desired number of healthy instances, optimizing cost and performance.

Core Components

An ASG is defined by three main configuration elements:

Launch Template (or Launch Configuration): Specifies the instance configuration, including AMI, instance type, key pair, security groups, and user data. AWS recommends launch templates over launch configurations, which are a legacy option, because templates support versioning and newer features such as T2/T3 Unlimited mode and multiple instance types via a mixed instances policy.

Scaling Policies: Define when and how the ASG should scale. Types include simple scaling, step scaling, target tracking scaling, and scheduled scaling.

Health Checks: Determine instance health. Options include EC2 status checks (default), ELB health checks, and custom health checks.

How It Works Internally

When you create an ASG, you specify:

Min Size: The minimum number of instances the ASG keeps running. Required when you create the group; it can be 0.

Max Size: The maximum number of instances the ASG can run. Required when you create the group.

Desired Capacity: The number of instances the ASG launches right after creation and then tries to maintain. If not specified, it defaults to Min Size.
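The relationship between the three sizes can be sketched in a few lines of Python. This is an illustrative model only, not an AWS API call; the function name `resolve_capacity` is hypothetical.

```python
def resolve_capacity(min_size, max_size, desired=None):
    """Return the starting desired capacity, validating min <= desired <= max."""
    if min_size < 0 or max_size < min_size:
        raise ValueError("require 0 <= min_size <= max_size")
    # When no desired capacity is given, the group starts at its minimum size.
    if desired is None:
        desired = min_size
    if not min_size <= desired <= max_size:
        raise ValueError("desired capacity must lie between min and max")
    return desired


print(resolve_capacity(1, 5, 2))  # 2
print(resolve_capacity(1, 5))     # 1
```

The values 1, 5, and 2 match the `create-auto-scaling-group` example later in this chapter.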

The ASG continuously monitors the state of its instances. It periodically checks the EC2 status of each instance and, if ELB health checks are enabled, the result reported by the load balancer (the ELB check interval is configured on the load balancer or target group, not on the ASG). If an instance is marked unhealthy, the ASG terminates it and launches a new one to replace it, maintaining the desired capacity.

Scaling Mechanisms

#### 1. Dynamic Scaling

Target Tracking Scaling: You select a metric (e.g., average CPU utilization, request count per target) and a target value. The ASG automatically creates the required CloudWatch alarms and adjusts capacity to keep the metric close to the target. This is the simplest and most recommended method.

Step Scaling: You define CloudWatch alarms that trigger when a metric crosses a threshold. You then specify how many instances to add or remove (or a percentage) based on the size of the breach. Step scaling allows more granular control than simple scaling.

Simple Scaling: You define a single scaling adjustment when an alarm triggers. After the adjustment, there is a cooldown period (default 300 seconds) during which the ASG does not respond to additional alarms. This is less responsive and can lead to oscillations.
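To build intuition for target tracking, here is a simplified proportional model in Python. It assumes the metric scales inversely with instance count, which is only an approximation; AWS does not publish its exact algorithm, and the real service scales in more conservatively than this sketch. The function name and parameters are hypothetical.

```python
import math


def target_tracking_capacity(current, metric, target, min_size, max_size):
    """Estimate the capacity that brings the per-instance average back to target."""
    # Assumption: total load is constant, so the average metric is
    # inversely proportional to the number of instances.
    needed = math.ceil(current * metric / target)
    # The result is always kept within the group's min and max sizes.
    return max(min_size, min(max_size, needed))


# 4 instances averaging 90% CPU with a 60% target
print(target_tracking_capacity(4, 90, 60, min_size=1, max_size=10))  # 6
```

With a Max Size of 5, the same call would return 5, because the group never scales beyond its maximum.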

#### 2. Scheduled Scaling

You can schedule scaling actions for predictable traffic patterns, such as increasing capacity at 8 AM every weekday and decreasing it at 6 PM. Times are in UTC by default; you can specify a time zone for recurring schedules.

#### 3. Predictive Scaling (using AWS Auto Scaling)

AWS Auto Scaling can use machine learning to predict future traffic and proactively scale. This is more advanced and not heavily tested on SAA-C03.

Cooldown Periods

Cooldown periods prevent the ASG from launching or terminating instances before the effects of a previous scaling activity are visible. The default cooldown is 300 seconds. Cooldowns apply to simple scaling policies: during the cooldown, the ASG does not act on further simple scaling alarms. Step scaling and target tracking policies do not use cooldowns; they rely on an instance warm-up time instead. The cooldown timer starts after the scaling activity completes (i.e., after the new instance is in service or the old instance is terminated).
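A minimal sketch of the cooldown gate for a simple scaling policy follows. It is a toy model, not the AWS implementation: it treats each scaling activity as completing instantly, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SimpleScalingGroup:
    capacity: int
    max_size: int
    cooldown: int = 300                      # seconds
    last_activity_end: float = float("-inf")

    def on_alarm(self, now, adjustment):
        """Apply the adjustment unless the group is still cooling down."""
        if now - self.last_activity_end < self.cooldown:
            return False                     # alarm ignored during cooldown
        self.capacity = min(self.max_size, self.capacity + adjustment)
        # Assumption: the activity finishes immediately, so the timer starts now.
        self.last_activity_end = now
        return True


group = SimpleScalingGroup(capacity=2, max_size=10)
print(group.on_alarm(now=0, adjustment=1))    # True
print(group.on_alarm(now=100, adjustment=1))  # False
print(group.on_alarm(now=300, adjustment=1))  # True
print(group.capacity)                         # 4
```

The second alarm arrives 100 seconds after the first activity and is ignored, which is why simple scaling reacts slowly to a sustained spike.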

Health Check Grace Period

When a new instance is launched, the ASG waits for a grace period before acting on failed health checks (300 seconds by default when you create the group in the console; 0 if you create it with the CLI or an SDK and omit the setting). This gives the instance time to initialize and pass health checks. If the grace period is too short, the instance may be terminated prematurely. If it is too long, unhealthy instances might remain in service.
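The effect of the grace period can be expressed as a small predicate. This is an illustrative sketch with hypothetical names; times are plain seconds, measured from the moment the instance enters service.

```python
def counts_as_unhealthy(in_service_at, now, check_passed, grace_period=300):
    """Return True if a failed check should trigger replacement."""
    if check_passed:
        return False
    # Failed checks are ignored until the grace period has elapsed.
    return (now - in_service_at) >= grace_period


print(counts_as_unhealthy(0, 120, check_passed=False))  # False
print(counts_as_unhealthy(0, 300, check_passed=False))  # True
```

An application that needs four minutes to boot would survive with the 300-second setting but be replaced repeatedly with a 30-second one.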

Lifecycle Hooks

Lifecycle hooks let you perform custom actions while an instance is launching or before it terminates. For example, you can run a script to copy logs off the instance before termination. A hook holds the instance in the Pending:Wait or Terminating:Wait state until you complete the action (via complete-lifecycle-action) or the timeout expires (default 3600 seconds, extendable with heartbeats to a maximum of 48 hours).
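The states an instance passes through, with and without hooks, can be listed with a short Python helper. The state names are the ones EC2 Auto Scaling reports; the function itself is a hypothetical illustration.

```python
def instance_states(launch_hook=False, terminate_hook=False):
    """Return the ordered lifecycle states for one instance."""
    states = ["Pending"]
    if launch_hook:
        # The instance waits here until the hook completes or times out.
        states += ["Pending:Wait", "Pending:Proceed"]
    states += ["InService", "Terminating"]
    if terminate_hook:
        states += ["Terminating:Wait", "Terminating:Proceed"]
    states.append("Terminated")
    return states


print(instance_states(terminate_hook=True))
```

Without any hooks the instance moves straight from Pending to InService, and from Terminating to Terminated.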

Integration with Elastic Load Balancer

An ASG can be associated with an Application Load Balancer (ALB), Network Load Balancer (NLB), or Classic Load Balancer (CLB). The ASG automatically registers new instances with the load balancer and deregisters instances that are terminating. This allows the load balancer to distribute traffic only to healthy instances.

Instance Termination Policy

When scaling in, the ASG must decide which instances to terminate first. The default termination policy is:

1. Determine the Availability Zone with the most instances.

2. If multiple zones are tied, choose the one with instances running the oldest launch configuration or launch template.

3. Within that zone, select the instance closest to the next billing hour (to minimize cost).

You can also choose a different built-in termination policy, such as OldestInstance, NewestInstance, OldestLaunchConfiguration, OldestLaunchTemplate, or ClosestToNextInstanceHour, or combine several in priority order.
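The default selection order described above can be sketched as follows. This is a simplified model under stated assumptions: launch template age is represented by an integer version, protected instances are skipped, and remaining ties are broken by instance ID so the result is deterministic (the real service picks at random). All names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    az: str
    template_version: int       # lower means older launch template
    secs_to_billing_hour: int
    protected: bool = False     # scale-in protection


def pick_for_termination(instances):
    """Pick one instance using a simplified default termination policy."""
    candidates = [i for i in instances if not i.protected]
    if not candidates:
        return None
    counts = Counter(i.az for i in instances)

    def az_key(az):
        oldest = min(i.template_version for i in candidates if i.az == az)
        # Most instances first, then the zone holding the oldest template.
        return (-counts[az], oldest, az)

    chosen_az = min({i.az for i in candidates}, key=az_key)
    pool = [i for i in candidates if i.az == chosen_az]
    return min(pool, key=lambda i: (i.template_version,
                                    i.secs_to_billing_hour,
                                    i.instance_id))


fleet = [
    Instance("i-a1", "us-east-1a", 1, 1000),
    Instance("i-a2", "us-east-1a", 2, 100),
    Instance("i-a3", "us-east-1a", 2, 50, protected=True),
    Instance("i-b1", "us-east-1b", 1, 10),
]
print(pick_for_termination(fleet).instance_id)  # i-a1
```

Note that `i-b1` is closest to its billing hour but is not chosen, because its zone has fewer instances. This is the behavior exam questions on the default policy tend to probe.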

Mixed Instances Policy

You can configure an ASG to launch multiple instance types (e.g., t3.micro and t3.small) across multiple purchase options (On-Demand and Spot). This is useful for optimizing cost and capacity. The ASG will distribute instances according to the percentages you specify (e.g., 50% On-Demand, 50% Spot).
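The On-Demand/Spot split can be worked through with a little arithmetic. The parameter names below mirror the `OnDemandBaseCapacity` and `OnDemandPercentageAboveBaseCapacity` settings of a mixed instances policy; rounding the On-Demand share up is an assumption of this sketch, not documented behavior.

```python
import math


def split_purchase_options(total, on_demand_base=0, on_demand_pct_above_base=50):
    """Return (on_demand, spot) counts for a given total capacity."""
    base = min(total, on_demand_base)        # filled with On-Demand first
    above = total - base
    # Assumption: fractional On-Demand instances are rounded up.
    on_demand_above = math.ceil(above * on_demand_pct_above_base / 100)
    on_demand = base + on_demand_above
    return on_demand, total - on_demand


print(split_purchase_options(10, 0, 30))  # (3, 7)
print(split_purchase_options(10, 2, 50))  # (6, 4)
```

The first call corresponds to a 30% On-Demand, 70% Spot split; the second shows how a base capacity of 2 shifts the balance toward On-Demand.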

Commands and Verification

To create an ASG using AWS CLI:

aws autoscaling create-auto-scaling-group --auto-scaling-group-name my-asg --launch-template LaunchTemplateName=my-template --min-size 1 --max-size 5 --desired-capacity 2 --vpc-zone-identifier subnet-abc,subnet-def

To describe scaling activities:

aws autoscaling describe-scaling-activities --auto-scaling-group-name my-asg

To update an ASG:

aws autoscaling update-auto-scaling-group --auto-scaling-group-name my-asg --min-size 2 --max-size 10

To view CloudWatch metrics for the ASG (e.g., GroupInServiceInstances; group metrics collection must be enabled on the ASG first):

aws cloudwatch get-metric-statistics --namespace AWS/AutoScaling --metric-name GroupInServiceInstances --dimensions Name=AutoScalingGroupName,Value=my-asg --start-time 2023-01-01T00:00:00Z --end-time 2023-01-02T00:00:00Z --period 300 --statistics Average

Interaction with Related Services

CloudWatch: ASG relies on CloudWatch alarms for scaling policies. Metrics like CPUUtilization, RequestCountPerTarget, and custom metrics drive scaling decisions.

Elastic Load Balancing: ASG registers/deregisters instances with the load balancer. Health checks from the load balancer can be used by the ASG.

EC2 Auto Scaling Lifecycle Hooks: Integrate with SNS, SQS, or Lambda to perform custom actions.

Amazon EventBridge: Receives ASG events such as instance launches, terminations, and lifecycle actions, and can trigger targets like Lambda in response.

AWS Systems Manager: Used for patching and configuration management of instances within the ASG.

Important Defaults and Limits

Default cooldown: 300 seconds

Default health check grace period: 300 seconds

Default health check type: EC2 (status checks only)

Maximum instance lifetime: not set by default (no limit); it is a setting on the ASG itself, separate from instance refresh.

ASGs can span multiple Availability Zones (AZs) within a region.

You cannot span multiple regions.

An ASG can have up to 50 scaling policies.

Default termination policy is Default.

The minimum and maximum sizes are per-ASG, not per-AZ.

Step-by-Step: How an ASG Responds to Increased Load

1. A CloudWatch alarm enters the ALARM state (e.g., CPU > 70% for 5 minutes).

2. The alarm invokes the scaling policy attached to the ASG.

3. The ASG evaluates the scaling policy (e.g., add 2 instances).

4. The ASG checks the result against Max Size. If adding 2 instances would exceed Max Size, it launches only as many as the limit allows.

5. The ASG launches new instances using the launch template.

6. New instances enter the Pending:Wait state if a launch lifecycle hook is configured.

7. Instances move to Pending:Proceed, then InService.

8. The ASG registers the instances with the load balancer (if configured).

9. The health check grace period starts (default 300 seconds).

10. After the grace period, failed health checks count against the instance.

11. For simple scaling policies, the cooldown period starts (default 300 seconds) and blocks further simple scaling actions until it expires.
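The Max Size check in the sequence above amounts to a clamp. A minimal Python sketch, with a hypothetical function name:

```python
def instances_to_launch(current, adjustment, max_size):
    """Return how many instances a scale-out request actually launches."""
    headroom = max(0, max_size - current)
    # A request that would exceed Max Size is capped, not rejected.
    return min(adjustment, headroom)


print(instances_to_launch(current=4, adjustment=2, max_size=5))  # 1
print(instances_to_launch(current=5, adjustment=2, max_size=5))  # 0
```

When the second case occurs in practice, the alarm stays in the ALARM state but capacity does not change, which usually signals that Max Size is set too low for the workload.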

Common Pitfalls

Insufficient Min/Max: Setting Min Size to 0 may cause all instances to terminate if there is no traffic, leading to a cold start.

Cooldown Too Short: Can cause oscillation (scale up, then immediately scale down).

Health Check Grace Period Too Short: New instances may be terminated before they finish bootstrapping.

Termination Policy Misunderstanding: The default policy does not always terminate the oldest instance; it uses a multi-step selection.

Not Using Lifecycle Hooks: Abrupt termination can lose data or cause in-flight requests to fail.

Walk-Through

1. Define ASG Configuration

You specify the launch template (AMI, instance type, security groups), network (VPC and subnets across at least two AZs), and scaling limits (min, max, desired). The ASG immediately launches the desired capacity number of instances. These instances are spread across the specified AZs for high availability. The ASG uses the subnets to determine where to launch instances. If you only specify one AZ, the ASG will still work but is not fault-tolerant.

2. Monitor Health and Metrics

The ASG periodically checks the health of each instance using EC2 status checks (the default) or ELB health checks (if enabled). Additionally, CloudWatch alarms track metrics like CPU utilization. The ASG itself can emit group metrics (e.g., GroupInServiceInstances, GroupTotalInstances) into the AWS/AutoScaling namespace once group metrics collection is enabled. These group metrics are useful for monitoring capacity; scaling policies typically act on load metrics such as CPU utilization.

3. Scale Out Based on Policy

When a CloudWatch alarm triggers (e.g., CPU > 70% for 3 consecutive periods), the ASG evaluates the scaling policy. For target tracking, it calculates the required number of instances to keep the metric at the target. For step scaling, it adds the specified number of instances. The ASG then checks that the new total does not exceed Max Size. If allowed, it launches new instances using the launch template. The new instances go through a pending state and are registered with any attached load balancer.

4. Scale In Based on Policy

When a CloudWatch alarm triggers for low utilization (e.g., CPU < 30% for 10 minutes), the ASG evaluates the scaling policy. It determines how many instances to terminate, ensuring the new count is not below Min Size. The ASG selects instances to terminate using the termination policy (default: AZ with most instances, then closest to billing hour). It then begins the termination process. If lifecycle hooks are configured, the instance enters a terminating:wait state. After any hooks complete, the instance is deregistered from the load balancer and terminated.

5. Replace Unhealthy Instance

If an instance fails health checks (e.g., an EC2 status check fails or the ELB health check reports the target as unhealthy), the ASG marks it as unhealthy. It then terminates the unhealthy instance and launches a new one to maintain desired capacity. The ASG does not wait for a cooldown to replace unhealthy instances. This ensures that the application remains available. The new instance goes through the same startup process as a scale-out event.
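Health replacement is independent of scaling policies, so it also happens in a fixed-size group where Min Size, Max Size, and Desired Capacity are equal. A minimal sketch of that reconciliation step, using hypothetical names and a plain dictionary of health results:

```python
def reconcile(health_by_instance, desired):
    """Return (instances to terminate, number of replacements to launch)."""
    unhealthy = [i for i, ok in health_by_instance.items() if not ok]
    healthy_count = len(health_by_instance) - len(unhealthy)
    # Launch enough replacements to get back to the desired capacity.
    to_launch = max(0, desired - healthy_count)
    return unhealthy, to_launch


fleet = {"i-1": True, "i-2": False, "i-3": True}
print(reconcile(fleet, desired=3))  # (['i-2'], 1)
```

No alarm or cooldown appears anywhere in this logic; the only inputs are health status and the desired capacity.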

6. Perform Instance Refresh

An instance refresh is used to update all instances in the ASG to a new launch template (e.g., new AMI). You define the minimum healthy percentage (e.g., 90%) and the warm-up time. The ASG terminates old instances one by one (or in batches) and launches new ones. It waits for the new instance to pass health checks before proceeding. If the refresh fails, it can roll back to the previous configuration. This is useful for rolling out patches without downtime.
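The minimum healthy percentage determines how many instances can be out of service at once. The following sketch computes that batch size under a simplifying assumption: the healthy floor is rounded up, and at least one instance is always replaced so the refresh can make progress. The function name is hypothetical, and the real service applies additional rules.

```python
import math


def max_batch_size(total, min_healthy_pct):
    """Estimate how many instances can be replaced at the same time."""
    must_stay_healthy = math.ceil(total * min_healthy_pct / 100)
    # Always allow at least one replacement, otherwise nothing would change.
    return max(1, total - must_stay_healthy)


print(max_batch_size(10, 90))   # 1
print(max_batch_size(10, 50))   # 5
print(max_batch_size(10, 100))  # 1
```

A 50% setting finishes the refresh in far fewer rounds than 90%, at the cost of running on half the fleet while each batch is replaced.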

What This Looks Like on the Job

Scenario 1: E-Commerce Web Application with Variable Traffic

A large e-commerce platform experiences high traffic during Black Friday and low traffic during off-peak hours. The architecture uses an ASG with a target tracking scaling policy that holds average CPU utilization at 60%. The launch template uses a custom AMI with the application pre-installed. The ASG spans three AZs for resilience. During Black Friday, traffic spikes push CPU above 60% for several minutes, triggering a scale-out event that adds 10 instances. The new instances are automatically registered with an ALB. The instance warm-up time is set to 120 seconds so that new instances start counting toward the metric quickly and scaling stays responsive. If the warm-up is too short, the ASG can overshoot and launch too many instances because it acts before the new capacity has absorbed load. The team also uses scheduled scaling to pre-warm capacity an hour before the event. A common misconfiguration is setting the health check grace period too short (e.g., 30 seconds), causing new instances to be terminated before the application fully starts.

Scenario 2: Batch Processing Backend with Spot Instances

A data processing company runs a batch job that processes large datasets. They use an ASG with a mixed instances policy: 30% On-Demand and 70% Spot Instances. The Spot instances are of different types (e.g., c5.large, m5.large) to increase capacity availability. The ASG uses a step scaling policy based on the number of messages in an SQS queue. When the queue depth exceeds 1000, the ASG adds 5 instances. When it drops below 100, it removes 2 instances. The team uses lifecycle hooks to gracefully shut down instances: when a Spot termination notice is received (2 minutes warning), the instance stops processing new jobs and completes current ones. Without this, jobs could be lost. The ASG also has a termination policy set to OldestInstance to ensure consistent instance ages.

Scenario 3: Stateless Web Tier with Blue/Green Deployment

A SaaS company uses an ASG to manage the web tier of their application. They perform blue/green deployments by creating a new ASG with the new launch template (green) and directing traffic to it using a separate ALB target group. Once the green ASG is healthy, they update the DNS to point to the green ALB. The old ASG (blue) is then scaled down to zero. This approach requires careful coordination of health checks and lifecycle hooks. A common issue is that the old ASG continues to receive traffic if the DNS TTL is too long, causing errors. The team sets the health check grace period to 5 minutes to account for application initialization. They also use CloudWatch alarms to monitor the number of InService instances to ensure the green ASG is fully operational before switching.

How SAA-C03 Actually Tests This

SAA-C03 Exam Focus on Auto Scaling Groups

The SAA-C03 exam tests your ability to design resilient and cost-optimized architectures using ASGs. The relevant exam domains are Domain 2 (Design Resilient Architectures), Domain 3 (Design High-Performing Architectures), and Domain 4 (Design Cost-Optimized Architectures). You must understand how ASGs integrate with ELB, CloudWatch, and launch templates.

Common Wrong Answers and Traps

1. "ASG automatically distributes traffic across instances." This is false. ASG does not distribute traffic; it only manages instance counts. Traffic distribution is done by a load balancer. Candidates often confuse ASG with ELB.

2. "Setting Min Size = Max Size = Desired Capacity prevents scaling." While the ASG will not scale, it still replaces unhealthy instances. The exam may ask about this to test understanding of health checks.

3. "Cooldown period applies after each scaling activity regardless of type." Actually, cooldown applies only to simple scaling policies. Step scaling and target tracking use an instance warm-up time instead. Also, health check replacements ignore cooldown.

4. "ASG can span multiple regions." False. ASGs are regional and can only span AZs within a single region. Multi-region architectures require separate ASGs.

5. "Launch Configurations are the recommended way to define instance configurations." AWS now recommends launch templates because they support newer features. Launch configurations are legacy. The exam may test this.

Specific Numbers and Terms

Default cooldown: 300 seconds

Default health check grace period: 300 seconds

Default health check type: EC2

Termination policy default: Default (AZ with most instances, then closest to billing hour)

Lifecycle hook timeout: up to 48 hours (default 3600 seconds)

Maximum instance lifetime: not set by default

Minimum number of AZs for high availability: 2

Target tracking metric: e.g., ASGAverageCPUUtilization, ALBRequestCountPerTarget

Edge Cases

ASG with 0 Min Size and no desired capacity: If traffic drops, all instances can be terminated, leading to a cold start. The exam may test that you should set Min Size >= 1 for production.

Instance refresh with minimum healthy percentage: If set too high (e.g., 100%), the refresh will be slow because only one instance is replaced at a time. If set too low (e.g., 50%), availability may drop.

Spot Instance interruptions: The ASG will automatically replace Spot instances that are reclaimed. You should use lifecycle hooks to handle termination notices.

How to Eliminate Wrong Answers

Focus on the mechanism: ASG is about maintaining instance count, not traffic routing. If a question mentions "distribute traffic," the answer likely involves an ELB. If it mentions "maintain number of instances," look for ASG. Also, remember that ASG does not automatically scale based on time of day unless you configure scheduled scaling. The exam often presents a scenario where you need to choose between dynamic and scheduled scaling.

Key Takeaways

ASG maintains a desired number of EC2 instances, automatically launching and terminating based on scaling policies, health checks, and schedules.

Default cooldown period is 300 seconds; default health check grace period is 300 seconds.

ASG can span multiple Availability Zones but not multiple regions.

Target tracking scaling is the simplest and most recommended dynamic scaling policy.

Mixed instances policy allows using both On-Demand and Spot Instances, and multiple instance types.

Lifecycle hooks allow you to perform custom actions before an instance is launched or terminated.

Default termination policy selects the AZ with the most instances, then the instance closest to the next billing hour.

Launch templates are preferred over launch configurations for new ASGs.

ASG does not distribute traffic; it only manages instance count. Traffic distribution is done by an ELB.

Instance refresh allows rolling updates to a new launch template without downtime.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Simple Scaling

Single adjustment when alarm triggers (e.g., add 1 instance).

Has a cooldown period (default 300 seconds) that prevents further scaling until cooldown expires.

Can cause oscillation if cooldown is too short.

Simpler to configure but less responsive.

Not recommended for production; AWS advises using target tracking or step scaling instead.

Step Scaling

Multiple adjustments based on the size of the alarm breach (e.g., add 1 if CPU > 70%, add 3 if CPU > 90%).

Does not use cooldowns; it relies on an instance warm-up time and keeps responding to alarms while a scaling activity is in progress.

More responsive and stable.

Allows finer control over scaling behavior.

Recommended over simple scaling.
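The step scaling example above (add 1 if CPU > 70%, add 3 if CPU > 90%) can be modeled as a lookup over thresholds. This is a simplified sketch using absolute thresholds; real step adjustments are defined as intervals relative to the alarm threshold, and the function name is hypothetical.

```python
def step_adjustment(metric, steps):
    """Return the adjustment for the highest threshold the metric exceeds.

    steps is a list of (threshold, adjustment) pairs.
    """
    chosen = 0
    for threshold, adjustment in sorted(steps):
        if metric > threshold:
            chosen = adjustment  # a larger breach overrides a smaller one
    return chosen


cpu_steps = [(70, 1), (90, 3)]
print(step_adjustment(75, cpu_steps))  # 1
print(step_adjustment(95, cpu_steps))  # 3
print(step_adjustment(60, cpu_steps))  # 0
```

A simple scaling policy corresponds to a single entry in `cpu_steps`, which is why it cannot react more strongly to a larger breach.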

Launch Configuration

Legacy resource; AWS recommends using launch templates.

Cannot be updated; you must create a new one.

Supports only one instance type.

No support for T2/T3 Unlimited mode.

Limited to basic parameters (AMI, instance type, security groups, user data).

Launch Template

Current recommended resource; supports all new features.

Supports versioning; you can update and create new versions.

Supports multiple instance types via mixed instances policy.

Supports T2/T3 unlimited, CPU options, and advanced networking.

Supports additional parameters such as tag specifications, placement groups, and Dedicated Hosts.

EC2 Health Checks

Default health check type for ASG.

Checks EC2 instance status (system status check and instance status check).

No application-level awareness; only checks if the instance is running.

EC2 runs status checks every minute; the ASG has no separate health check interval setting.

Cannot detect application errors (e.g., HTTP 500).

ELB Health Checks

Must be explicitly enabled by setting the ASG health check type to ELB; attaching a load balancer alone is not enough.

Checks application health via HTTP/HTTPS health checks defined on the target group.

Can detect application-layer issues (e.g., wrong response code).

Frequency is configurable on the load balancer (default 30 seconds).

More granular and application-aware.

Watch Out for These

Mistake

Auto Scaling Groups automatically distribute incoming traffic across instances.

Correct

ASGs do not distribute traffic. They only manage the number of instances. Traffic distribution is handled by an Elastic Load Balancer (ELB) attached to the ASG.

Mistake

Setting Min Size, Max Size, and Desired Capacity to the same value prevents any scaling activity.

Correct

While it prevents scaling based on demand, the ASG still replaces unhealthy instances. It will terminate an unhealthy instance and launch a new one to maintain the desired capacity.

Mistake

The default termination policy always terminates the oldest instance first.

Correct

The default policy first selects the Availability Zone with the most instances. If multiple zones are tied, it chooses the one with the oldest launch configuration. Only then does it select the instance closest to the next billing hour.

Mistake

Cooldown periods apply to all scaling activities, including health check replacements.

Correct

Cooldown periods apply only to simple scaling policies. Step scaling and target tracking use an instance warm-up time instead, and health check replacements are not delayed by cooldowns.

Mistake

ASGs can span multiple AWS regions for disaster recovery.

Correct

ASGs are regional resources. They can span multiple Availability Zones within a single region but not multiple regions. For multi-region architectures, you must create separate ASGs in each region.

Frequently Asked Questions

What is the difference between an Auto Scaling Group and a Load Balancer?

An Auto Scaling Group (ASG) manages the number of EC2 instances, launching and terminating them based on demand or health. A Load Balancer (ELB) distributes incoming traffic across those instances. They often work together: the ASG registers new instances with the ELB, and the ELB health checks can be used by the ASG to determine instance health. On the exam, remember that ASG is about capacity, ELB is about traffic distribution.

How do I prevent an ASG from terminating a specific instance during scale-in?

You can enable instance scale-in protection on that instance. This prevents the ASG from terminating it during a scale-in event. However, the instance can still be terminated if it is unhealthy or if you manually terminate it. To protect specific running instances, use `aws autoscaling set-instance-protection` with `--protected-from-scale-in`. To protect all newly launched instances, set `--new-instances-protected-from-scale-in` when creating or updating the ASG.

What happens if all instances in an ASG are terminated or become unhealthy?

The ASG will launch new instances to meet the desired capacity (or Min Size if desired capacity is not set). If Min Size is 0, the ASG may have zero instances until a scaling policy triggers. To avoid this, always set Min Size to at least 1 for production workloads.

Can I use an ASG with a Network Load Balancer?

Yes, ASG supports Application Load Balancers, Network Load Balancers, and Classic Load Balancers. When attached, the ASG automatically registers new instances with the target group and deregisters terminating instances. For NLB, health checks are at the target group level.

What is the difference between cooldown and health check grace period?

Cooldown prevents the ASG from launching or terminating additional instances right after a simple scaling activity, to avoid oscillations. Health check grace period gives a new instance time to start up before failed health checks count against it. Both are commonly set to 300 seconds. Cooldown applies to simple scaling policies only; the grace period applies to all new instances.

How does ASG handle Spot Instance interruptions?

When AWS reclaims a Spot Instance (due to capacity or price), the instance receives a termination notice (2 minutes). The ASG will automatically launch a replacement instance. To handle the interruption gracefully, use lifecycle hooks to save state or complete work before termination.

Can I update the launch template of an existing ASG?

Yes, you can update the launch template of an existing ASG. The change applies to new instances launched after the update, but existing instances continue using the old template. To force all instances to use the new template, perform an instance refresh or manually terminate and let the ASG replace them.
