An Auto Scaling group behind an Application Load Balancer frequently replaces new EC2 instances. The application needs ~6 minutes to warm up after instance launch. However, the ALB target group health checks start immediately and mark the targets unhealthy until the application is ready. Because the targets become unhealthy early, the Auto Scaling group then terminates the instances and launches replacements, creating a repeated unhealthy/termination loop. What configuration change will most directly improve recovery by preventing premature ASG termination while the application is warming up?
A health check grace period delays when the Auto Scaling group starts acting on instance health. During the initial warm-up window the ASG does not terminate instances that the ALB target group reports as unhealthy, which breaks the unhealthy/termination loop.
Why this answer
The health check grace period on an Auto Scaling group (ASG) tells the ASG to ignore failed health checks on a newly launched EC2 instance for a specified number of seconds after the instance enters service. If the grace period is set longer than the application's ~6-minute (360-second) warm-up time, the ASG will not terminate the instance because of ALB health check failures that occur while the application is still initializing. The ALB keeps running its own health checks and does not route traffic to the target until it passes them, so only the termination decision is deferred.
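As a rough illustration of the fix, the sketch below sizes a grace period and builds the request for the Auto Scaling `UpdateAutoScalingGroup` API through boto3's `update_auto_scaling_group`. The group name `my-web-asg`, the helper names, the 60-second margin, and the health check interval and healthy threshold values are all assumptions made for the example, not values given in the question. The sizing rule (warm-up, plus the time the ALB may need to mark the target healthy again, plus a margin) is a conservative heuristic, not an AWS formula.

```python
def grace_period_seconds(warmup_s, check_interval_s, healthy_threshold, margin_s=60):
    """Conservative estimate (heuristic, not an AWS formula).

    warm-up time
    + time for the ALB to record enough consecutive successful checks
    + a safety margin (hypothetical default of 60 seconds)
    """
    return warmup_s + check_interval_s * healthy_threshold + margin_s


def build_update_request(group_name, grace_s):
    """Keyword arguments for boto3's update_auto_scaling_group call."""
    return {
        "AutoScalingGroupName": group_name,
        # Keep ELB health checks so failed applications are still replaced later.
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": grace_s,
    }


def apply_update(request):
    """Send the request to AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the helpers above run without the SDK

    boto3.client("autoscaling").update_auto_scaling_group(**request)


# Illustrative values: 6-minute warm-up, 30 s check interval, 5 successes required.
grace = grace_period_seconds(warmup_s=360, check_interval_s=30, healthy_threshold=5)
request = build_update_request("my-web-asg", grace)  # hypothetical group name
print(request)
```

Calling `apply_update(request)` would apply the change to a real group. It is kept separate so the sizing logic can be checked without touching an AWS account.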
Exam trap
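The trap here is that disabling health checks or changing the health check type looks like a valid fix because either one would stop the terminations. Both do so by discarding the health signal altogether. The correct solution is the ASG's built-in grace period, which decouples early health check failures from termination decisions only during warm-up and keeps health-based replacement in place afterward.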
How to eliminate wrong answers
Option B is wrong because increasing the desired capacity does not address the root cause of premature termination. It only adds more instances, and each of them will also be terminated during its warm-up period.
Option C is wrong because disabling ALB health checks would leave the ALB unable to tell ready targets from unready ones, so traffic could be routed to instances that are still warming up or have failed. That defeats the purpose of load balancing and risks service disruption.
Option D is wrong because changing the health check type to EC2 makes the ASG ignore ALB health check results and rely only on EC2 status checks, which normally pass well before the application is ready. The loop would stop, but the ASG would also stop replacing instances whose application fails later while the instance itself keeps running. The grace period is more direct: it suppresses termination only during warm-up and keeps application-level health checks in effect afterward.
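The difference between the grace period and Option D can be sketched with a toy decision model. This is a deliberately simplified illustration, not the Auto Scaling service's actual logic: the function name, parameters, and timings are hypothetical, and edge cases such as instances leaving the running state are ignored.

```python
def asg_replaces_instance(health_check_type, grace_s, seconds_in_service,
                          ec2_ok, alb_healthy):
    """Toy model of whether the ASG would replace an instance right now."""
    # Inside the grace period the ASG does not act on failed health checks.
    if seconds_in_service < grace_s:
        return False
    # EC2 status checks count for both health check types.
    if not ec2_ok:
        return True
    # ALB target health only counts when the health check type is ELB.
    return health_check_type == "ELB" and not alb_healthy


# Two minutes after launch: application still warming up, ALB reports unhealthy.
original_setup = asg_replaces_instance("ELB", 0, 120, ec2_ok=True, alb_healthy=False)
with_grace = asg_replaces_instance("ELB", 570, 120, ec2_ok=True, alb_healthy=False)
option_d = asg_replaces_instance("EC2", 0, 120, ec2_ok=True, alb_healthy=False)

# One hour in: the application has crashed but the instance is still running.
grace_later = asg_replaces_instance("ELB", 570, 3600, ec2_ok=True, alb_healthy=False)
option_d_later = asg_replaces_instance("EC2", 0, 3600, ec2_ok=True, alb_healthy=False)

print(original_setup, with_grace, option_d)  # True False False
print(grace_later, option_d_later)           # True False
```

In this model both the grace period and Option D stop the early termination, but only the grace period configuration still replaces an instance whose application fails after warm-up.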