A company runs a critical web application on EC2 instances behind an Application Load Balancer (ALB) with Auto Scaling. During a recent traffic spike, the application became unavailable for 10 minutes. Analysis shows that the ALB's healthy host count dropped to zero because the instances failed health checks due to high CPU load. What is the MOST effective design change to improve resilience during future traffic spikes?
- A
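Use predictive scaling, supplemented by scheduled scaling actions for known peak times.
Why correct: Predictive scaling forecasts demand from historical load patterns and launches instances ahead of the forecast increase. New capacity is therefore already in service when the spike arrives, instead of being added after the existing instances are overloaded. Scheduled actions raise capacity for peaks you already know about, and keeping a dynamic scaling policy alongside them absorbs load the forecast misses.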
- B
Increase the instance size to handle higher load.
Why wrong: Larger instances still have a fixed capacity, so a large enough spike can overwhelm them too; sizing every instance for the peak around the clock is also cost-inefficient.
- C
Configure step scaling policies based on CPU utilization.
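Why wrong: Step scaling is reactive. It acts only after the CloudWatch alarm threshold has been breached, and the new instances still need time to launch and pass health checks, which may be too slow to prevent downtime.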
- D
Set a higher CPU threshold for health checks.
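Why wrong: ALB health checks test whether the target returns a successful HTTP(S) response, not its CPU utilization. Loosening health check settings only masks the problem; overloaded instances may still stop responding to users.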
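To make the timing argument behind options A and C concrete, here is a toy simulation, not an AWS model. Every number (per-instance capacity, a 5-minute warm-up, the demand curve) is a hypothetical assumption, and the "predictive" mode assumes a perfect forecast. Reactive mode sizes the group for the load it sees now; predictive mode sizes it for the load expected one warm-up period ahead.

```python
"""Toy model: reactive vs. predictive scale-out during a traffic spike.

All constants and the demand curve are hypothetical, chosen for illustration.
"""
import math

CAPACITY_PER_INSTANCE = 100  # requests/min one instance can serve (assumed)
WARMUP_MINUTES = 5           # launch + boot + health-check time (assumed)
TARGET_UTILIZATION = 0.6     # desired load per instance, fraction of capacity (assumed)


def demand(minute: int) -> int:
    """Hypothetical demand: 300 req/min, jumping to 1200 req/min at minute 10."""
    return 300 if minute < 10 else 1200


def needed(load: int) -> int:
    """Instances required to keep utilization at or below TARGET_UTILIZATION."""
    return math.ceil(load / (CAPACITY_PER_INSTANCE * TARGET_UTILIZATION))


def simulate(predictive: bool, minutes: int = 30, start: int = 5) -> int:
    """Return how many minutes demand exceeded in-service capacity."""
    in_service = start
    pending: list[int] = []  # minute at which each launched instance is in service
    overloaded = 0
    for t in range(minutes):
        # Instances that finished warming up now start receiving traffic.
        in_service += sum(1 for ready in pending if ready == t)
        pending = [ready for ready in pending if ready > t]

        if predictive:
            # Size for the load forecast one warm-up period ahead (perfect forecast).
            want = needed(demand(t + WARMUP_MINUTES))
        else:
            # Size only for the load observed right now; real alarms add further delay.
            want = needed(demand(t))

        shortfall = want - in_service - len(pending)
        for _ in range(max(0, shortfall)):
            pending.append(t + WARMUP_MINUTES)

        if demand(t) > in_service * CAPACITY_PER_INSTANCE:
            overloaded += 1
    return overloaded


if __name__ == "__main__":
    print("reactive overloaded minutes:", simulate(predictive=False))   # 5
    print("predictive overloaded minutes:", simulate(predictive=True))  # 0
```

Under these assumptions the reactive group is overloaded for the whole warm-up window after the spike, while the predictive group has the extra instances in service when demand jumps. With real, imperfect forecasts the gap is smaller, which is why a dynamic policy is usually kept as a backstop.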
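A minimal sketch of how answer A might be configured, assuming the EC2 Auto Scaling `PutScalingPolicy` and `PutScheduledUpdateGroupAction` APIs. The functions below only build the request parameters; the group name, target CPU value, buffer time, cron schedule, and capacity are hypothetical placeholders, not recommendations.

```python
"""Request parameters for a predictive scaling policy and a scheduled action.

Group name, targets, and schedule are hypothetical placeholders.
"""
from typing import Any


def predictive_policy_params(asg_name: str, target_cpu: float = 50.0,
                             buffer_seconds: int = 600) -> dict[str, Any]:
    """Keyword arguments for the Auto Scaling PutScalingPolicy API."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-predictive-cpu",  # hypothetical naming scheme
        "PolicyType": "PredictiveScaling",
        "PredictiveScalingConfiguration": {
            "MetricSpecifications": [
                {
                    "TargetValue": target_cpu,
                    # Forecast load and capacity from the group's average CPU.
                    "PredefinedMetricPairSpecification": {
                        "PredefinedMetricType": "ASGCPUUtilization"
                    },
                }
            ],
            # "ForecastOnly" would produce forecasts without acting on them.
            "Mode": "ForecastAndScale",
            # Launch instances this many seconds before the forecast need.
            "SchedulingBufferTime": buffer_seconds,
        },
    }


def scheduled_action_params(asg_name: str, cron: str, desired: int) -> dict[str, Any]:
    """Keyword arguments for PutScheduledUpdateGroupAction (a known peak window)."""
    return {
        "AutoScalingGroupName": asg_name,
        "ScheduledActionName": f"{asg_name}-peak",  # hypothetical naming scheme
        "Recurrence": cron,  # five-field cron expression, UTC unless TimeZone is set
        "MinSize": desired,  # must not exceed the group's MaxSize
        "DesiredCapacity": desired,
    }
```

With boto3 installed and credentials configured, these could be passed as `boto3.client("autoscaling").put_scaling_policy(**predictive_policy_params("web-asg"))` and `put_scheduled_update_group_action(**scheduled_action_params("web-asg", "0 8 * * 1-5", 20))`, where `web-asg` is a placeholder name.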