A DevOps engineer notices that an EC2 instance in an Auto Scaling group is repeatedly failing health checks and being terminated. The engineer needs to capture the root cause by collecting memory dumps and system logs before termination. What should the engineer do?
EC2Rescue can run diagnostics at startup; extending the Auto Scaling health check grace period gives the tool time to collect data before the instance is terminated.
Why this answer
Option B is correct because EC2Rescue is built to collect diagnostic data, including memory dumps and system logs, from EC2 instances. Configuring it to run at startup and extending the Auto Scaling health check grace period gives the tool time to finish before the group terminates the instance for failing health checks. This captures root-cause data from an instance that would otherwise be replaced before anyone could inspect it.
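The grace-period half of Option B can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the helper name `extend_grace_period`, the group name, and the 900-second value are assumptions chosen for the example. The client is expected to be a boto3 Auto Scaling client, whose `update_auto_scaling_group` call accepts `AutoScalingGroupName` and `HealthCheckGracePeriod`.

```python
def extend_grace_period(autoscaling_client, group_name, grace_seconds):
    """Raise the health check grace period of an Auto Scaling group.

    autoscaling_client is assumed to be boto3.client("autoscaling") or any
    object exposing the same update_auto_scaling_group method.
    """
    if grace_seconds < 0:
        raise ValueError("grace_seconds must be non-negative")

    # Parameters passed to the UpdateAutoScalingGroup API.
    params = {
        "AutoScalingGroupName": group_name,
        "HealthCheckGracePeriod": grace_seconds,
    }
    autoscaling_client.update_auto_scaling_group(**params)
    return params
```

With boto3 installed and credentials configured, a call might look like `extend_grace_period(boto3.client("autoscaling"), "web-asg", 900)`, where `web-asg` is a hypothetical group name. The value should be long enough for the startup diagnostics to complete.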
Exam trap
The trap is assuming that Systems Manager Run Command (Option C) can reliably run scripts on a failing instance. Run Command works only while the instance is running and reachable, which is not guaranteed when health checks keep failing and termination is imminent.
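The reachability requirement can be made concrete with a small check. The sketch below assumes a boto3 Systems Manager client and uses its `describe_instance_information` call, which reports a `PingStatus` for each managed instance. The helper name `is_reachable_by_ssm` is hypothetical.

```python
def is_reachable_by_ssm(ssm_client, instance_id):
    """Return True only if Systems Manager reports the instance as Online.

    ssm_client is assumed to be boto3.client("ssm") or an object with the
    same describe_instance_information method.
    """
    response = ssm_client.describe_instance_information(
        Filters=[{"Key": "InstanceIds", "Values": [instance_id]}]
    )
    for info in response.get("InstanceInformationList", []):
        if info.get("InstanceId") == instance_id:
            # Other statuses, such as ConnectionLost, mean Run Command
            # cannot currently deliver a command to the instance.
            return info.get("PingStatus") == "Online"
    # The instance is not registered as a managed instance at all.
    return False
```

Even when this returns True, an instance that keeps failing health checks can be terminated before a command finishes, which is why Option C is unreliable here.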
How to eliminate wrong answers
Option A is wrong because the CloudWatch Agent collects logs and metrics during normal operation but does not capture memory dumps or system logs at the point of failure. It cannot guarantee data collection from an instance that is being terminated for failing health checks.
Option C is wrong because AWS Systems Manager Run Command requires the instance to be running and reachable. An instance that repeatedly fails health checks may be terminated before the command runs, so Run Command is unreliable for capturing pre-termination diagnostics.
Option D is wrong because the EC2 instance metadata service (IMDS) only provides metadata about the instance (for example, instance ID and AMI ID). It does not capture diagnostic data such as memory dumps or system logs, and nothing it serves persists after termination.