A company runs SAP HANA on AWS with a multi-AZ deployment using HANA System Replication (HSR). The primary site is in us-east-1a and the secondary in us-east-1b. Each site has an ASCS and PAS. The HANA database uses a virtual IP address managed by a Route 53 health check with a failover routing policy. During a recent AZ failure in us-east-1a, the automatic failover to the secondary site took over 15 minutes. The recovery time objective (RTO) is 5 minutes. Analysis shows that the Route 53 health check failed but the failover did not trigger quickly because the DNS TTL was set to 300 seconds. What changes should be made to meet the RTO?
A lower TTL shortens how long resolvers cache the stale record; weighted routing with health checks stops returning the failed endpoint once the check fails.
Why this answer
Option C is correct because reducing the DNS TTL to 60 seconds limits how long DNS resolvers and clients can cache the primary site's record. Once the health check marks the primary unhealthy, Route 53 stops returning it, and cached answers expire within about one minute instead of five. Combined with a weighted routing policy and health checks, this removes the 300-second caching delay that was the bottleneck and brings DNS failover within the 5-minute RTO.
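The effect of the TTL on failover time can be estimated with simple arithmetic. The sketch below is a rough model, not an AWS formula: it assumes a 30-second health check interval and a failure threshold of 3 (neither value is given in the question), and it ignores the HSR takeover itself and client reconnect time. The function name is hypothetical.

```python
def worst_case_dns_failover_seconds(ttl_s, check_interval_s=30, failure_threshold=3):
    """Rough upper bound on DNS-level failover time.

    Assumption: the health check needs `failure_threshold` consecutive
    failed checks, spaced `check_interval_s` apart, before the endpoint
    is marked unhealthy. After that, a resolver may keep serving its
    cached answer for up to one full TTL.
    """
    detection_s = check_interval_s * failure_threshold
    return detection_s + ttl_s


RTO_S = 5 * 60  # 5-minute recovery time objective from the question

for ttl in (300, 60):
    total = worst_case_dns_failover_seconds(ttl)
    print(f"TTL {ttl}s: about {total}s, within RTO: {total <= RTO_S}")
```

Under these assumptions, the 300-second TTL alone consumes the entire RTO budget before detection time is counted, while a 60-second TTL leaves room for detection and the database takeover.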
Exam trap
The trap here is that candidates may think increasing the TTL improves stability (Option A) or that an ALB can replace a virtual IP for HANA HSR (Option B). The core issue is the delay caused by cached DNS answers, and only a lower TTL combined with health-checked routing directly addresses the RTO requirement.
How to eliminate wrong answers
Option A is wrong because increasing the DNS TTL to 600 seconds lets resolvers keep serving the failed primary's address for up to 10 minutes, which by itself exceeds the 5-minute RTO. Option B is wrong because an Application Load Balancer (ALB) operates at Layer 7 and handles only HTTP-based traffic, so it cannot carry the TCP-based SQL connections that HANA clients use; it also does not provide the fixed virtual IP that clients in an HSR setup connect to. Option D is wrong because removing the health check eliminates the automated failure detection mechanism, and a simple routing policy without health checks would never trigger failover, leaving the system unable to recover from an AZ failure.
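To make the two settings that Options A and D tamper with concrete, here is a minimal sketch of a Route 53 change batch for the scenario's failover routing policy. The record name, IP addresses, and health check ID are placeholders, and the helper function is hypothetical; the dictionary keys follow the Route 53 `ChangeResourceRecordSets` request shape.

```python
def failover_record(name, ip, role, set_id, health_check_id=None, ttl_s=60):
    """Build one UPSERT change for a failover-routed A record.

    role is "PRIMARY" or "SECONDARY". A low TTL keeps resolver caches
    short (the opposite of Option A); the health check lets Route 53
    detect the failure (what Option D would remove).
    """
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,
        "Failover": role,
        "TTL": ttl_s,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


# Placeholder values for illustration only.
change_batch = {
    "Changes": [
        failover_record("hana.example.internal.", "10.0.1.10", "PRIMARY",
                        "hana-us-east-1a", health_check_id="hc-placeholder-id"),
        failover_record("hana.example.internal.", "10.0.2.10", "SECONDARY",
                        "hana-us-east-1b"),
    ]
}
```

A batch like this could be passed as the `ChangeBatch` argument to the Route 53 API (for example through boto3's `change_resource_record_sets`), together with the hosted zone ID, which is not shown here.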