An online retail company runs its e-commerce platform on a virtualized infrastructure with 50 virtual servers. The platform experiences intermittent slowdowns during peak hours, and recent monitoring reports show that disk I/O latency on the storage area network (SAN) frequently exceeds 50 ms during these periods. The SAN has two fabric switches and a single storage array with 12 TB of usable capacity, currently at 80% utilization. The company’s disaster recovery plan requires a recovery point objective (RPO) of 1 hour and a recovery time objective (RTO) of 4 hours for the e-commerce platform. During a recent test failover to the disaster recovery (DR) site, the IT team discovered that the replication link between the primary and DR sites is saturated, causing replication lag of up to 3 hours. The team also noted that the DR site storage has only 6 TB of usable capacity, currently at 60% utilization. The IT manager is concerned about meeting the RPO and RTO. Which course of action should the IT team take first?
Upgrading the replication link to a higher-bandwidth connection directly addresses the replication lag, bringing it within the 1-hour RPO, and is the most urgent action for meeting the disaster recovery objectives.
Why this answer
The immediate issue preventing the organization from meeting its 1-hour RPO is the saturated replication link, which causes replication lag of up to 3 hours. With a lag that large, a failover could lose up to 3 hours of data, three times what the RPO allows. Upgrading the link to a higher-bandwidth connection directly addresses the bottleneck: changes reach the DR site faster, the lag shrinks, and the RPO can be met. The other options, while potentially beneficial, do not resolve the primary cause of the RPO failure.
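The reasoning above can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 45 GB/hour change rate is an assumed figure that does not appear in the scenario. It compares the observed lag with the RPO and estimates the sustained bandwidth a link would need to keep up with a given rate of changed data.

```python
def meets_rpo(replication_lag_hours: float, rpo_hours: float) -> bool:
    """Return True when the worst-case replication lag is within the RPO."""
    return replication_lag_hours <= rpo_hours


def min_bandwidth_mbps(change_gb_per_hour: float, headroom: float = 1.0) -> float:
    """Estimate the sustained link speed (megabits per second) needed to
    replicate `change_gb_per_hour` of changed data without falling behind.

    Uses decimal units: 1 GB = 8000 megabits, 1 hour = 3600 seconds.
    `headroom` is a multiplier for bursts and protocol overhead (assumed).
    """
    return change_gb_per_hour * 8000 / 3600 * headroom


if __name__ == "__main__":
    # Figures from the scenario: lag of up to 3 hours against a 1-hour RPO.
    print("RPO met:", meets_rpo(replication_lag_hours=3.0, rpo_hours=1.0))

    # Assumed change rate (NOT from the scenario), for illustration only.
    assumed_change_gb_per_hour = 45.0
    print("Minimum link speed (Mbps):", min_bandwidth_mbps(assumed_change_gb_per_hour))
```

In practice the team would measure the actual peak change rate before sizing the new link; the point here is only that the comparison driving the answer is lag versus RPO, not disk latency or storage capacity.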
Exam trap
The trap is that candidates focus on the disk I/O latency or storage utilization figures, which are performance and capacity concerns, instead of recognizing that the saturated replication link is the direct cause of the RPO failure and must be addressed first.
How to eliminate wrong answers
Option A is wrong because upgrading the SAN fabric switches targets disk I/O latency, which is a performance issue at the primary site, not the replication lag that causes the RPO violation. Option B is wrong because adding storage capacity to the DR site does not reduce replication lag: extra capacity does nothing to move data across the saturated link faster, and it may even increase the amount of data that needs to be replicated. Option C is wrong because more frequent incremental backups do not relieve the replication link saturation; they could add load to the link and worsen the lag, and backups are a separate mechanism from the synchronous or asynchronous replication that determines the RPO here.