This chapter covers Business Continuity (BC) and Disaster Recovery (DR): the plans and procedures that let an organization survive and recover from disruptive events. On the N10-009 exam, this topic falls under Domain 3.0 (Network Operations), Objective 3.3 (Explain disaster recovery concepts); the domain as a whole carries about 19% of the exam. You must understand BC/DR concepts, recovery metrics, site types, backup strategies, and testing procedures. This chapter covers each of those areas in the depth the objective calls for.
Think of a company's IT operations as a large office building. Business continuity (BC) is the plan to keep the business running if the building is unusable, such as having a backup location where employees can work. Disaster recovery (DR) is the specific process of restoring the building itself after a fire. The building has a fire suppression system (redundant cooling), backup generators (UPS/power redundancy), and off-site storage for critical documents (backups). A fire drill (DR test) ensures everyone knows the evacuation route and can set up temporary desks at the alternate location (failover) within 4 hours, the time limit that corresponds to the RTO. The building manager (BC/DR coordinator) maintains the plan, reviews it annually (plan review), and ensures the backup generator is tested monthly. If the fire alarm goes off (disaster), the manager activates the plan: employees evacuate to the alternate site (failover), the IT team restores servers from off-site backups (data restoration), and the building is repaired (recovery). Without the plan, chaos ensues: employees don't know where to go, data is lost, and the business fails. The exam tests your knowledge of each component: RTO/RPO, hot/warm/cold sites, backup types, and testing methods.
What is Business Continuity and Disaster Recovery?
Business Continuity (BC) and Disaster Recovery (DR) are two related but distinct disciplines. BC focuses on maintaining essential business functions during and after a disaster, while DR focuses on restoring IT infrastructure and systems to normal operations. The exam expects you to know the difference and how they work together.
Business Continuity (BC): A proactive plan that ensures critical business processes can continue during a disruption. It includes alternate work sites, communication plans, and procedures for maintaining operations.
Disaster Recovery (DR): A reactive plan that details how to recover IT systems, data, and networks after a disaster. It is a subset of BC.
Why BC/DR Exists
Organizations face threats such as natural disasters, power outages, cyberattacks, hardware failures, and human error. Without BC/DR, a single event can cause prolonged downtime, data loss, and financial ruin. The exam emphasizes that BC/DR is about risk management — balancing the cost of prevention and recovery against the potential impact of a disaster.
Key Metrics: RTO and RPO
The two most important metrics in BC/DR are Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
Recovery Time Objective (RTO): The maximum acceptable time that a system can be unavailable after a disaster. For example, an RTO of 4 hours means the system must be restored within 4 hours. RTO drives the choice of recovery strategy — shorter RTOs require more expensive solutions like hot sites or active-active clusters.
Recovery Point Objective (RPO): The maximum acceptable amount of data loss measured in time. For example, an RPO of 15 minutes means backups must be taken at least every 15 minutes, so at most 15 minutes of data is lost. RPO determines backup frequency and replication methods.
Both metrics are defined by business requirements, not IT. The exam often presents a scenario and asks you to identify the correct RTO or RPO based on the business needs.
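As a rough illustration of how the two metrics are checked against a design, the sketch below compares a backup interval with an RPO and an estimated recovery time with an RTO. The function names are hypothetical, and the sketch assumes the worst-case data loss equals the full interval between backups (the disaster strikes just before the next backup runs).

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Assumption: the disaster hits just before the next backup would run."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


def meets_rto(estimated_recovery_time: timedelta, rto: timedelta) -> bool:
    """True if the estimated time to restore service stays within the RTO."""
    return estimated_recovery_time <= rto


# Example values from the text: an RTO of 4 hours and an RPO of 15 minutes.
rto = timedelta(hours=4)
rpo = timedelta(minutes=15)

print(meets_rpo(timedelta(minutes=15), rpo))  # True
print(meets_rpo(timedelta(hours=1), rpo))     # False
print(meets_rto(timedelta(hours=3), rto))     # True
```

The comparison only says whether a design can meet the targets; the targets themselves still come from the business.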
Site Types for Disaster Recovery
Organizations use alternate sites to host operations during a disaster. The exam tests three primary types:
Hot Site: A fully equipped alternate site with all hardware, software, data, and network connectivity ready to take over immediately. RTO is typically minutes to hours. It is the most expensive option.
Warm Site: A partially equipped site with some hardware and software, but data may be outdated. Activation requires restoring data from backups, so RTO is longer — typically hours to days.
Cold Site: A basic facility with power, cooling, and space, but no IT equipment. Everything must be procured and installed after a disaster. RTO is days to weeks. It is the least expensive.
Common exam trap: Confusing warm and cold sites. Remember: warm sites have some equipment ready; cold sites have none.
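The site types above can be turned into a simple elimination rule. The sketch below picks the least expensive site type that can still meet a given RTO. The cut-off values (1 hour and 1 day) are assumptions chosen for illustration, since the ranges in the text overlap; real thresholds depend on the organization.

```python
from datetime import timedelta

# Hypothetical cut-offs, not exam-defined values.
HOT_SITE_BELOW = timedelta(hours=1)   # RTO in minutes: only a hot site qualifies
WARM_SITE_BELOW = timedelta(days=1)   # RTO in hours: a warm site is enough


def cheapest_site_for(rto: timedelta) -> str:
    """Return the least expensive site type that can plausibly meet the RTO."""
    if rto < HOT_SITE_BELOW:
        return "hot"
    if rto < WARM_SITE_BELOW:
        return "warm"
    return "cold"


print(cheapest_site_for(timedelta(minutes=15)))  # hot
print(cheapest_site_for(timedelta(hours=4)))     # warm
print(cheapest_site_for(timedelta(days=7)))      # cold
```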
Backup Strategies
Backups are the foundation of data recovery. The exam covers backup types, media, and locations.
#### Backup Types
Full Backup: Copies all data. Most time-consuming and storage-intensive, but simplest to restore. Typically done weekly.
Incremental Backup: Copies only data that has changed since the last backup (full or incremental). Fast and space-efficient, but restore requires the last full backup plus all subsequent incremental backups in order. This increases restore time and risk of failure.
Differential Backup: Copies all data changed since the last full backup. Restore requires only the last full backup and the latest differential backup. Faster restore than incremental, but each differential grows over time.
Exam tip: Understand the trade-offs. Incremental saves time and space but complicates restore. Differential balances backup speed and restore simplicity.
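To make the restore trade-off concrete, the following sketch works out which backup sets are needed to restore to the most recent backup. It is a minimal model, assuming a schedule that uses either incrementals or differentials between full backups (not a mix); the labels and the `restore_chain` name are hypothetical.

```python
def restore_chain(backups: list[tuple[str, str]]) -> list[str]:
    """Return the backup labels to restore, in order.

    `backups` is a chronological list of (label, kind) pairs, where kind is
    "full", "incremental", or "differential". Assumes at least one full
    backup and only one non-full kind after it.
    """
    # Everything before the most recent full backup is irrelevant.
    last_full = max(i for i, (_, kind) in enumerate(backups) if kind == "full")
    chain = [backups[last_full][0]]
    tail = backups[last_full + 1:]

    differentials = [label for label, kind in tail if kind == "differential"]
    if differentials:
        # A differential holds every change since the full: only the latest is needed.
        chain.append(differentials[-1])
    else:
        # Each incremental holds changes since the previous backup: all are needed.
        chain.extend(label for label, kind in tail if kind == "incremental")
    return chain


incremental_week = [("sun-full", "full"), ("mon", "incremental"),
                    ("tue", "incremental"), ("wed", "incremental")]
differential_week = [("sun-full", "full"), ("mon", "differential"),
                     ("tue", "differential"), ("wed", "differential")]

print(restore_chain(incremental_week))   # ['sun-full', 'mon', 'tue', 'wed']
print(restore_chain(differential_week))  # ['sun-full', 'wed']
```

The incremental chain grows by one set per day, and losing any one set breaks the restore; the differential chain is always two sets long.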
#### Backup Media
Tape: Sequential access and low cost per GB, but slow to restore individual files and prone to physical damage. Used for long-term archival.
Disk: Random access, faster backup and restore. Common for on-site and off-site backups.
Cloud: Off-site storage with scalability and pay-as-you-go pricing. Requires internet bandwidth.
Virtual Tape Library (VTL): Emulates tape but uses disk hardware. Combines tape management with disk performance.
#### Backup Locations
On-site: Fast access but vulnerable to the same disaster that takes down the primary site.
Off-site: Protects against site-local disasters. Needs enough network bandwidth (or a physical transport process) to move backups to the remote location on schedule.
Cloud: Off-site by nature. Provides geographic redundancy.
3-2-1 Rule: A best practice: maintain at least three copies of data, on two different media types, with one copy off-site. The exam may test this rule.
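The 3-2-1 rule is easy to express as a check. The sketch below is one possible way to model it; the `DataCopy` class and the sample inventory are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    name: str
    media: str      # e.g. "disk", "tape", "cloud"
    offsite: bool


def satisfies_3_2_1(copies: list[DataCopy]) -> bool:
    """At least 3 copies, on at least 2 media types, with at least 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


inventory = [
    DataCopy("production data", "disk", offsite=False),
    DataCopy("local backup", "disk", offsite=False),
    DataCopy("cloud backup", "cloud", offsite=True),
]

print(satisfies_3_2_1(inventory))      # True
print(satisfies_3_2_1(inventory[:2]))  # False: two copies, one media type, none off-site
```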
High Availability and Redundancy
BC/DR often involves high availability (HA) to minimize downtime. Key HA concepts:
Active-Active: Both sites handle traffic simultaneously. If one fails, the other continues. RTO is near zero.
Active-Passive (Active-Standby): One site is active, the other is on standby. Failover requires switching to the standby site. RTO is minutes to hours.
Failover/Cluster: A group of servers that work together. If one fails, another takes over.
Load Balancing: Distributes traffic across multiple servers to prevent overload and provide redundancy.
Redundant Hardware: Power supplies, NICs, disks (RAID), and cooling.
UPS and Generator: Uninterruptible Power Supply provides short-term battery power; generators provide long-term power.
Testing and Maintenance
A BC/DR plan is useless if never tested. The exam covers testing types:
Tabletop Exercise: A walkthrough of the plan with key stakeholders. No actual failover. Low cost, low risk.
Simulation: A more realistic test that may involve some actual failover but with controlled scope.
Parallel Test: Run the alternate site in parallel with the primary site. Validates functionality without affecting production.
Full Interruption Test: Shut down the primary site and run from the alternate site. Highest fidelity but highest risk and cost.
Regular plan review (annually at minimum) ensures the plan stays current with infrastructure changes.
Cloud and Virtualization Considerations
Modern BC/DR often leverages cloud and virtualization:
Virtualization: Allows quick recovery by restoring virtual machines (VMs) on any compatible hypervisor.
Cloud DR (DRaaS): Disaster Recovery as a Service. Replicates data to a cloud provider and spins up VMs on demand.
Geographic Redundancy: Cloud providers have multiple regions. Replicating data across regions protects against regional disasters.
Snapshots: Point-in-time copies of VMs. Useful for quick rollback but not a replacement for backups.
Interaction with Related Technologies
BC/DR interacts with:
DNS: Can be used to redirect traffic to an alternate site during failover.
Software-Defined Networking (SDN): Enables automated network reconfiguration during failover.
Replication: Synchronous (data written to both sites simultaneously) or asynchronous (data written to primary, then copied to secondary). Synchronous has lower RPO but higher latency.
RAID: Protects against disk failure but does not replace backups — it is not a DR solution.
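The latency cost of synchronous replication mentioned in the list above can be approximated with a simplified model. This sketch assumes the local and remote writes are issued in parallel and that a synchronous write is acknowledged only once both sites have committed it; the function names and millisecond figures are illustrative, not measurements.

```python
def sync_write_latency_ms(local_write_ms: float, remote_write_ms: float,
                          rtt_ms: float) -> float:
    """Synchronous: the slower of the local path and the remote path
    (round trip plus remote write) sets the latency the application sees."""
    return max(local_write_ms, rtt_ms + remote_write_ms)


def async_write_latency_ms(local_write_ms: float) -> float:
    """Asynchronous: the application waits only for the local write."""
    return local_write_ms


# Same storage at both sites, two different links (illustrative values).
print(sync_write_latency_ms(1.0, 1.0, rtt_ms=0.5))   # 1.5  (metro-distance link)
print(sync_write_latency_ms(1.0, 1.0, rtt_ms=40.0))  # 41.0 (long-distance link)
print(async_write_latency_ms(1.0))                   # 1.0
```

Under this model the round-trip time is added to every write, which is why synchronous replication is usually limited to low-latency links, while asynchronous replication accepts some data loss in exchange for local write speed.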
Configuration and Verification Commands
While the N10-009 exam does not require deep command knowledge for BC/DR, understanding common tools is helpful:
Backup software: Commands like tar, rsync, or proprietary tools.
Replication: rsync -avz /data/ user@remote:/backup/
Monitoring: ping, traceroute to verify connectivity to alternate site.
DNS changes: nsupdate or web console to update A records.
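For readers who script these checks, here is a small sketch that assembles the rsync command from the list above and tests whether the alternate site answers on a TCP port. The helper names are hypothetical. `--dry-run` is added so that rsync only reports what it would copy, and a TCP connection test stands in for ping because sending ICMP from Python normally needs elevated privileges.

```python
import shlex
import socket


def build_rsync_command(source: str, destination: str,
                        dry_run: bool = True) -> list[str]:
    """Build an rsync argument list (archive, verbose, compressed)."""
    cmd = ["rsync", "-avz"]
    if dry_run:
        cmd.append("--dry-run")  # report what would be transferred, copy nothing
    cmd += [source, destination]
    return cmd


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


cmd = build_rsync_command("/data/", "user@remote:/backup/")
print(shlex.join(cmd))  # rsync -avz --dry-run /data/ user@remote:/backup/
```

The command list can be passed to `subprocess.run(cmd, check=True)` once the dry run looks right, and `tcp_reachable("remote", 22)` would check the SSH port that rsync typically uses (the host name is the placeholder from the example above).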
Exam-Relevant Standards and Frameworks
NIST SP 800-34: Contingency Planning Guide for Federal Information Systems.
ISO 22301: Business Continuity Management.
FEMA: Federal Emergency Management Agency guidelines.
Common Pitfalls
Confusing RTO with RPO. RTO is about downtime; RPO is about data loss.
Thinking that backups alone constitute DR. DR includes procedures, people, and alternate sites.
Assuming cloud automatically provides DR. You must configure replication and failover.
Neglecting testing. An untested plan is a fantasy.
Summary of Key Numbers
RTO: Defined in hours/minutes; typical values: 0 (active-active), 4 hours, 24 hours, 72 hours.
RPO: Defined in time; typical values: 0 (synchronous replication), 15 minutes, 1 hour, 24 hours.
3-2-1 Rule: 3 copies, 2 media, 1 off-site.
Backup retention: Often 30-90 days for daily backups and 1-7 years for annual backups.
Testing frequency: At least annually.
Identify Critical Systems and Data
The first step is to perform a Business Impact Analysis (BIA) to identify which systems and data are critical to operations. For each system, determine the maximum allowable downtime (RTO) and data loss (RPO). This step involves interviewing business stakeholders and analyzing dependencies. For example, an e-commerce platform may have an RTO of 1 hour and an RPO of 5 minutes, while a file server may have an RTO of 24 hours and an RPO of 1 day. Document all findings in a formal BIA report.
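One way to capture BIA findings in a form that can drive later decisions is a small record per system. The sketch below uses the two example systems from the paragraph above; the `BiaEntry` class and the ordering rule (shortest RTO first, then shortest RPO) are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class BiaEntry:
    system: str
    rto: timedelta  # maximum allowable downtime
    rpo: timedelta  # maximum allowable data loss, measured in time


def recovery_order(entries: list[BiaEntry]) -> list[BiaEntry]:
    """Order systems so the tightest requirements are recovered first."""
    return sorted(entries, key=lambda e: (e.rto, e.rpo))


entries = [
    BiaEntry("file server", rto=timedelta(hours=24), rpo=timedelta(days=1)),
    BiaEntry("e-commerce platform", rto=timedelta(hours=1), rpo=timedelta(minutes=5)),
]

print([e.system for e in recovery_order(entries)])
# ['e-commerce platform', 'file server']
```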
Design the Recovery Strategy
Based on RTO and RPO, choose appropriate recovery strategies. For critical systems with low RTO, select a hot site or active-active cluster. For less critical systems, a warm or cold site may suffice. Determine backup frequency (e.g., hourly incremental, daily full) and replication method (synchronous for RPO near zero, asynchronous otherwise). Also decide on hardware redundancy (RAID, dual power supplies) and network redundancy (multiple ISPs, redundant switches). The strategy must balance cost against recovery requirements.
Implement the Backup and Replication Solution
Deploy backup software and configure backup jobs according to the RPO. For example, set up hourly incremental backups with a daily full backup. Configure replication to the alternate site—synchronous replication for critical databases, asynchronous for others. Ensure backup media (tape, disk, cloud) is properly labeled and stored off-site. Test the backup process to verify data integrity. Common tools include Veeam, Acronis, or native OS tools like rsync. Document all configurations in the DR plan.
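Verifying data integrity can be as simple as comparing checksums of the source and the backup copy. The sketch below is a generic approach using SHA-256, independent of whichever backup product is in use; the file names are throwaway examples created in a temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backups do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches_source(source: Path, backup: Path) -> bool:
    """True if both files have identical contents."""
    return sha256_of(source) == sha256_of(backup)


# Demonstration with small stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.db"
    good = Path(tmp) / "backup_good.db"
    corrupt = Path(tmp) / "backup_corrupt.db"
    source.write_bytes(b"customer records")
    good.write_bytes(b"customer records")
    corrupt.write_bytes(b"customer rec0rds")

    print(backup_matches_source(source, good))     # True
    print(backup_matches_source(source, corrupt))  # False
```

A matching checksum shows the copy is intact, but only a trial restore proves the backup can actually be used.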
Develop the Disaster Recovery Plan Document
Write a detailed DR plan that includes: contact information for the DR team, step-by-step procedures for activating the alternate site, instructions for restoring data from backups, network configuration changes (e.g., DNS updates, firewall rules), and communication templates for stakeholders. Include a checklist for each phase: activation, failover, restoration, and failback. The plan must be stored in multiple locations (e.g., paper copy in safe, electronic copy in cloud) and reviewed annually.
Test the Plan Regularly
Conduct regular tests to validate the plan. Start with a tabletop exercise to walk through the steps. Then progress to a simulation or parallel test. Finally, perform a full interruption test if risk is acceptable. Document test results and identify gaps. For example, a test might reveal that the alternate site's network bandwidth is insufficient. Update the plan accordingly. The exam emphasizes that testing is mandatory—an untested plan is unreliable.
Scenario 1: E-commerce Platform with Hot Site
A large online retailer requires near-zero downtime. They deploy an active-active architecture across two data centers in different regions. Each data center runs the full application stack behind a global load balancer (e.g., F5 GTM). Database replication is synchronous using Oracle Data Guard, ensuring zero data loss (RPO=0). If one data center fails, the load balancer directs all traffic to the surviving site within seconds (RTO<1 minute). The DR plan is tested quarterly with simulated failures. The main design risk is the replication link: synchronous replication needs a low-latency, high-bandwidth link, and an undersized one slows every database write because each commit waits for the remote site to acknowledge it.
Scenario 2: Hospital with Warm Site
A hospital's Electronic Health Record (EHR) system has an RTO of 4 hours and an RPO of 1 hour. They maintain a warm site in a separate building with pre-installed servers and storage, but no live data. Every hour, the primary site sends incremental backups to the warm site via a dedicated WAN link. In a disaster, the IT team restores the latest backup to the warm site servers, a process that takes about 2 hours. The network team updates DNS to point to the warm site's IP. A common mistake is failing to test the restoration process—if the backup is corrupt, recovery fails. The hospital conducts a full failover test twice a year.
Scenario 3: Bank with Cold Site and Tape Backups
A regional bank has a low budget for DR. They use a cold site in another state: a rented space with power and cooling. Daily full backups are written to tape, and tapes are shipped off-site weekly. RTO is 7 days, and the stated RPO is 24 hours. Note the gap: with weekly shipments, up to a week of tapes sits at the primary site, so a disaster that destroys the site can cost far more than 24 hours of data. Meeting the 24-hour RPO requires moving each day's tape off-site daily. In a disaster, the bank must order new servers, install them, and restore from tape. This process is slow and error-prone; tape media can degrade. The bank tests the plan annually by restoring a subset of data from tape to verify readability. A common issue is that the cold site lacks sufficient network capacity to download software updates, delaying recovery.
N10-009 Objective 3.3: Explain disaster recovery (DR) concepts. The exam tests your ability to apply BC/DR concepts to real-world scenarios. Expect questions that present a business requirement and ask you to select the appropriate RTO, RPO, site type, backup method, or testing procedure.
Common Wrong Answers and Why Candidates Choose Them:
Confusing RTO and RPO: A question might say 'The company can tolerate 2 hours of downtime and 15 minutes of data loss.' Candidates often swap the values—assigning 2 hours as RPO and 15 minutes as RTO. Remember: RTO is about downtime (how long to restore), RPO is about data loss (how far back in time you recover).
Choosing cold site when RTO is low: If a scenario requires recovery within 4 hours, a cold site (RTO days) is wrong. Candidates pick cold site because it's cheap, ignoring the RTO requirement.
Selecting incremental backups for fast restore: Incremental backups have slower restore because you need to apply all incrementals in order. The fastest restore is from a full backup. Candidates think incremental is faster because it's smaller, but restore time is longer.
Assuming cloud backup is automatically disaster recovery: Cloud backup is just data storage. DR requires compute resources to run applications. DRaaS provides both, but simple cloud backup does not.
Specific Numbers and Terms That Appear on the Exam:
RTO and RPO definitions and typical values (e.g., 0, 4 hours, 24 hours).
Hot, warm, cold site definitions.
Full, incremental, differential backup characteristics.
3-2-1 backup rule.
Tabletop, simulation, parallel, full interruption test.
Active-active vs active-passive.
Synchronous vs asynchronous replication.
Edge Cases and Exceptions:
RTO of zero: Only possible with active-active or synchronous replication with automatic failover.
RPO of zero: Only possible with synchronous replication; impossible with backup-only solutions.
Tape backups: Still used for long-term archival, not for rapid recovery.
Virtualization: Can reduce RTO because VMs can be restored on any compatible hypervisor.
How to Eliminate Wrong Answers:
If a question asks for the best site type given an RTO, eliminate any site with a longer RTO than required. Hot site for minutes, warm for hours, cold for days.
If a question asks about backup type for fastest backup, choose incremental. For fastest restore, choose full.
If a question asks about testing type that is least disruptive, choose tabletop.
Always match the metric (RTO vs RPO) to the business statement.
RTO = maximum acceptable downtime; RPO = maximum acceptable data loss (in time).
Hot site = immediate takeover; warm site = some equipment ready; cold site = nothing ready.
Full backup copies all data; incremental copies changes since last backup; differential copies changes since last full.
3-2-1 backup rule: 3 copies, 2 media types, 1 off-site.
Testing types: tabletop (walkthrough), simulation (partial), parallel (side-by-side), full interruption (production shutdown).
Active-active provides near-zero RTO; active-passive provides minutes to hours RTO.
Synchronous replication achieves RPO=0 but requires low latency; asynchronous replication has higher RPO but lower performance impact.
DR plan must be documented, stored off-site, and tested at least annually.
Cloud backup is not DR; DRaaS provides compute and storage for failover.
RAID protects against disk failure but is not a substitute for backups.
BIA identifies critical systems and determines RTO/RPO.
Failback is the process of returning to the primary site after a disaster.
These come up on the exam all the time. Here's how to tell them apart.
Hot Site
Fully equipped with hardware, software, and data
RTO: minutes to hours
Most expensive
Requires continuous replication or frequent backups
Best for mission-critical systems
Cold Site
Basic facility with no IT equipment
RTO: days to weeks
Least expensive
Equipment must be procured after disaster
Suitable for non-critical systems
Incremental Backup
Backs up only data changed since last backup (full or incremental)
Fastest backup time
Smallest storage footprint
Restore requires full + all incrementals in order
Higher risk of restore failure due to chain dependency
Differential Backup
Backs up all data changed since last full backup
Backup time increases over the week
Larger storage footprint than incremental
Restore requires full + latest differential only
Simpler and faster restore than incremental
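The storage difference between the two backup types can be shown with simple arithmetic. The sketch below assumes a full backup on day 0, six daily backups after it, a constant 10 GB of changed data per day, and no overlap between days (so each differential is at its largest possible size). All figures are illustrative.

```python
def weekly_backup_sizes(daily_change_gb: int, days: int = 6) -> tuple[list[int], list[int]]:
    """Return per-day sizes (GB) for incremental and differential backups."""
    # Incremental: only the changes since the previous backup.
    incremental = [daily_change_gb for _ in range(days)]
    # Differential: every change since the last full backup, so it grows daily.
    differential = [daily_change_gb * day for day in range(1, days + 1)]
    return incremental, differential


inc, diff = weekly_backup_sizes(10)
print(inc)        # [10, 10, 10, 10, 10, 10]
print(diff)       # [10, 20, 30, 40, 50, 60]
print(sum(inc))   # 60
print(sum(diff))  # 210
```

Over the week the differentials take more than three times the space in this example, which is the price paid for a two-set restore.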
Synchronous Replication
Data written to primary and secondary simultaneously
RPO = 0 (no data loss)
Requires low-latency, high-bandwidth link
Increases write latency due to acknowledgment from secondary
Used for critical databases
Asynchronous Replication
Data written to primary, then copied to secondary after a delay
RPO > 0 (some data loss possible)
Tolerates higher latency and lower bandwidth
Little to no impact on write performance
Used for less critical data
Mistake
Backups are the same as disaster recovery.
Correct
Backups are only one component of DR. DR also includes procedures, alternate sites, networking, and people to restore operations. Without a plan to restore from backups, they are useless.
Mistake
A hot site guarantees zero downtime.
Correct
Even a hot site needs some failover time (minutes to hours, depending on the technology). True zero downtime requires active-active with real-time synchronization and automatic failover.
Mistake
Incremental backups are always faster to restore than full backups.
Correct
Restoring from incremental backups requires applying the full backup plus all subsequent incrementals in order, which can be slower than restoring a single full backup. Incremental saves backup time, not restore time.
Mistake
Cloud storage is automatically disaster recovery.
Correct
Cloud storage provides off-site data protection, but DR requires compute and network resources to run applications. You must configure replication and failover to use cloud for DR.
Mistake
A cold site is sufficient for critical systems with low RTO.
Correct
Cold sites have RTOs measured in days or weeks because equipment must be procured and installed. Critical systems with low RTO (e.g., 1 hour) require hot or warm sites.
RTO (Recovery Time Objective) is the maximum time a system can be down after a disaster—how quickly you need to restore. RPO (Recovery Point Objective) is the maximum age of data you can afford to lose—how much data you can lose measured in time. For example, an RTO of 4 hours means the system must be back within 4 hours; an RPO of 1 hour means you can lose at most 1 hour of data. The exam often gives a business requirement and asks you to identify the correct RTO or RPO.
A full backup is fastest to restore because you only need to restore a single backup set. Incremental backups require restoring the full backup and then each incremental in order, which takes longer. Differential backups require restoring the full backup and the latest differential. So for restore speed: full > differential > incremental. However, for backup speed, incremental is fastest.
The 3-2-1 rule is a best practice for data protection: maintain at least three copies of your data (one primary and two backups), store them on two different types of media (e.g., disk and tape), and keep one copy off-site (e.g., cloud or remote location). This ensures that if one copy fails or is destroyed, you have other copies available. The exam may test this rule as a multiple-choice option.
A hot site is fully equipped with all hardware, software, and data, ready to take over immediately (RTO minutes to hours). A warm site has some equipment but not all, and data may be outdated; it requires some setup and data restoration (RTO hours to days). The key difference is readiness: hot sites are ready to go, warm sites need activation. Cold sites have no equipment at all.
A tabletop exercise is a discussion-based test of the BC/DR plan where key stakeholders walk through the steps without actually performing failover or restoration. It is low-cost and low-risk, used to identify gaps in the plan and ensure everyone knows their roles. It is the least disruptive testing method. The exam contrasts it with simulation, parallel, and full interruption tests.
No, cloud backups alone do not constitute a DR site. Backups store data, but DR requires compute and network resources to run applications. For full DR, you need a DRaaS (Disaster Recovery as a Service) solution that replicates both data and compute, or you must provision cloud resources manually during a disaster. Cloud backups are only one component of a DR strategy.
A BIA identifies critical business functions and the impact of their disruption. It determines the maximum tolerable downtime (RTO) and data loss (RPO) for each system. The BIA also helps prioritize recovery efforts and justifies the cost of DR solutions. It is the first step in developing a BC/DR plan.