This chapter covers designing business continuity solutions in Azure, a critical domain for the AZ-305 exam. Business continuity questions account for approximately 15-20% of the exam, focusing on your ability to architect solutions that meet Recovery Time Objective (RTO) and Recovery Point Objective (RPO) requirements. You will learn to choose between Azure Backup and Azure Site Recovery, design for high availability across regions, and implement disaster recovery strategies. Mastering this chapter ensures you can design resilient solutions that minimize downtime and data loss.
Think of an office building's primary power supply as a standard Azure service running in a single region. The building has its own generator (Azure Backup) for short outages, but if the whole grid fails (regional disaster), the generator alone won't help. The building manager also has a contract with a mobile generator company that can bring a large generator on a truck (Azure Site Recovery) to provide full power for days. The manager regularly tests the transfer switch (disaster recovery drill) to ensure it works. The mobile generator is stored at a different depot (paired region) to avoid the same disaster. If the building loses power and the generator fails, the manager calls the mobile generator company, which dispatches the truck. The time from call to full power is the Recovery Time Objective (RTO), and the amount of data lost during the switch is the Recovery Point Objective (RPO). The manager must decide whether to pay for a dedicated mobile generator always on standby (active-active) or a shared one that takes longer to arrive (active-passive). This mirrors Azure's business continuity options: Azure Backup (like the building's generator) protects against accidental deletion or corruption, while Azure Site Recovery (the mobile generator) protects against full region failure. The building's disaster recovery plan specifies RTO and RPO, just as Azure does.
What is Business Continuity and Why It Exists
Business continuity (BC) refers to the capability of an organization to continue delivering services during and after a disruptive event. In Azure, BC encompasses two main pillars: high availability (HA) and disaster recovery (DR). HA ensures that applications remain operational despite failures within a region (e.g., VM failures, datacenter outages), while DR protects against region-wide failures (e.g., natural disasters, large-scale outages). The AZ-305 exam tests your ability to select and configure Azure services to meet specific RTO and RPO targets. RTO is the maximum acceptable time to restore service after a failure; RPO is the maximum acceptable amount of data loss measured in time.
How Azure Backup Works Internally
Azure Backup is a PaaS service that provides backup for Azure VMs, SQL Server, SAP HANA, Azure Files, and on-premises workloads via the Microsoft Azure Recovery Services (MARS) agent. The core mechanism involves creating recovery points (snapshots) stored in a Recovery Services vault. For Azure VMs running Windows, Azure Backup uses VSS (Volume Shadow Copy Service) to take application-consistent snapshots; Linux VMs are file-system consistent by default and need pre- and post-snapshot scripts for application consistency. The backup process:
1. The backup extension (installed on the VM) coordinates with VSS to quiesce I/O and create a snapshot.
2. The snapshot data is transferred to the vault in the same region (and copied to the paired region if the vault uses geo-redundant storage).
3. Change tracking identifies modified blocks, so every backup after the initial full backup is incremental.
Key defaults and timers:
Default backup frequency: daily (once per day).
Retention ranges: daily (up to 120 days), weekly (up to 260 weeks), monthly (up to 120 months), yearly (up to 99 years).
Instant Restore: allows restoring from snapshots stored locally (up to 7 days) before transfer to vault.
Backup policy: defines schedule and retention. For Azure VMs, the default policy backs up at 12:00 AM UTC and retains daily backups for 30 days.
How Azure Site Recovery Works Internally
Azure Site Recovery (ASR) orchestrates replication of Azure VMs, on-premises Hyper-V/VMware VMs, and physical servers to a secondary Azure region (or on-premises). For Azure-to-Azure DR, ASR uses continuous replication at the block level. The mechanism:
1. A replication component captures disk writes on the source: the Site Recovery Mobility service (installed as a VM extension on Azure VMs and as an agent on VMware VMs and physical servers) or the Azure Site Recovery Provider (on Hyper-V hosts).
2. Writes are replicated asynchronously to a cache storage account in the source region.
3. From the cache, data is sent to the target region's managed disks (replica disks).
4. Recovery points are created every few minutes (default: 5 minutes for Azure VMs).
5. During failover, the replica disks are attached to a new VM in the target region, and the VM starts.
Key timers and thresholds:
RPO: typically 5-15 minutes for Azure VMs (asynchronous replication).
RTO: depends on VM size and network; Microsoft claims 30 minutes for most scenarios.
Failover types: Test failover (isolated network), Planned failover (zero data loss), Unplanned failover (crash-consistent).
Recovery plan: groups VMs and scripts into an orchestrated sequence.
Key Components and Their Defaults
Recovery Services vault: Central storage for backups and replication data. Supports LRS, ZRS, and GRS storage redundancy (enabling Cross Region Restore on a GRS vault adds read access to the secondary region). Default is GRS.
Backup policy: Defines schedule (daily/weekly) and retention. Default daily backup at midnight with 30-day retention.
Replication policy: For ASR, defines RPO threshold (default 15 minutes), recovery point retention (default 24 hours), and app-consistent snapshot frequency (default 1 hour).
Cache storage account: Standard LRS, used to stage replicated data before sending to target.
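Target region: Any supported Azure region in the same geographic cluster as the source. The paired region (e.g., East US -> West US) is a common choice but not a requirement.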
Configuration and Verification Commands
Azure Backup:
Enable backup: az backup protection enable-for-vm --resource-group myRG --vault-name myVault --vm myVM --policy-name DefaultPolicy
List recovery points: az backup recoverypoint list --resource-group myRG --vault-name myVault --container-name myVM --item-name myVM
Restore VM: az backup restore restore-disks --resource-group myRG --vault-name myVault --container-name myVM --item-name myVM --rp-name <rpName> --storage-account myStorage
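The three Azure Backup commands above can be chained into one workflow. The following is a minimal sketch, not a definitive script: the `run` helper, the `backup_and_restore` function, and the `DRY_RUN` switch are hypothetical names introduced for illustration, and the resource names are the placeholders used in the text. By default it only prints the `az` commands, so it can be inspected without an Azure subscription. A real restore may need additional parameters depending on the disk type.

```shell
#!/usr/bin/env bash
# Sketch: chain the Azure Backup commands from the text.
# DRY_RUN=1 (the default here) prints each az command instead of running it.
# run, backup_and_restore and DRY_RUN are hypothetical helper names.

DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

backup_and_restore() {
  local rg="$1" vault="$2" vm="$3" storage="$4" rp="$5"

  # 1. Protect the VM with the vault's default policy.
  run az backup protection enable-for-vm \
    --resource-group "$rg" --vault-name "$vault" --vm "$vm" --policy-name DefaultPolicy

  # 2. List the recovery points available for the VM.
  run az backup recoverypoint list \
    --resource-group "$rg" --vault-name "$vault" --container-name "$vm" --item-name "$vm"

  # 3. Restore the disks of the chosen recovery point into a storage account.
  run az backup restore restore-disks \
    --resource-group "$rg" --vault-name "$vault" --container-name "$vm" --item-name "$vm" \
    --rp-name "$rp" --storage-account "$storage"
}
```

Calling `backup_and_restore myRG myVault myVM myStorage "<rpName>"` prints the three commands in order; setting `DRY_RUN=0` would execute them against a logged-in subscription.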
Azure Site Recovery:
Enable replication: az site-recovery replication-protected-item create --fabric-name myFabric --protection-container myContainer --name myVM --policy-name myPolicy --source-site-id /subscriptions/...
Start failover: az site-recovery replication-protected-item planned-failover --fabric-name myFabric --protection-container myContainer --name myVM --failover-direction PrimaryToRecovery
Verify replication health: az site-recovery replication-protected-item show --name myVM --fabric-name myFabric --protection-container myContainer
Interaction with Related Technologies
Azure Traffic Manager: Used for DNS-based traffic routing across regions. Can be integrated with ASR to automatically redirect traffic after failover.
Azure Front Door: Global load balancer with HTTP/HTTPS routing. Supports health probes and automatic failover.
Azure Load Balancer: For regional HA within a single region (e.g., availability zones).
Azure Storage: Backups are stored in Recovery Services vaults, which use Azure Storage blobs. For geo-redundancy, use GRS, and enable Cross Region Restore if you need to restore in the secondary region.
Microsoft Entra ID (formerly Azure Active Directory): Required for authentication and authorization of backup and replication operations.
Designing for RTO and RPO
To meet strict RTO (e.g., <1 hour) and RPO (e.g., <15 minutes), you must choose the right combination:
For low RPO (minutes), use ASR with continuous replication.
For low RTO (minutes), use ASR with pre-created target resources (e.g., pre-provisioned network, disks) and recovery plans.
For cost-sensitive scenarios, Azure Backup offers longer RTO (hours) but lower cost.
Exam tip: If the requirement is RPO of 15 minutes and RTO of 1 hour, ASR is the correct choice. If RPO is 24 hours and RTO is 24 hours, Azure Backup suffices.
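The rule of thumb in this section can be written down as a small decision helper. This is a sketch only: the function name `choose_bc_service` is hypothetical, and the thresholds (RPO of 15 minutes, RTO of 60 minutes) are the ones quoted in the exam tip, not official service limits.

```shell
#!/usr/bin/env bash
# Sketch of the exam rule of thumb: strict targets point to Azure Site Recovery,
# relaxed targets can be met with Azure Backup.
# choose_bc_service is a hypothetical name; thresholds come from the exam tip.

choose_bc_service() {
  local rpo_min="$1" rto_min="$2"
  if [ "$rpo_min" -le 15 ] || [ "$rto_min" -le 60 ]; then
    echo "Azure Site Recovery"
  else
    echo "Azure Backup"
  fi
}

choose_bc_service 15 60      # RPO 15 min, RTO 1 h
choose_bc_service 1440 1440  # RPO 24 h, RTO 24 h
```

The first call prints `Azure Site Recovery` and the second prints `Azure Backup`. Either a strict RPO or a strict RTO is enough to rule out Azure Backup, which is why the condition uses "or".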
Region Pairs and Availability Zones
Azure regions are paired for DR (e.g., East US with West US). The paired region is a common ASR target, although ASR can also replicate to other regions in the same geographic cluster. Availability zones provide HA within a region (99.99% SLA for VMs deployed across two or more zones). For critical workloads, deploy across zones and use ASR for cross-region DR.
Define RTO and RPO Requirements
Start by determining the maximum acceptable downtime (RTO) and data loss (RPO) for each workload. For critical workloads, RTO might be 1 hour and RPO 15 minutes. For non-critical, RTO could be 24 hours with RPO of 24 hours. This drives the choice between Azure Backup (higher RTO/RPO) and Azure Site Recovery (lower RTO/RPO). Document these requirements in a business continuity plan.
Assess Workload Dependencies
Map out dependencies such as databases, networking, and authentication. For example, a web app might depend on Azure SQL Database and Azure AD. Ensure that the DR solution includes all dependencies. For multi-tier applications, use recovery plans in ASR to orchestrate the failover order (e.g., database first, then app servers).
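A dependency map can be turned into a failover order with a topological sort. The sketch below uses `tsort` from coreutils. The tier names and the `failover_order` wrapper are hypothetical, and a real ASR recovery plan expresses this ordering as groups rather than as a flat list.

```shell
#!/usr/bin/env bash
# Sketch: derive a failover order from "A must start before B" pairs.
# tsort is a standard coreutils tool; tier names are hypothetical.

failover_order() {
  # Each input line is "<dependency> <dependent>".
  tsort
}

printf '%s\n' \
  'sql-db app-server' \
  'app-server web-frontend' | failover_order
```

For this simple chain the output lists `sql-db`, then `app-server`, then `web-frontend`, matching the "database first, then app servers" order described above.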
Choose Azure Backup or Azure Site Recovery
For workloads that can tolerate hours of downtime (RTO > 4 hours) and data loss up to 24 hours, use Azure Backup. For workloads requiring minutes of RTO and RPO (e.g., < 1 hour RTO, < 15 min RPO), use ASR. On-premises workloads require Azure Backup for file/folder backup and ASR for full VM replication. The exam often tests this decision point.
Configure Backup or Replication
Create a Recovery Services vault. For Azure Backup, enable backup on VMs using a policy that meets retention requirements. For ASR, enable replication by specifying source and target regions, replication policy, and target resources (network, storage). Ensure that the target region has sufficient capacity (vCPU quotas).
Test Failover and Validate
Perform test failovers regularly (e.g., quarterly) to validate that RTO and RPO are met. For ASR, use test failover with an isolated network to avoid impacting production. Verify that applications start correctly and that data is consistent. Document the results and adjust configurations if needed.
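Recording drill results against the targets makes it easy to spot regressions between drills. The following is a minimal sketch, assuming all values are measured in whole minutes. The `check_drill` function and the sample numbers are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare measured drill results with the RTO/RPO targets (in minutes).
# check_drill and the sample values are hypothetical.

check_drill() {
  local target_rto="$1" measured_rto="$2" target_rpo="$3" measured_rpo="$4"
  local status=0

  if [ "$measured_rto" -gt "$target_rto" ]; then
    echo "RTO missed: ${measured_rto} min > ${target_rto} min"
    status=1
  fi
  if [ "$measured_rpo" -gt "$target_rpo" ]; then
    echo "RPO missed: ${measured_rpo} min > ${target_rpo} min"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "Drill met RTO and RPO targets"
  fi
  return "$status"
}

# Target RTO 30 min, measured 20 min; target RPO 5 min, measured 4 min.
check_drill 30 20 5 4
```

The function returns a non-zero exit status when either target is missed, so it can gate a scheduled drill pipeline.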
Enterprise Scenario 1: Global E-Commerce Platform
A major e-commerce company runs its online store on Azure VMs with Azure SQL Database. They require RTO of 30 minutes and RPO of 5 minutes to avoid significant revenue loss during peak shopping seasons. They deploy ASR for Azure-to-Azure replication between East US and West US (paired regions). They configure a replication policy with 5-minute RPO threshold and 24-hour recovery point retention. They use Azure Traffic Manager to route user traffic to the primary region. During a regional outage, they initiate an unplanned failover, and Traffic Manager automatically redirects traffic to the West US region. The failover completes in 20 minutes, meeting RTO. A common misconfiguration is not pre-creating the target network and subnet, which delays failover. They also perform quarterly test failovers to ensure the process works.
Enterprise Scenario 2: Healthcare Provider with On-Premises Systems
A hospital uses on-premises Hyper-V VMs for patient records. They need to meet compliance requirements for data retention (7 years) and DR with RTO of 4 hours and RPO of 1 hour. They use Azure Backup to back up VMs to a Recovery Services vault with GRS for geo-redundancy. They also use Azure Site Recovery to replicate critical VMs to Azure for DR. They set the backup policy to daily backups with 7-year retention. For ASR, they use a replication policy with 1-hour RPO. During a disaster, they fail over to Azure and run the VMs temporarily. A challenge is managing network connectivity between on-premises and Azure via VPN or ExpressRoute. They ensure that the DR site has sufficient IP address space to avoid conflicts.
Enterprise Scenario 3: Financial Services with Multi-Region Active-Active
A bank runs a trading application that requires zero data loss (RPO=0) and automatic failover. They deploy the application across two Azure regions in an active-active configuration using Azure Front Door for global load balancing. They use Azure SQL Database with active geo-replication to replicate data between regions. Note that active geo-replication is asynchronous, so the cross-region database RPO is typically a few seconds rather than strictly zero; a true RPO of zero requires synchronous replication at the application or database level (for example, SQL Server Always On availability groups in synchronous-commit mode). For the application VMs, they use availability zones within each region for HA and ASR for cross-region DR; a planned failover provides zero data loss, but an unplanned failover does not. They set the ASR replication policy to a 5-minute RPO, so the database RPO is determined by the database replication rather than by ASR. The key lesson is that VM-level DR cannot achieve RPO=0 for the VM itself; only application-level replication (e.g., database) can. They conduct monthly failover drills to ensure consistency.
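One way to reason about this lesson is that a workload's data-loss exposure is set by its weakest stateful tier. The sketch below takes the worst (largest) RPO across tiers, assuming every tier listed holds state that matters. The `effective_rpo` function and the sample values are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the effective RPO of a workload is the largest RPO of its stateful tiers.
# effective_rpo is a hypothetical name; values are in seconds.

effective_rpo() {
  local worst=0 rpo
  for rpo in "$@"; do
    if [ "$rpo" -gt "$worst" ]; then
      worst="$rpo"
    fi
  done
  echo "$worst"
}

# A synchronously replicated tier (0 s) combined with an ASR-protected VM tier (300 s).
effective_rpo 0 300
```

If the VM tier is stateless, it can be left out of the calculation, which is why moving all state into the replicated database is a common design choice.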
What AZ-305 Tests on This Topic
The AZ-305 exam objective 3.2 focuses on designing business continuity solutions, specifically: recommend a solution for backup and recovery (Azure Backup vs. ASR), design for high availability (availability zones, load balancers), and design for disaster recovery (region pairs, failover). Questions often present a scenario with RTO and RPO requirements, and you must choose the correct service and configuration. Common objective codes: 3.2.1 (recommend a recovery solution), 3.2.2 (design for high availability), 3.2.3 (design for disaster recovery).
Most Common Wrong Answers and Why Candidates Choose Them
Choosing Azure Backup when ASR is needed: Candidates see 'backup' and assume it covers DR. But Azure Backup has RPO of 24 hours (daily backup) and RTO of hours, which fails for low RTO/RPO requirements. The exam presents scenarios with RPO of 15 minutes, making ASR the only correct choice.
Selecting Azure Site Recovery for on-premises VMs without the Mobility service: Candidates forget that for VMware/physical servers, the Mobility service must be installed. The exam might ask: 'You need to replicate an on-premises Linux VM to Azure. What component is required?' The answer is the Mobility service.
Assuming Geo-Redundant Storage (GRS) alone provides DR: Some candidates think storing backups in GRS is sufficient for DR. GRS keeps a copy of the backup data in the paired region, but it does not automate failover of compute or applications. You still need ASR, or Azure Backup with GRS and Cross Region Restore, to bring the workload up in another region.
Specific Numbers and Terms That Appear Verbatim
Default RPO for Azure VM replication in ASR: 5 minutes (but the exam might say 'up to 15 minutes' as the threshold).
Default retention for recovery points: 24 hours.
Recovery point types: crash-consistent, app-consistent (VSS).
Backup frequency: daily (default) or weekly.
Retention: up to 99 years.
Region pairs: e.g., East US <-> West US, UK South <-> UK West.
Availability zones: 99.99% SLA for VMs.
Edge Cases and Exceptions
Azure VM with premium SSD: Backup and replication work, but you must consider IOPS limits.
Encrypted VMs: Azure Backup supports Azure Disk Encryption (ADE) and encryption at host. ASR supports replication of encrypted disks.
Large VMs (e.g., 32 vCPUs): ASR replication may take longer; RTO may exceed 30 minutes.
SQL Server Always On: Use ASR with app-consistent snapshots for database consistency.
How to Eliminate Wrong Answers
If the scenario mentions RPO of 15 minutes or less, eliminate Azure Backup. If it mentions RTO of 1 hour or less, eliminate Azure Backup. If it mentions on-premises, look for the Mobility service or MARS agent. If it mentions cost optimization, consider Azure Backup (cheaper) vs. ASR (more expensive). Always check if the solution must be within a single region (use availability zones) or across regions (use ASR).
Azure Backup provides daily backups with an RPO of 24 hours; use it for less critical workloads.
Azure Site Recovery provides continuous replication with RPO of 5 minutes; use for critical workloads.
Recovery Services vault stores backups and replication data; choose GRS for cross-region DR.
Availability zones protect against datacenter failures within a region, not region-wide disasters.
Region pairs (e.g., East US/West US) are commonly used as Azure Site Recovery target regions, but they are not the only option.
Test failovers should be performed regularly to validate RTO/RPO.
For on-premises VMs, install the Mobility service on VMware VMs and physical servers, or the Azure Site Recovery Provider on the Hyper-V host for Hyper-V VMs.
These come up on the exam all the time. Here's how to tell them apart.
Azure Backup
RPO typically 24 hours (daily backup).
RTO hours to days.
Lower cost (pay per GB stored).
Supports on-premises via MARS agent.
Retention up to 99 years.
Azure Site Recovery
RPO 5-15 minutes (asynchronous replication).
RTO 30 minutes to 2 hours.
Higher cost (pay per replicated VM and storage).
Supports on-premises via Mobility service.
Retention limited to 24 hours (configurable up to 72 hours).
Mistake
Azure Backup provides disaster recovery with RPO of minutes.
Correct
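Azure Backup typically has an RPO of 24 hours with the default daily schedule, and up to 7 days with a weekly schedule. Even an Enhanced policy that takes multiple backups per day only reduces the RPO to a few hours. It cannot achieve minute-level RPO. For that, you need Azure Site Recovery.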
Mistake
Azure Site Recovery replicates data synchronously, so RPO is zero.
Correct
ASR uses asynchronous replication for Azure VMs, with default RPO of 5 minutes. It does not guarantee zero data loss. For zero RPO, you need application-level replication (e.g., SQL Always On).
Mistake
You can use Azure Backup to restore a VM in a different region by default.
Correct
By default, Azure Backup stores data in the same region as the vault. To restore in a different region, you must use Geo-Redundant Storage (GRS) and cross-region restore (feature for Azure VMs).
Mistake
Availability zones provide disaster recovery across regions.
Correct
Availability zones are within a single region (e.g., three datacenters in East US). They protect against datacenter failures, not region-wide disasters. For cross-region DR, use Azure Site Recovery.
Mistake
A single Recovery Services vault can back up VMs from any region.
Correct
A Recovery Services vault is a regional resource, and for Azure Backup it can only protect resources in its own region: you cannot back up a VM in West US to a vault in East US. To protect resources in several regions, create a vault in each region.
Azure Backup is a backup service that creates periodic snapshots (daily) with RPO of 24 hours, ideal for data protection and long-term retention. Azure Site Recovery replicates VMs continuously with RPO of minutes, designed for disaster recovery with fast failover. Use Backup for backup, ASR for DR.
Yes, Azure Site Recovery supports replication between availability zones within the same region. This is useful for zonal DR scenarios, though typically you use availability sets or zones for HA and ASR for cross-region.
The default RPO threshold is 15 minutes, but actual RPO is typically around 5 minutes. You can configure the threshold in the replication policy.
You must enable cross-region restore on the Recovery Services vault (requires GRS storage). Then, when restoring, you can select the target region. This feature is in preview for some regions.
Crash-consistent points capture data as if the VM crashed (no application state). App-consistent points use VSS to flush memory and complete pending I/O, ensuring application data integrity (e.g., database transactions). ASR creates app-consistent points every hour by default.
Yes, ASR can replicate on-premises Hyper-V VMs, VMware VMs, and physical servers to Azure. For VMware/physical, you need the Mobility service installed on each VM. For Hyper-V, you use the Azure Site Recovery Provider.
You can retain daily backups for up to 120 days, weekly for 260 weeks, monthly for 120 months, and yearly for 99 years. The total number of recovery points per protected instance is 9999.