This chapter covers Azure Site Recovery (ASR) design for the AZ-305 exam, focusing on how to architect disaster recovery solutions that meet business continuity requirements. ASR is a critical component of the Business Continuity domain (Objective 3.2), which is weighted at approximately 15-20% of the exam. ASR often appears in scenario-based questions that ask you to select the correct replication method, RPO/RTO settings, or failover configuration. You will learn the internal mechanisms, key components, configuration steps, and common pitfalls to avoid on the exam.
Imagine a large office building with 200 employees working on their computers. The building has a backup site across town: a mirror office with identical desks, computers, and network connections, but it's empty most of the time. The IT team runs a drill every month: they trigger a fire alarm, and everyone must evacuate to the backup site. At the backup site, each employee logs into their pre-configured computer, which has all their files and applications already synced from the primary site. The network is pre-configured so that customers can still reach the company's services via a redirected IP address. The drill takes 15 minutes to complete, and the team measures the time from alarm to full productivity. They also test restoring from older backups to ensure data integrity. In a real disaster, the same rehearsed process runs: the alarm triggers, employees go to the backup site, and business continues with minimal interruption. Azure Site Recovery (ASR) works the same way. It replicates VMs to a secondary region, orchestrates failover with a recovery plan when you trigger it, and allows testing without impacting production. The recovery time objective (RTO) and recovery point objective (RPO) are like the drill's target time and the maximum data loss allowed (e.g., 15 minutes and 1 hour).
What is Azure Site Recovery and Why It Exists
Azure Site Recovery (ASR) is a disaster recovery (DR) service that orchestrates replication, failover, and failback of Azure VMs and on-premises machines. Its primary purpose is to enable business continuity by maintaining replicas of workloads in a secondary Azure region or on-premises site. ASR ensures that if a primary site goes down due to disaster, you can quickly bring up workloads in the secondary site with minimal data loss and downtime.
How ASR Works Internally
ASR replication is based on a combination of the following components:
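Replication Appliance: For on-premises VMware VMs and physical servers, ASR relies on an on-premises appliance. This is the configuration server in the classic architecture, or the Azure Site Recovery replication appliance in the modernized architecture. For Hyper-V, the Azure Site Recovery Provider and the Microsoft Azure Recovery Services agent run on the Hyper-V hosts, and replication uses Hyper-V Replica technology. When the hosts are managed by System Center VMM, the Provider runs on the VMM server instead. For Azure VMs, no separate appliance is needed; ASR installs the Mobility service as a VM extension.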
Mobility Service: For VMware VMs and physical servers, the Mobility Service must be installed on each VM. It intercepts disk writes and sends them to the process server.
Process Server: Receives replication data from the Mobility Service, caches it, compresses it, and sends it to the target storage (Azure or on-premises).
Target Storage: Replication data is stored in Azure Storage (for Azure-to-Azure DR) or in the secondary on-premises site. For Azure VMs, ASR uses managed disks (premium or standard) as the target.
Replication Policy: Defines recovery point retention (default 24 hours) and app-consistent snapshot frequency (default 60 minutes). Policies for VMware VMs and physical servers also include an RPO threshold, which raises an alert when replication lag exceeds it.
Recovery Plan: A collection of VMs grouped together that can fail over as a unit. You can add scripts, Azure Automation runbooks, and manual actions to control the order of VM startup.
Replication Mechanism:
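For Azure VMs, ASR replicates continuously at the disk level. The Mobility service extension, which ASR installs on each VM, captures disk writes and sends them to a cache storage account in the source region. ASR then processes the changes into recovery points on replica managed disks in the target region. After the initial copy, only changed data is transferred.
For on-premises Hyper-V VMs, ASR uses Hyper-V Replica technology. After an initial full copy, it sends incremental changes at a configurable copy frequency of 30 seconds, 5 minutes, or 15 minutes.
For VMware VMs, the Mobility Service sends data to the process server, which then forwards it to Azure Storage in the target region.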
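The principle of sending only changed data after an initial full copy can be sketched in a few lines. This is a toy model, not ASR's implementation. `ReplicatedDisk`, its methods, and the 4-byte block size are hypothetical, chosen so the arithmetic is easy to follow. Each write marks the fixed-size blocks it touches as dirty, and a delta sync copies only those blocks.

```python
BLOCK_SIZE = 4  # bytes per block in this toy model; real systems track much larger blocks


class ReplicatedDisk:
    """Hypothetical source/replica pair that tracks which blocks changed since the last sync."""

    def __init__(self, size_blocks: int) -> None:
        self.source = bytearray(size_blocks * BLOCK_SIZE)
        self.replica = bytearray(size_blocks * BLOCK_SIZE)
        self.dirty: set[int] = set()

    def write(self, offset: int, data: bytes) -> None:
        """Write to the source disk and mark every block the write touches as dirty."""
        if not data:
            return
        if offset < 0 or offset + len(data) > len(self.source):
            raise ValueError("write outside disk bounds")
        self.source[offset:offset + len(data)] = data
        first_block = offset // BLOCK_SIZE
        last_block = (offset + len(data) - 1) // BLOCK_SIZE
        self.dirty.update(range(first_block, last_block + 1))

    def initial_sync(self) -> int:
        """Copy the whole disk once; returns the number of bytes sent."""
        self.replica[:] = self.source
        self.dirty.clear()
        return len(self.source)

    def delta_sync(self) -> int:
        """Copy only dirty blocks; returns the number of bytes sent."""
        sent = 0
        for block in sorted(self.dirty):
            start = block * BLOCK_SIZE
            self.replica[start:start + BLOCK_SIZE] = self.source[start:start + BLOCK_SIZE]
            sent += BLOCK_SIZE
        self.dirty.clear()
        return sent


disk = ReplicatedDisk(size_blocks=1000)
print(disk.initial_sync())          # 4000 bytes: full copy
disk.write(10, b"abc")              # bytes 10-12 span blocks 2 and 3
print(disk.delta_sync())            # 8 bytes: only the two dirty blocks
print(disk.replica == disk.source)  # True
```

The point of the model is the ratio. After the initial copy, replication cost scales with the write rate (churn), not with disk size. This is also why high churn, rather than disk size, is what pushes replication lag up.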
Key Components, Values, Defaults, and Timers
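RPO (Recovery Point Objective): Maximum acceptable data loss. For Azure VMs, RPO is not a setting you configure. Replication is continuous, ASR creates crash-consistent recovery points every 5 minutes, and the achieved RPO depends mainly on the data change rate. For Hyper-V to Azure, the copy frequency (30 seconds, 5 minutes, or 15 minutes) bounds the RPO.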
RTO (Recovery Time Objective): Time to recover after failover. ASR provides an estimated RTO based on VM size and disk activity, but actual RTO depends on network speed and VM startup time.
Recovery Point Retention: How long recovery points are kept. Default: 24 hours. Maximum: 72 hours.
App-Consistent Snapshot Frequency: How often to take app-consistent snapshots using VSS. Default: 60 minutes. You can set it to 0 to disable.
Replication Policy: Can be customized per VM or group of VMs. Key settings: recovery point retention, app-consistent snapshot frequency, and (for VMware VMs and physical servers) an RPO threshold.
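The settings above interact through simple arithmetic. The number of recovery points of each type created within the retention window is the window divided by that type's interval. The following is a minimal sketch. It assumes crash-consistent points every 5 minutes (the Azure-to-Azure behavior) plus this chapter's 24-hour retention and 60-minute app-consistent defaults. The function name is hypothetical. Treat the result as an upper bound, since ASR may thin out older crash-consistent points.

```python
from datetime import timedelta


def recovery_points_in_window(retention: timedelta, interval: timedelta) -> int:
    """Count points created at a fixed interval that fit inside the retention window.

    An interval of zero models a disabled snapshot type (app-consistent frequency set to 0).
    """
    if interval <= timedelta(0):
        return 0
    return retention // interval


retention = timedelta(hours=24)
print(recovery_points_in_window(retention, timedelta(minutes=5)))   # 288 crash-consistent points
print(recovery_points_in_window(retention, timedelta(minutes=60)))  # 24 app-consistent points
print(recovery_points_in_window(retention, timedelta(0)))           # 0: app-consistent disabled
```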
Failover Types:
Test Failover: Isolated failover for DR drills. Does not affect production.
Planned Failover: Zero data loss failover for planned maintenance.
Unplanned Failover: For actual disasters. May have data loss.
Failback: After failover, you can failback to the primary site once it is restored. Requires re-protection and reverse replication.
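The failover types and failback steps above form a small lifecycle. The sketch below models it as a state machine, so out-of-order operations (for example, re-protecting before committing) are rejected. The state and operation names are hypothetical simplifications of what the portal shows, and the model follows this chapter's three failover types.

```python
# (current state, operation) -> next state; all names are hypothetical simplifications
TRANSITIONS = {
    ("protected", "test_failover"): "protected",          # isolated network; replication continues
    ("protected", "planned_failover"): "failed_over",
    ("protected", "unplanned_failover"): "failed_over",
    ("failed_over", "commit"): "committed",
    ("committed", "reprotect"): "reverse_protected",      # reverse replication: secondary -> primary
    ("reverse_protected", "test_failover"): "reverse_protected",
    ("reverse_protected", "planned_failover"): "failed_back",
    ("failed_back", "commit"): "failed_back_committed",
    ("failed_back_committed", "reprotect"): "protected",  # original direction restored
}


def next_state(state: str, operation: str, active_site_healthy: bool = True) -> str:
    """Return the next state, or raise ValueError if the operation is not allowed."""
    if operation == "planned_failover" and not active_site_healthy:
        raise ValueError("planned failover needs a healthy active site; use unplanned failover")
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"{operation!r} is not allowed in state {state!r}") from None


state = "protected"
for op in ["test_failover", "unplanned_failover", "commit", "reprotect", "planned_failover"]:
    state = next_state(state, op)
print(state)  # failed_back
```

Encoding the lifecycle as a lookup table keeps the rules in one place. Invalid sequences fail loudly instead of silently leaving a VM unprotected.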
Configuration and Verification Commands
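For Azure VMs, you can enable replication via the Azure Portal, PowerShell, or the Azure CLI. In PowerShell (Az.RecoveryServices module), Azure-to-Azure replication takes several steps rather than one cmdlet. After setting the vault context, you create or reuse a fabric, protection container, container mapping, and disk replication configuration. You then call New-AzRecoveryServicesAsrReplicationProtectedItem with the -AzureToAzure parameter. To set the vault context and verify replication health:
$vault = Get-AzRecoveryServicesVault -ResourceGroupName "RecoveryRG" -Name "MyVault"
Set-AzRecoveryServicesAsrVaultContext -Vault $vault
# Assumed name: portal-created Azure-to-Azure fabrics are typically named asr-a2a-default-<region>
$fabric = Get-AzRecoveryServicesAsrFabric -Name "asr-a2a-default-eastus"
$container = Get-AzRecoveryServicesAsrProtectionContainer -Fabric $fabric | Select-Object -First 1
# List protected VMs with their protection state and replication health
Get-AzRecoveryServicesAsrReplicationProtectedItem -ProtectionContainer $container | Select-Object FriendlyName, ProtectionState, ReplicationHealth
For on-premises VMware VMs and physical servers, the classic architecture uses Azure Site Recovery Unified Setup to deploy the configuration server and process server. The modernized architecture uses the Azure Site Recovery replication appliance instead.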
How ASR Interacts with Related Technologies
ASR integrates with:
Azure Backup: For long-term retention of backups. ASR is for DR (short RPO/RTO), while Azure Backup is for archival.
Azure Traffic Manager: To route traffic to the secondary region after failover.
Azure DNS: To update DNS records to point to the secondary region.
Azure Automation: Runbooks to automate post-failover tasks like updating connection strings.
Azure Monitor: Alerts for replication health and failover status.
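After failover, Traffic Manager's role comes down to its priority routing method. It answers DNS queries with the highest-priority endpoint that passes health probes. The sketch below shows that selection logic, assuming two hypothetical endpoints (`eastus-app`, `westus-app`). It is not the Traffic Manager API.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    priority: int   # lower number = preferred, as in Traffic Manager priority routing
    healthy: bool   # result of the endpoint health probe


def resolve(endpoints: list[Endpoint]) -> str:
    """Pick the endpoint a priority-routing profile would return."""
    candidates = [e for e in endpoints if e.healthy]
    if not candidates:
        # Traffic Manager treats all endpoints as online when every endpoint is degraded
        candidates = list(endpoints)
    return min(candidates, key=lambda e: e.priority).name


primary = Endpoint("eastus-app", priority=1, healthy=True)
secondary = Endpoint("westus-app", priority=2, healthy=True)
print(resolve([primary, secondary]))  # eastus-app
primary.healthy = False               # primary region down; workloads failed over to West US
print(resolve([primary, secondary]))  # westus-app
```

In practice, clients switch only after their cached DNS records expire, so the profile's TTL adds to the effective RTO.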
Replication Scenarios
Azure to Azure: Replicate VMs from one Azure region to another. Recommended for most cloud-native workloads. Replication is continuous and needs no appliance; ASR installs the Mobility service extension on each VM.
On-premises to Azure: Replicate Hyper-V, VMware, or physical servers to Azure. Requires on-premises infrastructure (configuration server, process server).
On-premises to on-premises: Replicate Hyper-V VMs in System Center VMM clouds between two on-premises sites. ASR orchestrates replication and failover, while Hyper-V Replica transfers the data.
Network Considerations
IP Address: During failover, you can retain the same IP address using Azure Site Recovery's network mapping and subnet settings. Alternatively, you can use a different IP range and update DNS.
Bandwidth: Replication traffic can consume significant bandwidth. Use compression and throttling settings. For Azure VMs, replication traffic goes over the Azure backbone, not the internet.
ExpressRoute: For on-premises to Azure replication, you can use ExpressRoute for better reliability and lower latency.
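Before sizing bandwidth or planning throttling windows, estimate how long the initial full copy will take. The following is a back-of-the-envelope sketch. The `efficiency` factor (the fraction of the link that replication actually gets) is an assumption, and compression is not modeled.

```python
def initial_sync_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to copy data_tb terabytes (10**12 bytes) over a link of link_mbps megabits/s."""
    bits = data_tb * 1e12 * 8
    effective_bps = link_mbps * 1e6 * efficiency
    return bits / effective_bps / 3600


def required_mbps(data_tb: float, hours: float) -> float:
    """Average throughput needed to copy data_tb terabytes within the given number of hours."""
    return data_tb * 1e12 * 8 / (hours * 3600) / 1e6


print(round(initial_sync_hours(10, 1000, efficiency=1.0), 1))  # 22.2 hours at a full 1 Gbps
print(round(required_mbps(10, 72), 1))                         # 308.6 Mbps for 10 TB in 3 days
```

For example, the financial-services scenario later in this chapter took 3 days to sync 10 TB. That implies an average effective throughput of roughly 309 Mbps.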
Security and Compliance
Data is encrypted in transit and at rest. For Azure VMs, replication uses TLS 1.2. Storage is encrypted using Azure Storage Service Encryption (SSE).
ASR does not replicate Azure Key Vault or other managed services. You need to handle those separately.
Limitations and Edge Cases
Maximum disk size: For Azure VMs, ASR supports managed disks up to 32 TB. Larger disks require third-party solutions.
Number of disks: ASR supports up to 64 disks per VM.
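VM size: By default, ASR selects a target VM size similar to the source. You can change the target size, including to a different series (e.g., D-series to E-series), in the replicated item's compute settings. The size must be available in the target region and must support the VM's disks and network interfaces.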
Availability Zones: ASR can replicate within the same region across zones, but for DR you typically use region-to-region.
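The size and count limits above are easy to check before enabling replication. Below is a minimal pre-flight sketch that uses only the limits quoted in this chapter; confirm current values in the ASR support matrix before relying on them. The function name and input shape are hypothetical.

```python
MAX_DISKS_PER_VM = 64    # limit quoted in this chapter
MAX_DISK_SIZE_TB = 32    # limit quoted in this chapter


def replication_blockers(disk_sizes_tb: list[float]) -> list[str]:
    """Return reasons a VM exceeds the quoted ASR limits (an empty list means no blockers)."""
    issues = []
    if len(disk_sizes_tb) > MAX_DISKS_PER_VM:
        issues.append(f"{len(disk_sizes_tb)} disks exceeds the {MAX_DISKS_PER_VM}-disk limit")
    for index, size in enumerate(disk_sizes_tb):
        if size > MAX_DISK_SIZE_TB:
            issues.append(f"disk {index} is {size} TB, above the {MAX_DISK_SIZE_TB} TB limit")
    return issues


print(replication_blockers([0.128, 1, 4]))  # []
print(replication_blockers([0.128, 40]))    # ['disk 1 is 40 TB, above the 32 TB limit']
```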
Exam Tip
The AZ-305 exam focuses on scenario-based questions where you must choose the correct DR solution based on RPO/RTO requirements. For example, if RPO is 15 minutes and RTO is 1 hour, ASR is appropriate. If RPO is 24 hours and RTO is 12 hours, Azure Backup may be better. Also, know the difference between test failover, planned failover, and unplanned failover, and when to use each.
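The RPO/RTO reasoning in this tip can be written as a rough decision helper. It is an exam heuristic, not Microsoft guidance. The daily backup interval and the 8-hour restore estimate are assumed parameters. The rule is that a backup cannot meet an RPO shorter than its backup interval, or an RTO shorter than its restore time.

```python
from datetime import timedelta


def pick_dr_service(
    rpo: timedelta,
    rto: timedelta,
    backup_interval: timedelta = timedelta(hours=24),     # assumed: one backup per day
    backup_restore_time: timedelta = timedelta(hours=8),  # assumed restore duration
) -> str:
    """Suggest a DR approach for an IaaS VM from its RPO/RTO targets."""
    if rpo == timedelta(0):
        return "zero RPO: needs synchronous replication, which neither ASR nor Azure Backup provides"
    if rpo >= backup_interval and rto >= backup_restore_time:
        return "Azure Backup"  # losing up to one backup interval of data is acceptable
    return "Azure Site Recovery"


print(pick_dr_service(timedelta(minutes=15), timedelta(hours=1)))  # Azure Site Recovery
print(pick_dr_service(timedelta(hours=24), timedelta(hours=12)))   # Azure Backup
```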
Enable Replication for Azure VM
In the Azure Portal, navigate to the VM you want to protect. Under Operations, select 'Disaster Recovery'. Choose a target region (e.g., West US for a VM in East US). Select a Recovery Services vault (create one if needed). Review the replication settings: recovery point retention (24 hours by default) and app-consistent snapshot frequency (60 minutes). There is no RPO value to set for Azure VMs. ASR installs the Mobility service extension on the VM and then performs an initial full copy of the VM's managed disks to the target region. This can take hours depending on disk size and network speed. After the initial sync, replication is continuous: only changed data is sent, and ASR creates crash-consistent recovery points every 5 minutes.
Configure Network Mapping
Map the source virtual network to a target virtual network in the secondary region. This ensures that after failover, VMs are connected to the correct network with appropriate IP addresses. You can also configure subnet mapping. For IP address retention, give the target virtual network the same address space as the source. Alternatively, use a different range and update DNS. ASR provides options to assign IP addresses automatically or manually. If you retain IPs, ensure the target subnet has sufficient free addresses.
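The IP-assignment logic described above can be sketched as follows. The sketch keeps the source IP when the mapped target subnet can hold it, and otherwise falls back to a free address. The mapping table, network names, and function names are hypothetical, and choosing the lowest free address is a simplification. The only Azure-specific rule encoded is that Azure reserves the first four addresses and the last address in every subnet.

```python
import ipaddress

# Hypothetical network mapping: (source vnet, subnet) -> (target vnet, subnet)
NETWORK_MAPPING = {
    ("eastus-vnet", "app-subnet"): ("westus-vnet", "app-subnet"),
}


def assignable(ip: ipaddress.IPv4Address, net: ipaddress.IPv4Network) -> bool:
    """True if ip is inside net and not one of Azure's reserved addresses (first four, last)."""
    offset = int(ip) - int(net.network_address)
    return ip in net and 4 <= offset < net.num_addresses - 1


def plan_failover_ip(source_vnet: str, source_subnet: str, source_ip: str,
                     target_cidr: str, used_ips: list[str]) -> tuple[tuple[str, str], str]:
    """Keep the source IP if the mapped target subnet can hold it, else take a free one."""
    target = NETWORK_MAPPING.get((source_vnet, source_subnet))
    if target is None:
        raise LookupError(f"no network mapping for {source_vnet}/{source_subnet}")
    net = ipaddress.ip_network(target_cidr)
    used = {ipaddress.ip_address(u) for u in used_ips}
    ip = ipaddress.ip_address(source_ip)
    if assignable(ip, net) and ip not in used:
        return target, str(ip)  # IP retained
    for candidate in net.hosts():
        if assignable(candidate, net) and candidate not in used:
            return target, str(candidate)
    raise RuntimeError(f"no free addresses left in {target_cidr}")


print(plan_failover_ip("eastus-vnet", "app-subnet", "10.1.0.10", "10.1.0.0/24", []))
# (('westus-vnet', 'app-subnet'), '10.1.0.10')
print(plan_failover_ip("eastus-vnet", "app-subnet", "10.1.0.10", "10.2.0.0/24", []))
# (('westus-vnet', 'app-subnet'), '10.2.0.4')
```

The second call shows why address-space planning matters. With a different target range, the VM gets a new IP, and DNS (or Traffic Manager) must be updated.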
Create a Recovery Plan
A recovery plan groups VMs that should fail over together. In the recovery services vault, go to Recovery Plans and create a new plan. Add VMs in the desired order. You can add pre-actions (before failover) and post-actions (after failover) using scripts or Azure Automation runbooks. For example, a pre-action could stop the production application, and a post-action could update DNS records. The plan allows you to specify startup order and wait times between groups. This is critical for multi-tier applications where the database must start before the web tier.
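The ordering rules of a recovery plan can be sketched as follows: groups run one after another, with optional actions around each group. `RecoveryGroup`, `run_recovery_plan`, and the VM names are hypothetical. The callables stand in for scripts or Automation runbooks, and `start_vm` stands in for the actual failover of one VM.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RecoveryGroup:
    vms: list[str]
    pre_actions: list[Callable[[], str]] = field(default_factory=list)
    post_actions: list[Callable[[], str]] = field(default_factory=list)


def run_recovery_plan(groups: list[RecoveryGroup], start_vm: Callable[[str], str]) -> list[str]:
    """Run groups strictly in order: pre-actions, VM starts, then post-actions."""
    log = []
    for number, group in enumerate(groups, start=1):
        for action in group.pre_actions:
            log.append(f"group {number} pre-action: {action()}")
        for vm in group.vms:  # ASR fails over VMs in the same group in parallel; sequential here
            log.append(f"group {number} started {start_vm(vm)}")
        for action in group.post_actions:
            log.append(f"group {number} post-action: {action()}")
    return log


plan = [
    RecoveryGroup(["sql-01"]),                                                   # database tier first
    RecoveryGroup(["app-01", "app-02"]),                                         # then application tier
    RecoveryGroup(["web-01"], post_actions=[lambda: "updated DNS record"]),      # web tier last
]
for line in run_recovery_plan(plan, start_vm=lambda name: name):
    print(line)
```

Because a later group never starts until the previous one finishes, the web tier cannot come up before the database it depends on.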
Run Test Failover
Before an actual disaster, perform a test failover to validate the recovery plan. In the recovery plan, select 'Test Failover'. Choose a test virtual network (isolated from production). ASR creates VMs in the target region using the latest recovery point (or a specific point). The test VMs are isolated so they don't affect production. Validate that applications start correctly and data is consistent. After testing, select 'Cleanup test failover', which deletes the test VMs. Neither the test nor the cleanup interrupts ongoing replication.
Perform Unplanned Failover
When a disaster strikes, initiate an unplanned failover. In the recovery plan, select 'Failover' and choose 'Unplanned'. You can choose one of these recovery points:
Latest (lowest RPO): processes all pending replication data first.
Latest processed (lowest RTO).
Latest app-consistent.
A custom point.
ASR then shuts down the source VMs (if they are accessible and you select that option) and starts the target VMs. The RTO clock runs from the moment you trigger failover until the VMs are running and accessible. After failover, update DNS records to point to the new IPs, unless you retained the original IPs or route traffic through Traffic Manager. Once you have verified the failed-over VMs, commit the failover. This removes the other recovery points, so you can no longer switch to a different one.
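The trade-off between recovery point types can be made concrete. Failing over to an app-consistent point gives a cleaner application state but may lose more data. This is a minimal sketch with hypothetical names and timestamps. It assumes crash-consistent points every 5 minutes and one app-consistent point per hour. The 'Latest (lowest RPO)' option, which first processes pending data, is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryPoint:
    created: datetime
    app_consistent: bool


def choose_recovery_point(points: list[RecoveryPoint], failure_time: datetime,
                          app_consistent_only: bool = False) -> tuple[RecoveryPoint, timedelta]:
    """Return the newest eligible point at or before failure_time and the data loss it implies."""
    eligible = [
        p for p in points
        if p.created <= failure_time and (p.app_consistent or not app_consistent_only)
    ]
    if not eligible:
        raise ValueError("no eligible recovery point")
    best = max(eligible, key=lambda p: p.created)
    return best, failure_time - best.created


# Crash-consistent points every 5 minutes from 09:00; only the 09:00 point is app-consistent
start = datetime(2024, 1, 1, 9, 0)
points = [RecoveryPoint(start + timedelta(minutes=5 * i), app_consistent=(i == 0)) for i in range(12)]
failure = datetime(2024, 1, 1, 9, 57)

print(choose_recovery_point(points, failure)[1])                            # 0:02:00
print(choose_recovery_point(points, failure, app_consistent_only=True)[1])  # 0:57:00
```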
Enterprise Scenario 1: Multi-Tier E-Commerce Application
A retail company runs a three-tier e-commerce application on Azure VMs in the East US region. The application consists of web servers, application servers, and a SQL Server Always On availability group. The business requires an RPO of 15 minutes and an RTO of 1 hour. They use ASR for Azure-to-Azure replication to the West US region. They create a recovery plan that first fails over the SQL Server using a script to bring up the secondary replica, then the application servers, and finally the web servers. They use Azure Traffic Manager to automatically route traffic to West US after failover. During a real disaster, the failover completed in 45 minutes, meeting the RTO. The key challenge was ensuring the SQL Server failover script handled the availability group correctly, which required testing multiple times.
Enterprise Scenario 2: On-Premises to Azure for a Financial Services Firm
A financial services firm has on-premises VMware VMs running critical trading applications. They need to replicate to Azure for DR with an RPO of 5 minutes and RTO of 2 hours. They deploy ASR on-premises with a configuration server and process server. They use ExpressRoute for reliable replication. The Mobility Service is installed on each VM. They configure a replication policy with app-consistent snapshots every 15 minutes. During a test failover, they discovered that the application had a hardcoded IP address, causing connectivity issues. They resolved this by using a script to update the IP address during failover. The production failover worked as expected, but the initial sync took 3 days due to large disk sizes (10 TB total). They learned to throttle replication during business hours to avoid bandwidth saturation.
Common Issues When Misconfigured
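Unrealistic RPO expectations: ASR replication is asynchronous, so it cannot deliver zero data loss. For Azure VMs, crash-consistent recovery points are created every 5 minutes, and high disk churn can push the actual RPO higher. The exam may test this.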
Network misconfiguration: If the target virtual network is not mapped, VMs may fail to start or have no connectivity.
Recovery plan order: Starting the web tier before the database can cause application failures.
Forgetting to clean up test failover: Test VMs incur costs and may interfere with subsequent failovers.
Not updating DNS after failover: Users cannot reach the application.
What AZ-305 Tests on This Topic (Objective 3.2)
The exam focuses on selecting the appropriate disaster recovery solution based on business requirements. Key areas:
Replication methods: Azure-to-Azure vs. on-premises to Azure.
RPO and RTO: Know the default values and what is achievable. Azure VMs replicate continuously, with crash-consistent recovery points every 5 minutes; Hyper-V to Azure can replicate as often as every 30 seconds.
Recovery plans: How to design them for multi-tier applications.
Test failover vs. planned vs. unplanned: When to use each.
Integration with other services: Traffic Manager, Azure DNS, Azure Automation.
Failback: After failover, you must re-protect and then fail back.
Common Wrong Answers and Why Candidates Choose Them
Choosing Azure Backup over ASR for low RPO: Candidates think Azure Backup provides faster recovery, but Azure Backup has an RPO of 24 hours (for daily backups) while ASR can achieve 15 minutes. The exam will present a scenario with RPO of 15 minutes and RTO of 1 hour; the correct answer is ASR.
Selecting 'Planned failover' for disaster: Candidates confuse planned failover (zero data loss) with unplanned failover. Planned failover requires the source to be healthy; for a disaster, you must use unplanned failover.
Assuming ASR replicates Azure SQL Database: ASR only replicates IaaS VMs, not PaaS databases. For Azure SQL Database, you need active geo-replication or failover groups.
Thinking test failover affects production: Candidates worry test failover will impact production, but it runs in an isolated network.
Specific Numbers and Terms That Appear Verbatim
Crash-consistent recovery points: every 5 minutes (Azure VMs)
Replication frequency: 30 seconds (minimum for Hyper-V to Azure)
Recovery point retention: 24 hours (default), max 72 hours
App-consistent snapshot frequency: 60 minutes (default)
Maximum disks per VM: 64
Maximum disk size: 32 TB
Edge Cases and Exceptions
Even if the source region goes down completely, you can still fail over to the target region. The replicated data lives in the target region, and for Azure VMs the Recovery Services vault cannot be in the source region. Failover therefore does not depend on the source.
Cost depends on disk type. By default, replica managed disks use the same type as the source disks, so protecting VMs with premium SSDs costs more than protecting VMs with standard disks.
ASR does not support replication of unmanaged disks (classic storage). Must convert to managed disks first.
How to Eliminate Wrong Answers
If the question requires zero data loss (an RPO of 0) between regions, eliminate answers that rely on ASR alone, because ASR replication is asynchronous.
If the question says 'must preserve IP addresses', look for answers that include network mapping and subnet configuration.
If the question involves 'testing without impact', the answer should include 'test failover'.
Azure Site Recovery (ASR) is the primary DR solution for Azure IaaS VMs. It replicates continuously to a secondary region and creates crash-consistent recovery points every 5 minutes.
For on-premises Hyper-V VMs, ASR can achieve an RPO as low as 30 seconds.
Test failover runs in an isolated network and does not affect production; it is used for DR drills.
Recovery plans allow grouping VMs and specifying startup order, scripts, and manual actions.
After an unplanned failover, you must re-protect VMs and perform a planned failover to failback.
ASR does not replicate PaaS services like Azure SQL Database; use native replication for those.
Network mapping is required to ensure VMs connect to the correct virtual network after failover.
The maximum disk size supported by ASR is 32 TB; maximum disks per VM is 64.
These come up on the exam all the time. Here's how to tell them apart.
Azure Site Recovery (ASR)
Designed for disaster recovery with low RPO (minutes for Azure VMs) and low RTO (minutes to hours).
Replicates data continuously or at high frequency.
Supports application-consistent snapshots for VSS-aware applications.
Requires ongoing replication, which consumes bandwidth and storage.
Best for critical workloads that need quick recovery.
Azure Backup
Designed for backup and long-term retention with higher RPO (typically 24 hours) and RTO (hours to days).
Takes snapshots at scheduled intervals (e.g., daily).
Provides granular restore of files and folders.
Lower cost for archival storage.
Best for compliance and long-term data retention.
Mistake
Azure Site Recovery can replicate Azure SQL Database and other PaaS services.
Correct
ASR only replicates IaaS VMs. For PaaS services like Azure SQL Database, you must use native replication features such as active geo-replication or failover groups. ASR can replicate VMs that host SQL Server, but not the PaaS database itself.
Mistake
Test failover can impact production VMs if not careful.
Correct
Test failover creates VMs in an isolated network that does not connect to production. It does not affect ongoing replication or production workloads. However, you must clean up test VMs to avoid costs and potential confusion.
Mistake
Planned failover and unplanned failover are interchangeable.
Correct
Planned failover is used for zero data loss during planned maintenance; it requires the source to be healthy and shuts down source VMs gracefully. Unplanned failover is for disasters; it may have data loss and does not require source to be healthy.
Mistake
ASR replicates data in real-time with zero RPO.
Correct
For Azure VMs, replication is continuous but asynchronous, with crash-consistent recovery points every 5 minutes. For Hyper-V, replication can run as often as every 30 seconds. Zero RPO is not achievable with ASR. It would require synchronous replication, which ASR does not provide for cross-region DR.
Mistake
After failover, you can directly failback without re-protection.
Correct
After failover, the target VMs are not replicating back to the source. You must re-protect the VMs (enable reverse replication) and then perform a planned failover to failback. This process synchronizes changes from the target to the source.
Azure Site Recovery is for disaster recovery with low RPO/RTO, replicating data continuously or frequently to a secondary region. Azure Backup is for backup and long-term retention with higher RPO (typically 24 hours) and RTO. Use ASR for critical workloads that need quick recovery, and Azure Backup for compliance and archival. They can be used together: ASR for DR, Azure Backup for backups.
Yes, ASR supports replication of on-premises VMware VMs to Azure. You need to deploy a configuration server and process server on-premises, install the Mobility Service on each VM, and set up replication to an Azure recovery services vault. The process server compresses and encrypts data before sending to Azure.
To retain IP addresses, you must configure network mapping and subnet settings to use the same IP range in the target region. The target subnet must have the same address space. Alternatively, you can use a different IP range and update DNS records. ASR provides options to assign IP addresses automatically or manually.
For Azure VMs, there is no RPO value to configure. Replication is continuous, ASR creates crash-consistent recovery points every 5 minutes, and the achieved RPO depends on the data change rate. For on-premises Hyper-V VMs, you can set the replication frequency as low as 30 seconds. For VMware VMs, replication is also continuous, so the RPO likewise depends on the change rate.
No. ASR replicates each VM to a single target region, and you can fail over only to that region. With ASR, a VM cannot be protected to multiple target regions at the same time, so recovery in a third region requires a separate mechanism.
Your replicated data is stored in the target region in Azure Storage. Even if the source region is destroyed, the target region remains unaffected. You can initiate an unplanned failover to the target region using the latest recovery point. ASR failover does not require the source to be available.
You can use Azure Automation runbooks as pre-actions or post-actions in a recovery plan. For example, a runbook can update the connection string in an application configuration file or update DNS records. You can also use scripts within the recovery plan.
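The core of such a post-action is often a small string transformation. This minimal sketch shows the logic a runbook might apply. It assumes a SQL Server-style `Key=Value;` connection string and hypothetical server names. Reading and writing the actual configuration file or app setting is not shown.

```python
def retarget_connection_string(conn: str, new_server: str) -> str:
    """Point a Key=Value; connection string at a different server, keeping other settings."""
    parts = []
    for part in conn.split(";"):
        if not part.strip():
            continue  # skip empty segments such as the one after a trailing ';'
        key, _, _value = part.partition("=")
        if key.strip().lower() in ("server", "data source"):
            part = f"{key}={new_server}"
        parts.append(part)
    return ";".join(parts)


old = "Server=sql-eastus.contoso.local;Database=Shop;Integrated Security=True;"
print(retarget_connection_string(old, "sql-westus.contoso.local"))
# Server=sql-westus.contoso.local;Database=Shop;Integrated Security=True
```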