Azure Site Recovery (ASR) is a disaster recovery as a service (DRaaS) solution that orchestrates replication, failover, and failback of Azure VMs and on-premises physical/virtual machines. This chapter covers the full ASR workflow, including replication policies, failover options, recovery plans, and integration with Azure networking and automation. For the AZ-104 exam, approximately 5-10% of questions touch on disaster recovery, primarily focusing on ASR configuration, failover types, and replication settings. Mastering ASR is essential for ensuring business continuity and passing the monitoring and recovery objectives.
Think of Azure Site Recovery (ASR) as a standby generator and transfer switch for your entire data center. In a building, you have a main power supply (your primary site) and a backup generator (the secondary site) that can power the building if the main supply fails. But a generator alone isn't enough: you also need a transfer switch that starts the generator and moves the load across within seconds. ASR plays both roles for your virtual machines. It continuously replicates your VMs' disks to a secondary Azure region (or, for on-premises machines, to Azure) and tracks replication health. When a disaster strikes (or you initiate a drill), you trigger the failover and ASR orchestrates it: it spins up the replicated VMs in the secondary region, attaches the replicated disks, and runs any recovery plan actions you have defined, such as a runbook that updates DNS records so traffic flows to the new location. After the disaster is resolved, you can fail back to the primary site by reversing the replication direction. Just as you'd test your generator monthly, ASR lets you run non-disruptive disaster recovery drills (test failovers) without impacting production. The key difference is that ASR is efficient and application-aware: it replicates only changed blocks, supports multi-tier application consistency with app-consistent snapshots, and can automate the entire recovery plan with scripts and runbooks.
What is Azure Site Recovery and Why Does It Exist?
Azure Site Recovery (ASR) is a managed disaster recovery service that replicates workloads running on Azure VMs, Hyper-V VMs, VMware VMs, and physical servers. Azure VMs replicate to a secondary Azure region (Azure-to-Azure), while on-premises machines replicate to Azure (on-premises-to-Azure). Its primary purpose is to enable business continuity by automating the recovery of applications and data during a disaster, such as a regional outage, hardware failure, or cyberattack. ASR helps you meet recovery time objectives (RTOs) and recovery point objectives (RPOs) by providing continuous replication and orchestrated failover.
ASR is not a backup service—it is a replication and failover service. Unlike Azure Backup, which stores point-in-time copies for long-term retention, ASR maintains a near-synchronous copy of data with a typical RPO of a few seconds to a few minutes. It also supports application-consistent snapshots using Volume Shadow Copy Service (VSS) on Windows or file-system-consistent snapshots on Linux.
How ASR Works Internally: Step-by-Step Mechanism
Replication: When you enable replication for a VM, ASR installs the Site Recovery Mobility Service extension on the VM (for Azure VMs) or on the on-premises machine. The Mobility Service intercepts disk writes and sends them to a cache storage account in the source region. From there, ASR’s replication engine reads the cached data and transfers it to a managed disk replica in the target region. Replication is continuous, using a delta-sync mechanism that only transfers changed blocks.
Recovery Points: ASR creates recovery points based on the replication policy. A recovery point is a snapshot of the VM’s disks at a specific time. You can configure the frequency of crash-consistent points (default every 5 minutes) and optionally enable app-consistent snapshots (default every 60 minutes). Each recovery point is stored as a managed disk snapshot in the target region.
Failover: When you initiate a failover, ASR selects a recovery point (latest, latest app-consistent, or custom) and creates a VM in the target region using the replicated disks. The VM is started, and you can then connect to it. For planned failovers (no data loss), ASR first flushes any pending writes before creating the VM. For unplanned failovers, you may experience some data loss depending on the chosen recovery point.
Recovery Plans: A recovery plan is a collection of VMs grouped together for failover. You can specify the order in which VMs start, add delays, and include Azure Automation runbooks or scripts for custom actions (e.g., updating DNS, connecting load balancers). Recovery plans ensure that multi-tier applications (e.g., web, app, database) come up in the correct order.
Failback: After the primary site is restored, you can fail back. For Azure-to-Azure, failback is a reverse replication: you enable replication from the target region back to the source. For on-premises, you need to set up a process server in Azure to replicate data back to on-premises.
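The recovery-point choice described in the failover step can be illustrated with a small, self-contained PowerShell sketch. It does not call ASR: the `Select-RecoveryPoint` function, the `Type` labels, and the sample timestamps are hypothetical and only model the "latest" versus "latest app-consistent" options.

```powershell
# Hypothetical model of recovery-point selection; this is not an ASR API.
function Select-RecoveryPoint {
    param(
        [Parameter(Mandatory)] [object[]] $RecoveryPoints,
        [ValidateSet('Latest', 'LatestAppConsistent')] [string] $Mode = 'Latest'
    )
    $candidates = $RecoveryPoints
    if ($Mode -eq 'LatestAppConsistent') {
        # Keep only points taken with an application-consistent snapshot.
        $candidates = $RecoveryPoints | Where-Object { $_.Type -eq 'AppConsistent' }
    }
    # The newest point within the candidate set wins.
    $candidates | Sort-Object -Property Time -Descending | Select-Object -First 1
}

# Sample data: one app-consistent point followed by two crash-consistent points.
$base = Get-Date '2024-01-01T10:00:00'
$points = @(
    [pscustomobject]@{ Time = $base;                Type = 'AppConsistent' }
    [pscustomobject]@{ Time = $base.AddMinutes(5);  Type = 'CrashConsistent' }
    [pscustomobject]@{ Time = $base.AddMinutes(10); Type = 'CrashConsistent' }
)

$latest    = Select-RecoveryPoint -RecoveryPoints $points -Mode Latest
$latestApp = Select-RecoveryPoint -RecoveryPoints $points -Mode LatestAppConsistent

# Extra minutes of data given up by preferring the app-consistent point.
$gapMinutes = ($latest.Time - $latestApp.Time).TotalMinutes
```

With this sample data, choosing the app-consistent point trades ten minutes of more recent writes for application-level consistency, which is the same trade-off you weigh when picking a recovery point in the portal.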
Key Components, Values, Defaults, and Timers
- Replication Policy: Defines recovery point retention (default 24 hours, max 72 hours), app-consistent snapshot frequency (default 60 minutes), and crash-consistent snapshot frequency (every 5 minutes, not configurable).
- Cache Storage Account: A standard storage account in the source region that temporarily holds replicated data. It must be in the same region as the source VM.
- Target Resource Group: The resource group where failover VMs are created. Must be in the target region.
- Target Virtual Network: The VNet in the target region where failover VMs will connect. You can pre-create it or let ASR create a default one.
- Mobility Service: Installed automatically on Azure VMs; for on-premises, you must install it manually or via push installation.
- Recovery Services Vault: The Azure resource that stores replication settings, recovery plans, and monitors replication health. It must be in the target region for Azure-to-Azure replication.
- Failover Types:
  - Test Failover: Non-disruptive drill that creates VMs in an isolated network (or a specified test VNet) without affecting production.
  - Planned Failover: For expected outages; zero data loss by flushing pending writes.
  - Unplanned Failover: For unexpected disasters; may lose data up to the last recovery point.
- RPO: Typically 5 minutes for crash-consistent, 60 minutes for app-consistent.
- RTO: Depends on application startup time; ASR typically achieves RTO of minutes to a few hours.
Configuration and Verification Commands
To enable replication for an Azure VM using Azure PowerShell:
# Set vault context
$vault = Get-AzRecoveryServicesVault -ResourceGroupName "RG-Vault" -Name "ASRVault"
Set-AzRecoveryServicesAsrVaultContext -Vault $vault
# Get the VM to replicate
$vm = Get-AzVM -ResourceGroupName "RG-Prod" -Name "WebVM1"
# Create an Azure-to-Azure replication policy (24-hour retention, hourly app-consistent snapshots)
$job = New-AzRecoveryServicesAsrPolicy -AzureToAzure -Name "ReplicationPolicy" -RecoveryPointRetentionInHours 24 -ApplicationConsistentSnapshotFrequencyInHours 1
# Enable replication. $mapping (the protection container mapping that references the policy) and
# $diskConfigs (one New-AzRecoveryServicesAsrAzureToAzureDiskReplicationConfig result per disk) are assumed to exist already.
$job = New-AzRecoveryServicesAsrReplicationProtectedItem -AzureToAzure -AzureVmId $vm.Id -Name (New-Guid).Guid -ProtectionContainerMapping $mapping -AzureToAzureDiskReplicationConfiguration $diskConfigs -RecoveryResourceGroupId "/subscriptions/.../resourceGroups/RG-DR" -RecoveryAzureNetworkId "/subscriptions/.../virtualNetworks/VNet-DR"
To verify replication health:
# $container is the source protection container (from Get-AzRecoveryServicesAsrProtectionContainer)
$protectedItems = Get-AzRecoveryServicesAsrReplicationProtectedItem -ProtectionContainer $container
$protectedItems | Select-Object FriendlyName, ProtectionState, ReplicationHealth
Interaction with Related Technologies
Azure Backup: ASR is not a backup solution. Use Azure Backup for long-term retention of backups. ASR is for replication and failover with short RPO.
Azure Traffic Manager: Can be used to automatically redirect traffic to the failover region after failover by updating endpoint status.
Azure DNS: You can update DNS records manually or via automation runbooks to point to the failover region’s public IP.
Azure Automation: Runbooks in recovery plans can automate tasks like updating DNS, connecting to load balancers, or running application-specific scripts.
Azure Load Balancer: Pre-create a load balancer in the target region and configure it in the recovery plan to distribute traffic to failover VMs.
Azure Key Vault: ASR can replicate encryption keys if VMs are encrypted with Azure Disk Encryption.
Exam Trap Patterns
Trap 1: Confusing Azure Backup with ASR. Azure Backup is for backups; ASR is for replication and failover. The exam may ask which service to use for disaster recovery under a specific RPO/RTO requirement.
Trap 2: Thinking ASR requires a VPN or ExpressRoute for Azure-to-Azure replication. It does not; replication traffic goes over the Azure backbone network internally.
Trap 3: Assuming you can fail over to any region. The target must be a region that ASR supports for your source region. A paired region is recommended but not required, and not all regions are paired.
Trap 4: Forgetting that the Recovery Services Vault must be in the target region for Azure-to-Azure replication. A common wrong answer is placing the vault in the source region.
Trap 5: Believing that test failover uses the same network as production. Test failover creates VMs in an isolated network (or a specified test VNet) to avoid impact.
Enable Replication for Azure VM
In the Azure portal, navigate to the Recovery Services Vault, click 'Site Recovery' under 'Manage', then 'Enable Replication' for Azure VMs. Select the source region, resource group, and the VM to protect. Choose the target region (must be a paired region or a region supporting ASR). Configure the target resource group, virtual network, and storage account (cache). Specify a replication policy (default: 5 min crash-consistent, 60 min app-consistent, 24 hr retention). ASR then installs the Mobility Service extension on the VM and begins initial replication. During initial replication, the entire disk is copied to the target region, which may take time depending on disk size and network bandwidth. After initial sync, continuous delta replication begins.
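When you script this instead of using the portal, the replication policy has to be tied to a source and a target protection container before any VM can be protected. The sketch below shows one way to build those objects with the Az.RecoveryServices module. The function names `Wait-AsrJob` and `New-AsrA2AContainerMapping` and all `asr-*` resource names are hypothetical, and the code assumes the vault context has already been set and the named policy already exists.

```powershell
# Sketch only: requires the Az.RecoveryServices module and a vault context
# already set with Set-AzRecoveryServicesAsrVaultContext.
function Wait-AsrJob {
    param([Parameter(Mandatory)] $Job)
    # Poll until the ASR job leaves the NotStarted/InProgress states.
    while ($Job.State -eq 'InProgress' -or $Job.State -eq 'NotStarted') {
        Start-Sleep -Seconds 10
        $Job = Get-AzRecoveryServicesAsrJob -Job $Job
    }
    $Job
}

function New-AsrA2AContainerMapping {
    param(
        [Parameter(Mandatory)] [string] $SourceLocation,  # region of the source VM
        [Parameter(Mandatory)] [string] $TargetLocation,  # DR region
        [Parameter(Mandatory)] [string] $PolicyName       # existing replication policy
    )
    # Fabrics represent the source and target regions inside the vault.
    Wait-AsrJob (New-AzRecoveryServicesAsrFabric -Azure -Location $SourceLocation -Name 'asr-fabric-source') | Out-Null
    Wait-AsrJob (New-AzRecoveryServicesAsrFabric -Azure -Location $TargetLocation -Name 'asr-fabric-target') | Out-Null
    $srcFabric = Get-AzRecoveryServicesAsrFabric -Name 'asr-fabric-source'
    $tgtFabric = Get-AzRecoveryServicesAsrFabric -Name 'asr-fabric-target'

    # Protection containers group replicated items within each fabric.
    Wait-AsrJob (New-AzRecoveryServicesAsrProtectionContainer -InputObject $srcFabric -Name 'asr-container-source') | Out-Null
    Wait-AsrJob (New-AzRecoveryServicesAsrProtectionContainer -InputObject $tgtFabric -Name 'asr-container-target') | Out-Null
    $srcContainer = Get-AzRecoveryServicesAsrProtectionContainer -Fabric $srcFabric -Name 'asr-container-source'
    $tgtContainer = Get-AzRecoveryServicesAsrProtectionContainer -Fabric $tgtFabric -Name 'asr-container-target'

    # The mapping ties source container, target container, and policy together.
    $policy = Get-AzRecoveryServicesAsrPolicy -Name $PolicyName
    $mappingJob = New-AzRecoveryServicesAsrProtectionContainerMapping -Name 'asr-mapping-source-to-target' `
        -Policy $policy -PrimaryProtectionContainer $srcContainer -RecoveryProtectionContainer $tgtContainer
    Wait-AsrJob $mappingJob | Out-Null

    Get-AzRecoveryServicesAsrProtectionContainerMapping -ProtectionContainer $srcContainer -Name 'asr-mapping-source-to-target'
}
```

A call such as `New-AsrA2AContainerMapping -SourceLocation 'West US' -TargetLocation 'East US' -PolicyName 'ReplicationPolicy'` would return the mapping object that `New-AzRecoveryServicesAsrReplicationProtectedItem` expects in its `-ProtectionContainerMapping` parameter. The portal workflow takes care of this setup for you.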
Monitor Replication Health
After enabling replication, you can monitor the health in the Recovery Services Vault under 'Replicated Items'. The 'Protection Health' status can be 'Healthy', 'Warning', or 'Critical'. 'Replication Health' shows the status of the replication pipeline. Key metrics include 'Recovery Point Objective' (current lag), 'Last Successful Replication Time', and 'Data Transfer Rate'. If replication fails, check for network connectivity issues, disk write errors, or the Mobility Service not running. You can also view the replication status using Azure Monitor alerts. The exam expects you to know how to interpret these health states and what actions to take for warnings (e.g., check network, restart Mobility Service).
Create a Recovery Plan
A recovery plan groups VMs for orchestrated failover. In the vault, go to 'Recovery Plans' and create a new plan. Add the replicated VMs and specify the order (e.g., database tier first, then app tier, then web tier). You can add pre-actions and post-actions using Azure Automation runbooks or scripts. For example, a pre-action runbook can update DNS records, and a post-action can connect VMs to a load balancer. You can also add manual actions for approvals. The recovery plan can be tested using test failover. During an actual failover, the plan executes the actions in order. The exam tests understanding of recovery plan use cases: multi-tier applications, automated DNS updates, and load balancer integration.
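The ordering logic of a recovery plan can be modeled without touching Azure. The following self-contained PowerShell sketch is only an illustration: the `$recoveryPlan` structure, the `Get-PlanSteps` function, the VM names, and the action names `Update-DnsRecord` and `Add-ToLoadBalancer` are hypothetical and are not ASR objects or cmdlets.

```powershell
# Hypothetical in-memory model of a recovery plan; not an ASR object.
$recoveryPlan = @(
    [pscustomobject]@{ Group = 2; VMs = @('AppVM1', 'AppVM2'); PreAction = $null; PostAction = $null }
    [pscustomobject]@{ Group = 1; VMs = @('SqlVM1');           PreAction = $null; PostAction = $null }
    [pscustomobject]@{ Group = 3; VMs = @('WebVM1', 'WebVM2'); PreAction = 'Update-DnsRecord'; PostAction = 'Add-ToLoadBalancer' }
)

function Get-PlanSteps {
    param([Parameter(Mandatory)] [object[]] $Groups)
    # Groups run strictly in ascending order, whatever order they were declared in.
    foreach ($group in ($Groups | Sort-Object -Property Group)) {
        if ($group.PreAction)  { "Group $($group.Group): run pre-action $($group.PreAction)" }
        foreach ($vm in $group.VMs) { "Group $($group.Group): start $vm" }
        if ($group.PostAction) { "Group $($group.Group): run post-action $($group.PostAction)" }
    }
}

$steps = Get-PlanSteps -Groups $recoveryPlan
$steps
```

The database group is listed second in the data but still runs first, which mirrors how a recovery plan enforces tier order: each group's pre-action runs before its VMs start, and its post-action runs after them.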
Perform a Test Failover
To validate your DR strategy without impacting production, perform a test failover. In the vault, select the recovery plan or individual replicated item, click 'Test Failover'. Choose a recovery point (latest, latest app-consistent, or custom). Specify a test virtual network (isolated from production) or let ASR create a default one. ASR then creates VMs in the target region using the replicated disks. These VMs are isolated in the test network. You can verify application functionality by connecting to them via private IPs. After testing, clean up by deleting the test VMs and test network. Important: Test failover does not affect the source VMs or ongoing replication. The exam may ask about the purpose of test failover and the requirement for an isolated network.
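The same drill can be scripted. The sketch below wraps the two Az.RecoveryServices cmdlets involved. The wrapper names `Start-AsrDrill` and `Stop-AsrDrill` are hypothetical, and the code assumes the vault context is set and that the caller supplies a replicated item and the resource ID of an isolated test VNet.

```powershell
# Sketch only: requires Az.RecoveryServices and a vault context that is already set.
function Start-AsrDrill {
    param(
        [Parameter(Mandatory)] $ProtectedItem,          # from Get-AzRecoveryServicesAsrReplicationProtectedItem
        [Parameter(Mandatory)] [string] $TestNetworkId  # resource ID of an isolated test VNet
    )
    # Creates test VMs in the target region, attached only to the test VNet.
    Start-AzRecoveryServicesAsrTestFailoverJob -ReplicationProtectedItem $ProtectedItem `
        -AzureVMNetworkId $TestNetworkId -Direction PrimaryToRecovery
}

function Stop-AsrDrill {
    param([Parameter(Mandatory)] $ProtectedItem)
    # Removes the test VMs created by the drill; ongoing replication is not touched.
    Start-AzRecoveryServicesAsrTestFailoverCleanupJob -ReplicationProtectedItem $ProtectedItem
}
```

Both functions return an ASR job object, so a caller can poll it with `Get-AzRecoveryServicesAsrJob -Job $job` and run `Stop-AsrDrill` only after the application has been validated in the test network.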
Execute an Unplanned Failover
During a disaster, initiate an unplanned failover. In the vault, select the recovery plan or replicated item, click 'Failover'. Choose a recovery point (latest, latest app-consistent, or custom). For unplanned failover, you cannot guarantee zero data loss; the latest recovery point may have a few seconds of data loss. ASR then creates VMs in the target region, starts them, and you can connect. After failover, you must commit the failover to finalize it (this locks the recovery point). The source VMs may still be running; you should stop them to avoid data conflicts. The exam tests the difference between planned and unplanned failover: planned failover has zero data loss, unplanned may have data loss. Also, you must commit after unplanned failover.
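A scripted version of the failover-then-commit sequence might look like the following sketch. The wrapper names `Start-AsrFailoverToLatest` and `Complete-AsrFailover` are hypothetical. The code assumes the vault context is set and simply picks the newest available recovery point, which is one possible choice rather than the only valid one.

```powershell
# Sketch only: requires Az.RecoveryServices and a vault context that is already set.
function Start-AsrFailoverToLatest {
    param([Parameter(Mandatory)] $ProtectedItem)
    # Recovery points may come back unsorted, so order them by time first.
    $points = @(Get-AzRecoveryServicesAsrRecoveryPoint -ReplicationProtectedItem $ProtectedItem |
        Sort-Object -Property RecoveryPointTime)
    # Fail over to the newest point; writes made after it are lost.
    Start-AzRecoveryServicesAsrUnplannedFailoverJob -ReplicationProtectedItem $ProtectedItem `
        -Direction PrimaryToRecovery -RecoveryPoint $points[-1]
}

function Complete-AsrFailover {
    param([Parameter(Mandatory)] $ProtectedItem)
    # Commit finalizes the failover on the chosen recovery point.
    Start-AzRecoveryServicesAsrCommitFailoverJob -ReplicationProtectedItem $ProtectedItem
}
```

Keeping the commit in a separate function reflects the two-step nature of the process: you would normally verify the failed-over VMs before calling `Complete-AsrFailover`.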
Perform Failback to Primary Site
After the primary site is restored, you can fail back. For Azure-to-Azure, you need to reverse replication: enable replication on the failed-over VMs from the target region back to the source. This is done by selecting the VM in the vault and clicking 'Re-protect'. This starts reverse replication (initial sync again). Once replicated, you can perform a failover to the source region (planned or unplanned). For on-premises, failback requires a process server in Azure and a master target server on-premises. The exam expects you to know that failback is not automatic and requires re-protection. Also, failback may incur costs due to data transfer.
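For Azure-to-Azure, the 'Re-protect' action corresponds to reversing the protection direction. The sketch below wraps the relevant cmdlet. The function name `Start-AsrReprotect` and its parameter names are hypothetical, and the code assumes a reverse (target-to-source) container mapping and a cache storage account in the region where the VM is now running have already been created.

```powershell
# Sketch only: requires Az.RecoveryServices and a vault context that is already set.
function Start-AsrReprotect {
    param(
        [Parameter(Mandatory)] $ProtectedItem,                   # the failed-over, committed item
        [Parameter(Mandatory)] $ReverseMapping,                  # target-to-source container mapping
        [Parameter(Mandatory)] [string] $CacheStorageAccountId,  # cache account in the region the VM now runs in
        [Parameter(Mandatory)] [string] $OriginalResourceGroupId # resource group to replicate back into
    )
    # Starts reverse replication; an initial sync runs before the item is protected again.
    Update-AzRecoveryServicesAsrProtectionDirection -ReplicationProtectedItem $ProtectedItem -AzureToAzure `
        -ProtectionContainerMapping $ReverseMapping -LogStorageAccountId $CacheStorageAccountId `
        -RecoveryResourceGroupID $OriginalResourceGroupId
}
```

Once the returned job completes and replication is healthy again, failing back is a second failover in the opposite direction, followed by another re-protect if you want the original DR setup restored.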
Enterprise Scenario 1: Multi-Tier E-Commerce Application
A global e-commerce company runs its web, app, and database tiers on Azure VMs in the West US region. The business requires an RPO of 5 minutes and an RTO of 1 hour for the entire application. They use ASR with a recovery plan that starts the database tier first (SQL Server Always On), then the app tier, then the web tier. The recovery plan includes an Azure Automation runbook that updates Azure Traffic Manager endpoints to point to the failover region (East US) after the web tier is ready. They perform test failovers monthly using a separate test VNet to validate the process. A common issue they encountered: the app-consistent snapshot frequency was set to 60 minutes, but the database required app-consistent snapshots every 30 minutes to meet RPO. They adjusted the policy accordingly. Another issue: during an actual failover, the web VMs failed to start because they depended on a load balancer that wasn't pre-created in the target region. They now include a runbook to create the load balancer and update backend pools during failover.
Enterprise Scenario 2: Hybrid Disaster Recovery for On-Premises VMware
A financial services firm has on-premises VMware VMs running critical trading applications. They replicate these VMs to Azure using ASR with a process server on-premises that caches data and sends it to Azure. The target is a Recovery Services Vault in the East US region. They have a replication policy with 15-minute crash-consistent snapshots and 2-hour app-consistent snapshots. During an outage at the on-premises datacenter, they performed an unplanned failover to Azure. The failover worked, but the VMs had different IP addresses, causing connectivity issues with on-premises systems. They solved this by using a site-to-site VPN and configuring Azure VMs with the same IP addresses as on-premises (via custom IP settings in ASR). For failback, they re-protect the Azure VMs and replicate back to on-premises using a process server in Azure and a master target server on-premises. The failback process took 8 hours due to large disk sizes; they now use Azure ExpressRoute to speed up data transfer.
Common Misconfigurations
Placing the Recovery Services Vault in the source region: For Azure-to-Azure replication, the vault must be in the target region. If placed in the source region, replication fails.
Not pre-creating the target VNet: ASR can create a default VNet, but it may not have the correct address space or subnets. Always pre-create the target VNet to match production.
Using the same storage account for cache and target: The cache storage account must be in the source region, and target managed disks are separate. Using the same account can cause throttling.
Forgetting to clean up test failover resources: Test failover VMs and networks left behind incur costs and may cause confusion. Always clean up after testing.
Not updating DNS records after failover: Without DNS updates, traffic continues to go to the failed primary site. Use Automation runbooks to update DNS automatically.
What AZ-104 Tests on Azure Site Recovery (Objective 5.2)
The AZ-104 exam covers ASR under 'Monitor and Maintain Azure Resources' (Objective 5.2: Implement backup and recovery). Specific sub-objectives include: configure and manage Azure Site Recovery, perform failover and failback, create and manage recovery plans, and monitor replication health. You must understand the differences between Azure Backup and Azure Site Recovery, and when to use each. The exam expects you to know the steps to enable replication, the types of failover (test, planned, unplanned), and the components involved (Recovery Services Vault, replication policy, mobility service).
Common Wrong Answers and Why Candidates Choose Them
Wrong Answer 1: 'To replicate Azure VMs to another region, you must use Azure Backup.' Candidates confuse backup with DR. Correct: ASR is for replication and failover; Azure Backup is for long-term retention.
Wrong Answer 2: 'The Recovery Services Vault must be in the source region.' This is a common trap. For Azure-to-Azure replication, the vault must be in the target region. Candidates often think the vault stores replication data, but it only stores configuration; the actual data is in the target region.
Wrong Answer 3: 'Test failover uses the same production network.' Candidates may think testing is done in production. In reality, test failover must use an isolated network to avoid impact.
Wrong Answer 4: 'Planned failover can be used for unexpected disasters.' Planned failover is for expected outages (e.g., maintenance) and has zero data loss. For unexpected disasters, use unplanned failover.
Specific Numbers, Values, and Terms That Appear on the Exam
Default crash-consistent snapshot frequency: every 5 minutes (not configurable).
Default app-consistent snapshot frequency: every 60 minutes (configurable).
Maximum retention for recovery points: 72 hours (default 24 hours).
Recovery Services Vault must be in the target region for Azure-to-Azure replication.
Cache storage account must be in the source region.
Mobility Service is automatically installed on Azure VMs; for on-premises, it must be installed manually or via push.
Failover types: Test, Planned, Unplanned.
After unplanned failover, you must Commit to finalize.
For failback, you must re-protect the VM (reverse replication).
Edge Cases and Exceptions
Replication of encrypted VMs: If the source VM uses Azure Disk Encryption (ADE), the vault must have access to the Key Vault in the source region. ASR replicates the encryption keys along with the disks.
Replication of VMs with managed disks: ASR supports both managed and unmanaged disks. For managed disks, the target will also be managed disks.
Replication across regions that are not paired: You can replicate to any region that supports ASR, but using a paired region is recommended for lower latency and data residency.
Failover of VMs with public IPs: ASR does not automatically assign a public IP to the failover VM. You must pre-create a public IP in the target region and associate it via a runbook or manual action.
How to Eliminate Wrong Answers Using the Underlying Mechanism
When faced with an ASR question, ask yourself: 'What is the core function?' If the question is about replication for DR, eliminate any answer mentioning backup, long-term retention, or vault in the source region. If the question is about failover type, remember: planned = zero data loss, unplanned = possible data loss, test = isolated network. If the question is about recovery plan, look for order of VMs and automation. By understanding the mechanism, you can quickly eliminate options that violate the design principles.
Azure Site Recovery replicates Azure VMs and on-premises machines to a secondary region for disaster recovery.
The Recovery Services Vault must be in the target region for Azure-to-Azure replication.
Default crash-consistent snapshot frequency is every 5 minutes; app-consistent is every 60 minutes.
Test failover uses an isolated network and does not affect production.
Planned failover has zero data loss; unplanned failover may have data loss.
After an unplanned failover, you must commit to finalize.
For failback, you must re-protect the VM (reverse replication).
Recovery plans orchestrate multi-tier application failover with runbooks and scripts.
These come up on the exam all the time. Here's how to tell them apart.
Azure Site Recovery
Purpose: Disaster recovery with failover and failback.
RPO: Seconds to minutes (near-synchronous).
Retention: Up to 72 hours (default 24 hours).
Replication: Continuous block-level replication.
Failover: Orchestrated failover with recovery plans.
Azure Backup
Purpose: Long-term backup and archival.
RPO: Typically daily (configurable).
Retention: Up to 99 years (configurable).
Replication: Snapshot-based (full + incremental).
Restore: Point-in-time restore to original or alternate location.
Mistake
Azure Site Recovery is the same as Azure Backup.
Correct
ASR is a disaster recovery service that replicates VMs for failover, with RPO of seconds to minutes. Azure Backup is a backup service for long-term retention with daily/weekly/monthly backups. They serve different purposes and can be used together.
Mistake
The Recovery Services Vault must be in the source region for Azure-to-Azure replication.
Correct
The vault must be in the target region. The vault stores configuration and monitoring data, not the replicated data. The replicated data is stored in the target region as managed disks.
Mistake
Test failover uses the same production network and can impact production.
Correct
Test failover creates VMs in an isolated network (or a specified test VNet) to avoid any impact on production. After testing, you must clean up the test VMs and network.
Mistake
You can failover to any Azure region regardless of pairing.
Correct
Pairing is not the constraint. ASR can replicate to any target region that it supports for the source region, whether or not the two are paired. Paired regions are recommended for lower latency and data residency, but the target must still be a supported region, so "any Azure region" is too broad.
Mistake
After an unplanned failover, the failover is automatically committed.
Correct
After an unplanned failover, you must manually commit the failover to finalize it. Until committed, you can revert to a different recovery point. Committing locks the recovery point and stops further changes.
Azure Site Recovery is a disaster recovery service that replicates VMs for failover with a short RPO (seconds to minutes). It is used for business continuity during outages. Azure Backup is a backup service for long-term retention of data (up to 99 years) with daily/weekly/monthly backups. Use ASR for DR and Azure Backup for archival. They can be used together: ASR for rapid failover, Backup for long-term retention.
Yes, you can replicate to any Azure region that supports ASR. However, it is recommended to use a paired region for lower latency and data residency compliance. Not all regions are paired, but you can still replicate to non-paired regions if the target region supports ASR. Check the Azure documentation for supported region pairs.
A recovery plan is a collection of VMs grouped together for failover. You can specify the order in which VMs start (e.g., database first, then app, then web). You can add pre-actions and post-actions using Azure Automation runbooks or scripts to automate tasks like updating DNS or connecting to load balancers. Recovery plans ensure consistent and automated failover of multi-tier applications.
In the Recovery Services Vault, select the recovery plan or replicated item and click 'Test Failover'. Choose a recovery point and specify a test virtual network that is isolated from your production network (or let ASR create a default isolated network). ASR creates VMs in the target region in that test network. After testing, clean up by deleting the test VMs and network. No production traffic is affected.
Choosing 'Latest' will use the most recent crash-consistent recovery point. This minimizes data loss, although the point can still lag the source by a few seconds and is not guaranteed to be application-consistent. If you choose 'Latest app-consistent', you get the most recent application-consistent snapshot, which may be older (up to 60 minutes by default) but ensures application integrity. The trade-off is between data loss and consistency.
No, for Azure VMs, the Mobility Service is automatically installed as an extension when you enable replication. For on-premises VMware VMs and physical servers, you must install the Mobility Service manually or via push installation from the Configuration Server. Hyper-V VMs do not use the Mobility Service; replication is handled by the provider installed on the Hyper-V host.
After the primary site is restored, you need to reverse replication. In the vault, select the failed-over VM and click 'Re-protect'. This starts replicating from the target region back to the source. Once replication is healthy, you can perform a failover to the source region (planned or unplanned). For on-premises, you need a process server in Azure and a master target server on-premises.