AZ-900 | Chapter 104 of 127 | Objective 2.1

Azure Site Recovery (Disaster Recovery)

This chapter covers Azure Site Recovery (ASR), a key disaster recovery service within Azure's architecture. Understanding ASR is useful for the AZ-900 exam, where this guide places it under Domain 2 (Azure Architecture and Services), Objective 2.1 (Describe core Azure architectural components). Domain 2 as a whole carries roughly 35-40% of the exam weight; Microsoft does not publish a separate weight for individual objectives. You'll learn how ASR protects businesses from downtime, how it works under the hood, and what the exam expects you to know about failover, replication, and recovery plans.

25 min read
Intermediate
Updated May 31, 2026

The Hotel Backup Plan for Disaster

Imagine you manage a busy downtown hotel. Your hotel is your primary site: guests (customers) are checking in, staff are working, and everything runs smoothly. But what if a fire, flood, or power outage forces you to close? You can't just tell guests to leave; you need a backup plan. Azure Site Recovery (ASR) is like having an identical hotel (your secondary region) ready to take over in minutes. You don't build it from scratch when disaster strikes. Instead, you pre-configure it with the same room layouts, booking systems, and staff assignments. During normal operations, you continuously replicate guest data and room statuses (your VMs and workloads) to the backup hotel. If disaster hits, you 'fail over': guests are redirected to the backup hotel, and operations resume with minimal disruption. After the crisis, you can 'fail back' guests to the original hotel once it's repaired. ASR orchestrates this entire process so your business stays open even when your primary site is down. The mechanism mirrors Azure's orchestration: continuous replication, health monitoring, orchestrated failover, and failback, all managed from a single control plane.

How It Actually Works

What is Azure Site Recovery and Why Does It Matter?

Azure Site Recovery (ASR) is a cloud-based disaster recovery (DR) service that orchestrates replication, failover, and recovery of workloads running on Azure VMs, on-premises Hyper-V VMs, VMware VMs, and physical servers. The business problem it solves is simple: unplanned downtime can cost companies millions per hour. ASR ensures that if your primary site goes down (due to natural disaster, hardware failure, or human error), you can quickly switch to a secondary site with minimal data loss and recovery time.

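For AZ-900, you need to understand ASR as a managed, cloud-based service (often described as disaster recovery as a service) that integrates with Azure's infrastructure. Two metrics describe how well it performs: the Recovery Time Objective (RTO) is how long the workload may stay down, and the Recovery Point Objective (RPO) is how much recent data may be lost. Microsoft cites RTOs of minutes and RPOs of seconds for ASR, but these are typical values rather than guarantees; actual results depend on configuration and network conditions.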
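To make the two metrics concrete, the following minimal sketch computes the RPO and RTO actually achieved in one outage from three timestamps. The function names and the timeline are illustrative assumptions, not values reported by ASR.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_recovery_point: datetime, outage_start: datetime) -> timedelta:
    """Data-loss window: changes written after the last recovery point are lost."""
    return outage_start - last_recovery_point


def achieved_rto(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Downtime window: how long the workload was unavailable."""
    return service_restored - outage_start


# Hypothetical timeline for a single outage (values invented for illustration).
last_point = datetime(2026, 1, 10, 9, 59, 30)
outage = datetime(2026, 1, 10, 10, 0, 0)
restored = datetime(2026, 1, 10, 10, 8, 0)

rpo = achieved_rpo(last_point, outage)
rto = achieved_rto(outage, restored)
print(f"RPO achieved: {rpo.total_seconds():.0f} s")
print(f"RTO achieved: {rto.total_seconds() / 60:.0f} min")
```

RPO looks backward from the outage (data lost), while RTO looks forward from it (time to recover), which is why the two are tuned separately.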

How Azure Site Recovery Works – Step by Step Mechanism

ASR operates through a cycle of four phases: Replication, Testing, Failover, and Failback. Let's walk through each:

1. Replication: You define what to protect (VMs and physical servers, along with the workloads running on them) and where to replicate it (a secondary Azure region or, for some on-premises scenarios, a secondary on-premises site). ASR continuously copies data from the source to the target using change tracking and block-level replication. For Azure VMs, this uses the Site Recovery Mobility service extension installed on each VM. For on-premises machines, you install the Mobility service (VMware VMs and physical servers) or rely on the Site Recovery provider on the Hyper-V host. Replication is near-continuous, and the RPO can be as low as 30 seconds.

2. Testing: Before a real disaster, you can run a test failover in an isolated environment. This validates that your recovery plan works without impacting production. ASR creates a copy of your replicated VMs in the target region from a recovery point you choose. You can verify application functionality and network connectivity, then clean up the test environment.

3. Failover: When disaster strikes, you initiate a failover. ASR spins up the replicated VMs in the target region from a recovery point you choose (the latest, the latest processed, the latest app-consistent, or a custom point in time). You can trigger the process from the Azure portal, PowerShell, CLI, or REST API. During failover, ASR applies the target network settings you configured, such as IP addresses; DNS updates and traffic redirection usually need extra automation, for example a script or runbook in a recovery plan.

4. Failback: After the primary site is restored, you can fail back the workloads. This involves reversing replication from the target to the source, then failing over again to return to the original site. Failback requires careful planning to avoid data loss or inconsistency.
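The four phases can be pictured as a small state machine. The sketch below is a simplified model with hypothetical state and action names, not the ASR API; it only illustrates which step is valid at which point, including the fact that failback requires reversing replication first.

```python
from enum import Enum
from typing import Iterable


class State(Enum):
    PROTECTED = "primary active, replicating to secondary"
    FAILED_OVER = "secondary active, not replicating"
    REVERSE_PROTECTED = "secondary active, replicating back to primary"
    FAILED_BACK = "primary active, not replicating"


# Allowed transitions: (current state, action) -> next state.
TRANSITIONS = {
    (State.PROTECTED, "test_failover"): State.PROTECTED,  # a test never moves production
    (State.PROTECTED, "failover"): State.FAILED_OVER,
    (State.FAILED_OVER, "reprotect"): State.REVERSE_PROTECTED,
    (State.REVERSE_PROTECTED, "failback"): State.FAILED_BACK,
    (State.FAILED_BACK, "reprotect"): State.PROTECTED,
}


def apply(state: State, action: str) -> State:
    """Return the next state, or raise if the action is not valid right now."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} while {state.name}") from None


def run(actions: Iterable[str]) -> State:
    """Apply a sequence of actions starting from a protected workload."""
    state = State.PROTECTED
    for action in actions:
        state = apply(state, action)
    return state


final = run(["test_failover", "failover", "reprotect", "failback", "reprotect"])
print(final.name)
```

Calling `run(["failover", "failback"])` raises `ValueError`, mirroring the rule that you cannot fail back until replication has been reversed.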

Key Components and Tiers

Recovery Services Vault: The central management container for ASR. It stores replication settings and recovery plans, and it surfaces replication health. A subscription can hold many vaults; each vault lives in a single region.

Replication Policy: Defines RPO threshold, recovery point retention (how long to keep older points), and app-consistent snapshot frequency. Default: RPO threshold of 15 minutes, retention of 24 hours.

Recovery Plan: A collection of VMs and scripts that define the order and dependencies for failover. For example, you might start the database tier before the web tier. Plans can include manual actions or automation runbooks.

Mobility Service Agent: Installed on each VM to enable replication. For Windows, it's a Windows service; for Linux, a daemon. It captures disk writes and sends them to the target.

Process Server: For on-premises VMware/physical replication, this component caches and compresses data before sending to Azure. It also installs the Mobility service.

Configuration Server: Coordinates communication between on-premises and Azure. It manages replication, discovers machines, and tracks settings.

Pricing: ASR charges per protected instance (VM) per month, plus storage costs for replicated data in the target region. The per-instance fee is waived for the first 31 days of each new protection. ASR adds no separate fee for test failovers, but the VMs and disks created while a test runs are billed at normal Azure rates.
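The pricing rule above can be turned into a rough estimate. This is a minimal sketch: the monthly rates are made-up placeholders, and daily proration over a 30-day month is a simplifying assumption rather than Azure's actual billing logic.

```python
FREE_DAYS = 31  # from the text: no per-instance fee for the first 31 days of a new protection


def asr_fee(days_protected: int, monthly_fee: float, days_per_month: int = 30) -> float:
    """Estimate the per-instance ASR fee for one VM, prorated by day (assumption)."""
    billable_days = max(0, days_protected - FREE_DAYS)
    return round(billable_days * monthly_fee / days_per_month, 2)


def total_cost(vm_count: int, days_protected: int, monthly_fee: float,
               monthly_storage_per_vm: float, days_per_month: int = 30) -> float:
    """ASR fee after the free period, plus storage, which accrues from day one."""
    fee = vm_count * asr_fee(days_protected, monthly_fee, days_per_month)
    storage = vm_count * monthly_storage_per_vm * days_protected / days_per_month
    return round(fee + storage, 2)


# Hypothetical rates: 25.0 per VM per month for ASR, 10.0 per VM per month for storage.
print(asr_fee(20, 25.0))               # still inside the free period
print(asr_fee(61, 25.0))               # 30 billable days
print(total_cost(10, 61, 25.0, 10.0))  # ten VMs protected for 61 days
```

The point of the sketch is the shape of the bill: the waiver applies only to the per-instance fee, so storage shows up even in the first month.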

Comparison to On-Premises Disaster Recovery

Traditional on-premises DR requires a secondary data center with duplicate hardware, software licenses, and IT staff. You must maintain idle capacity, run regular tests, and manage complex replication tools. ASR eliminates capital expenditure (no secondary site needed), reduces operational overhead (automated failover), and offers pay-as-you-go pricing. However, on-premises DR may provide lower latency if the secondary site is local, whereas ASR relies on Azure's global network.

Azure Portal and CLI Touchpoints

In the Azure portal, you manage ASR under the Recovery Services Vault resource. Key blades: Site Recovery (for replication and failover), Backup (for backup, not DR), and Monitoring (for health alerts).

Using Azure CLI, you can create a vault, enable replication, and trigger failover. Example commands:

# Create a Recovery Services Vault
az backup vault create --resource-group MyRG --name MyVault --location eastus

# Enable replication for an Azure VM
az site-recovery protection-container mapping create ...

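# Trigger an unplanned failover of a replicated item (needs the site-recovery CLI extension; remaining arguments elided)
az site-recovery protected-item unplanned-failover --resource-group MyRG --vault-name MyVault ...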

PowerShell cmdlets follow similar patterns. The exam does not test command syntax, but you should recognize that ASR can be managed programmatically.

Concrete Business Scenarios

Scenario 1: A retail company runs its e-commerce platform on Azure VMs in West US. To protect against region-wide outages, they configure ASR to replicate VMs to East US. During a failover test, they discover that their database VMs need to start before web VMs—so they create a recovery plan with dependencies. Real failover takes 8 minutes, with RPO of 30 seconds.

Scenario 2: A hospital uses on-premises Hyper-V for patient records. They replicate to Azure using ASR. When a ransomware attack encrypts on-premises data, they fail over to Azure with a 15-minute RPO, limiting data loss to roughly the last 15 minutes of changes. After cleanup, they fail back.

Scenario 3: A gaming company runs VMSS (Virtual Machine Scale Sets) for game servers. They protect the scale set with ASR, but must ensure that after failover, the load balancer redirects traffic. They use recovery plans to update DNS records.

Common Misconfiguration Pitfalls

Not configuring network settings correctly leads to unreachable VMs after failover.

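Forgetting to clean up test failovers leaves test VMs and disks in place, which incurs compute and storage costs.

Relying only on crash-consistent recovery points can leave databases needing recovery, or in an inconsistent state, after failover; app-consistent snapshots avoid this.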

Using the wrong replication policy (e.g., too short retention) can result in no viable recovery point.
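The retention pitfall is easy to see with a small simulation. The sketch below uses hypothetical hourly recovery points and invented function names; it checks whether any retained point predates the moment corruption began.

```python
from datetime import datetime, timedelta


def retained_points(points, now, retention):
    """Keep only recovery points that are still inside the retention window."""
    cutoff = now - retention
    return [p for p in points if p >= cutoff]


def latest_clean_point(points, now, retention, corruption_start):
    """Newest retained point taken before the corruption began, or None."""
    clean = [p for p in retained_points(points, now, retention) if p < corruption_start]
    return max(clean) if clean else None


now = datetime(2026, 3, 1, 12, 0)
# Hypothetical hourly recovery points covering the past 48 hours.
points = [now - timedelta(hours=h) for h in range(0, 49)]
# Corruption began 30 hours ago but was only noticed now.
corruption_start = now - timedelta(hours=30)

short = latest_clean_point(points, now, timedelta(hours=24), corruption_start)
longer = latest_clean_point(points, now, timedelta(hours=72), corruption_start)
print("24 h retention:", short)
print("72 h retention:", longer)
```

With 24 hours of retention every surviving point already contains the corruption, so nothing usable remains; with 72 hours the point taken 31 hours ago is still available.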

Walk-Through

1. Create a Recovery Services Vault

First, you need a Recovery Services Vault to store replication data and manage settings. In the Azure portal, search for 'Recovery Services Vault' and click Create. Choose a resource group, vault name, and region. For Azure VM replication, the vault can be in any region except the source region, because a source-region outage would otherwise take the vault down with the workloads it protects; placing it in the target region is a common practice. For AZ-900, know that the vault is a container for both Backup and Site Recovery services. After creation, you configure replication settings under the 'Site Recovery' blade.

2. Enable Replication for Azure VMs

In the vault, select 'Site Recovery' then 'Enable Replication'. Choose the source (Azure region) and target region. You can select one or more VMs from the same region. For each VM, ASR installs the Mobility service agent automatically (if not present). You also configure a replication policy: set RPO threshold (default 15 min), retention (24 hours), and app-consistent snapshot frequency (default 1 hour). Behind the scenes, ASR creates a cache storage account in the source region and a managed disk in the target region. Replication begins immediately. The initial replication may take hours depending on data size.
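The three policy settings named above can be modeled as a small value object. This is a sketch under stated assumptions: the class and method names are invented, the defaults simply echo the figures quoted in this step, and the real service evaluates these settings internally.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ReplicationPolicy:
    # Defaults mirror the values quoted in the walkthrough text.
    rpo_threshold: timedelta = timedelta(minutes=15)
    retention: timedelta = timedelta(hours=24)
    app_consistent_every: timedelta = timedelta(hours=1)

    def rpo_breached(self, replication_lag: timedelta) -> bool:
        """True when replication lag exceeds the alert threshold."""
        return replication_lag > self.rpo_threshold

    def app_consistent_points_retained(self) -> int:
        """Roughly how many app-consistent points fit in the retention window."""
        return self.retention // self.app_consistent_every


policy = ReplicationPolicy()
print(policy.rpo_breached(timedelta(seconds=30)))   # healthy replication
print(policy.rpo_breached(timedelta(minutes=20)))   # lag beyond the threshold
print(policy.app_consistent_points_retained())
```

Note that the RPO threshold is an alerting limit, not a promise: it tells you when replication has fallen too far behind, while retention decides how far back you can recover.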

3. Create a Recovery Plan

A recovery plan groups VMs and defines the failover sequence. For example, a multi-tier app: first failover database VMs, then application VMs, then web VMs. To create one, go to 'Site Recovery' > 'Recovery Plans' > 'Create Recovery Plan'. Name it, select source and target, then add VMs from your replicated list. You can reorder groups and add pre/post-actions (e.g., run a script to update DNS). Recovery plans are optional but recommended for complex apps. The exam tests that recovery plans automate failover order and can include manual steps or Azure Automation runbooks.
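The ordering idea behind a recovery plan can be sketched in a few lines. The `Group` class, the VM names, and the `update_dns` stand-in are all hypothetical; a real plan is defined in the vault and its actions would be scripts or Azure Automation runbooks.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Group:
    name: str
    vms: List[str]
    pre_actions: List[Callable[[], str]] = field(default_factory=list)
    post_actions: List[Callable[[], str]] = field(default_factory=list)


def run_plan(groups: List[Group]) -> List[str]:
    """Fail over groups strictly in order and return a log of what happened."""
    log: List[str] = []
    for number, group in enumerate(groups, start=1):
        log += [action() for action in group.pre_actions]
        log += [f"group {number} ({group.name}): start {vm}" for vm in group.vms]
        log += [action() for action in group.post_actions]
    return log


def update_dns() -> str:
    # Stand-in for a script or runbook; a real one would call a DNS API.
    return "post-action: update DNS"


plan = [
    Group("database", ["sql-01"]),
    Group("app", ["app-01", "app-02"]),
    Group("web", ["web-01"], post_actions=[update_dns]),
]
log = run_plan(plan)
print("\n".join(log))
```

Because each group finishes before the next begins, the database is up before the application tier tries to connect, and the DNS change happens only after the web tier has started.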

4. Run a Test Failover

Before a real disaster, test your recovery plan. In the vault, select the recovery plan, then 'Test Failover'. Choose a recovery point (latest, latest processed, latest app-consistent, or custom) and an Azure virtual network for the test, ideally one that is isolated from production. ASR spins up VMs in that network in the target region using replicated data. You can validate connectivity and application behavior. After testing, clean up by selecting 'Cleanup test failover'. Important: test failover does not impact production or interrupt ongoing replication; VMs are created with a suffix '-test'. The exam expects you to know that test failover is for validation and should be done regularly.

5. Initiate Failover During Disaster

When an outage occurs, trigger a failover. Go to the recovery plan or individual VM and select 'Failover'. Choose a recovery point: 'Latest' (lowest RPO), 'Latest processed' (crash-consistent), 'Latest app-consistent' (app-consistent), or 'Custom' (a specific point). ASR then creates VMs in the target region using the chosen point. It also applies network settings (IP addresses, NSGs) as configured. The failover can be 'planned' (for expected downtime, no data loss) or 'unplanned' (for disasters, may lose recent changes). After VMs are running, you commit the failover to finalize. For AZ-900, know that failover is a manual or automated process that shifts operations to the secondary site.
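The four recovery point options differ only in which candidates they consider. The sketch below is a simplified model with invented names and sample data; it is meant to show the selection logic, not how ASR stores or processes recovery points.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class RecoveryPoint:
    taken_at: datetime
    processed: bool        # already processed by the service and ready to use
    app_consistent: bool   # captured with application data flushed to disk


def choose(points: List[RecoveryPoint], option: str,
           custom_time: Optional[datetime] = None) -> RecoveryPoint:
    """Pick a recovery point following the four options described in the text."""
    if option == "latest":
        candidates = points  # newest data, even if it still needs processing
    elif option == "latest_processed":
        candidates = [p for p in points if p.processed]
    elif option == "latest_app_consistent":
        candidates = [p for p in points if p.app_consistent]
    elif option == "custom":
        candidates = [p for p in points if p.taken_at == custom_time]
    else:
        raise ValueError(f"unknown option: {option}")
    if not candidates:
        raise LookupError(f"no recovery point matches {option!r}")
    return max(candidates, key=lambda p: p.taken_at)


# Hypothetical recovery points for one VM.
p1 = RecoveryPoint(datetime(2026, 1, 10, 10, 0), processed=True, app_consistent=True)
p2 = RecoveryPoint(datetime(2026, 1, 10, 10, 55), processed=True, app_consistent=False)
p3 = RecoveryPoint(datetime(2026, 1, 10, 10, 59), processed=False, app_consistent=False)
points = [p1, p2, p3]

for option in ("latest", "latest_processed", "latest_app_consistent"):
    print(option, "->", choose(points, option).taken_at.time())
```

The trade-off is visible in the sample data: 'latest' loses the least data, while 'latest app-consistent' reaches further back in exchange for a cleaner application state.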

6. Perform Failback After Recovery

Once the primary site is restored, you can fail back. First, reverse replication: configure the target region to replicate back to the source. Then run a planned failover (since you want zero data loss) to move workloads back. Finally, commit and clean up. Failback is more complex than failover and requires careful planning. The exam may test that failback is possible but not automatic; you must reverse the replication direction yourself. Also note that failback is not supported for all scenarios. For example, physical servers replicated to Azure cannot fail back to physical hardware; they fail back as VMware VMs.

What This Looks Like on the Job

Scenario 1: E-commerce Platform During a Regional Outage

An online retailer runs its website and inventory database on Azure VMs in the 'West US' region. They use ASR to replicate to 'East US'. During a severe storm that takes down West US data centers, the operations team triggers an unplanned failover. Within 10 minutes, all VMs are running in East US with an RPO of 30 seconds (meaning they lost at most 30 seconds of recent transactions). The recovery plan includes a script to update DNS records to point to the new load balancer IP. The business avoids hours of downtime, saving an estimated $500,000 in lost sales. Cost: They pay for the replicated managed disks (about $200/month per VM in this example) plus the ASR per-instance fee (roughly $25/month per VM at list price for replication to Azure; check current pricing). The key lesson: regular test failovers (quarterly) ensured the recovery plan worked as expected.

Scenario 2: Healthcare Provider Ransomware Attack

A hospital uses on-premises Hyper-V for patient records and replicates to Azure via ASR. A ransomware attack encrypts all on-premises data. The IT team initiates an unplanned failover to Azure. Because ASR had app-consistent snapshots every 15 minutes, they recover to a point just before the attack, losing only 10 minutes of data. The hospital continues operations from Azure while the on-premises environment is cleaned. After two weeks, they fail back using a planned failover (zero data loss). Problem: They had not tested failback, causing a 2-day delay due to IP conflicts. Lesson: Always test failback as well. Cost: ASR replication costs were $300/month for 10 VMs; the avoided ransom demand was $2 million.

Scenario 3: Multi-Tier Application with Dependencies

A financial services company runs a three-tier app (web, app, database) on Azure VMs. They configure ASR with a recovery plan that specifies order: database first, then app servers, then web servers. Each group has a 5-minute delay between starts to allow services to initialize. During a failover test, they discover that the database VM takes longer to boot, causing the app servers to fail connecting. They adjust the recovery plan to add a 10-minute pause after the database group. Real failover during a hardware failure works smoothly. The exam relevance: recovery plans are a key feature of ASR that candidates must understand.

How AZ-900 Actually Tests This

Exam Objective: 2.1 Describe core Azure architectural components – specifically, Azure Site Recovery under 'Disaster Recovery'. The exam tests your ability to distinguish ASR from Azure Backup, understand its purpose (replication and failover), and identify key features like RPO, RTO, and recovery plans.

Common Wrong Answers and Why Candidates Choose Them:

1. 'Azure Site Recovery is for backing up data.' Wrong because ASR is for disaster recovery (replication and failover), not backup. Backup is a separate service (Azure Backup) for long-term retention. Candidates confuse the two because both use a Recovery Services Vault.

2. 'ASR only works for Azure VMs.' Wrong because ASR also supports on-premises Hyper-V, VMware, and physical servers. The exam may list only Azure VMs as an option.

3. 'Failover is automatic without any manual intervention.' Wrong because an administrator (or automation that you build) must initiate failover; ASR does not detect a disaster and fail over by itself. Candidates assume 'automated' means 'without human action'.

4. 'RPO and RTO are guaranteed by Microsoft regardless of configuration.' Wrong because RPO/RTO depend on factors like network bandwidth, data size, and replication policy. Microsoft provides typical values (seconds for RPO, minutes for RTO) but no SLA for RPO/RTO.

Specific Terms and Values That Appear Verbatim on the Exam:

- Recovery Services Vault, replication policy, failover, failback, test failover, recovery plan.

- RPO: as low as 30 seconds (for Azure VMs). RTO: typically minutes.

- Supported sources: Azure VMs, Hyper-V (on-prem), VMware VMs, physical servers.

- Target: Azure region or on-premises (for failback).

Edge Cases and Tricky Distinctions:

- ASR vs. Azure Backup: ASR replicates entire VMs for DR; Backup stores point-in-time copies for long-term retention. Both can be in the same vault but serve different purposes.

- ASR does not provide backup for individual files; it's for full VM or workload recovery.

- Test failover does not impact production; it runs in a separate virtual network that you select, ideally one isolated from production.

- Failback is only possible if you have a reverse replication setup.

Memory Trick: 'DR = Replicate + Failover'

- D (Disaster): unexpected event.

- R (Recovery): get back online.

- ASR does replication (copy) and failover (switch).

- If the question mentions 'backup' or 'long-term retention', it's not ASR.

- If it mentions 'replication across regions' and 'failover', it's ASR.

Decision Tree for Eliminating Wrong Answers:

1. Is the scenario about protecting against a disaster (region outage, hardware failure)? → ASR.

2. Is it about restoring a deleted file or older version? → Azure Backup.

3. Does it mention 'recovery plan' or 'test failover'? → ASR.

4. Does it mention 'vault' but no failover? → Could be either; look for 'replication' vs 'backup policy'.
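The decision tree can be expressed as a short keyword check. This is a study aid only: the cue lists are assumptions drawn from the wording in this section, checks 1 and 3 are merged because both point to ASR, and real exam questions need reading in full.

```python
def classify(scenario: str) -> str:
    """Apply the decision-tree checks, in order, to a scenario description."""
    text = scenario.lower()
    # Checks 1 and 3: disaster, failover, or recovery-plan wording points to ASR.
    asr_cues = ("region outage", "hardware failure", "failover",
                "recovery plan", "replication")
    # Check 2: file-level or historical restore wording points to Azure Backup.
    backup_cues = ("deleted file", "older version",
                   "long-term retention", "backup policy")
    if any(cue in text for cue in asr_cues):
        return "Azure Site Recovery"
    if any(cue in text for cue in backup_cues):
        return "Azure Backup"
    # Check 4: a vault alone is ambiguous because both services use it.
    return "Could be either"


print(classify("Keep the app running during a region outage"))
print(classify("Restore a deleted file from last week"))
print(classify("Store data in a Recovery Services vault"))
```

The third call returns "Could be either", which is the behavior the last branch of the tree describes: the word 'vault' by itself does not separate the two services.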

Key Takeaways

Azure Site Recovery (ASR) is a disaster recovery service that replicates workloads to a secondary region for failover during outages.

ASR supports Azure VMs, on-premises Hyper-V, VMware, and physical servers.

Key components: Recovery Services Vault, replication policy, recovery plan, Mobility service agent.

RPO can be as low as 30 seconds; RTO is typically minutes.

Failover can be planned (zero data loss) or unplanned (may lose recent changes).

Test failover validates recovery without impacting production.

Failback is possible after the primary site is restored, but requires reverse replication.

ASR is not a backup service; it is for business continuity during disasters.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Azure Site Recovery (ASR)

Purpose: Disaster recovery – replicate and failover VMs/workloads across regions or to Azure.

RPO: As low as 30 seconds (near-continuous replication).

RTO: Minutes (failover time).

Replicates entire VMs or workloads continuously.

Supports test failovers for validation.

Azure Backup

Purpose: Backup – long-term retention and point-in-time restore of data.

RPO: Typically 1 hour or more (scheduled backups).

RTO: Hours (restore time).

Stores snapshots or backups at scheduled intervals.

Does not support failover; restores to original or alternate location.

Watch Out for These

Mistake

Azure Site Recovery is the same as Azure Backup.

Correct

ASR is for disaster recovery (replication and failover) to minimize downtime, while Azure Backup is for long-term data retention and point-in-time recovery. Both use Recovery Services Vaults but have different objectives and features.

Mistake

ASR automatically fails over without any user action.

Correct

Failover must be initiated manually (or via an automated runbook) by an administrator. ASR does not automatically detect disasters and fail over; you trigger it when needed.

Mistake

ASR can only replicate Azure VMs, not on-premises machines.

Correct

ASR supports replication of on-premises Hyper-V VMs, VMware VMs, and physical servers to Azure, as well as Azure VMs between regions.

Mistake

Microsoft guarantees a specific RPO and RTO for all ASR deployments.

Correct

Microsoft states typical RPO of seconds and RTO of minutes, but actual values depend on configuration, network, and workload. No SLA is provided for RPO/RTO.

Mistake

Test failover affects production workloads.

Correct

Test failover runs in a separate virtual network that you select (ideally one isolated from production) and does not impact production. It uses replicated data without disrupting ongoing replication.

Frequently Asked Questions

What is the difference between Azure Site Recovery and Azure Backup?

Azure Site Recovery (ASR) is for disaster recovery: it continuously replicates VMs and workloads to a secondary region so you can failover quickly during an outage. Azure Backup is for long-term data retention: it creates scheduled backups that you can restore to a specific point in time. Both use Recovery Services Vaults, but ASR focuses on minimizing downtime (RTO minutes, RPO seconds), while Backup focuses on data durability (RTO hours, RPO hours). On the exam, if the question mentions 'failover' or 'replication across regions', it's ASR. If it mentions 'restore a file from last week', it's Backup.

Can Azure Site Recovery protect on-premises servers?

Yes, ASR can protect on-premises Hyper-V VMs, VMware VMs, and physical servers by replicating them to Azure. For VMware VMs and physical servers, you install the Mobility service agent on each machine and use a Configuration Server and Process Server to manage replication. For Hyper-V, the Site Recovery provider runs on the Hyper-V host, so no agent is needed inside each VM. This allows you to fail over to Azure if your on-premises site goes down. After recovery, you can fail back. The exam may test that ASR supports hybrid scenarios.

What is a recovery plan in Azure Site Recovery?

A recovery plan is a collection of VMs and automation steps that define the order and dependencies for failover. For example, a multi-tier app may require the database tier to start before the web tier. You can group VMs, set delays, and add scripts or Azure Automation runbooks. Recovery plans ensure a consistent and predictable failover process. The exam expects you to know that recovery plans are optional but recommended for complex applications.

What is the difference between planned and unplanned failover?

Planned failover is used when you expect downtime (e.g., maintenance) and want zero data loss. It shuts down VMs cleanly before replicating final changes. Unplanned failover is for disasters and may result in some data loss (depending on RPO). ASR supports both types. On the exam, remember that planned failover requires a clean shutdown, while unplanned does not.

How does ASR handle network changes during failover?

During failover, ASR can automatically assign new IP addresses from the target network or retain the original IPs if you use site-to-site VPN or ExpressRoute. You can configure network settings in the recovery plan or VM properties. ASR also updates DNS records if you have automation. The exam may test that you need to plan for IP address changes to avoid connectivity issues.

What is the cost of using Azure Site Recovery?

ASR charges per protected instance (VM) per month, plus storage costs for replicated data in the target region. The per-instance fee is waived for the first 31 days of each new protection. ASR adds no separate fee for test failovers, but the VMs and disks created while a test runs are billed at normal Azure rates. Pricing depends on the number of VMs and the amount of data. For AZ-900, know that ASR is a paid service with a pay-as-you-go model.

Can I use Azure Site Recovery to migrate workloads to Azure?

Yes, ASR can be used for migration (lift-and-shift) by replicating on-premises VMs to Azure and then failing over. This is a common use case. However, Azure Migrate is the service Microsoft positions for migrations, and its server migration tooling builds on the same replication approach. The exam may mention that ASR supports migration scenarios.
