AZ-104 | Chapter 74 of 168 | Objective 2.3

Storage Account Failover and RA-GZRS

This chapter covers storage account failover for RA-GZRS (Read-Access Geo-Zone-Redundant Storage), a critical disaster recovery feature in Azure. It explains the mechanism of customer-managed failover, including the process, data loss window, and limitations. For the AZ-104 exam, questions on this topic test your understanding of when to use failover versus other replication options, the recovery point objective (RPO), and the recovery time objective (RTO). Expect 2-3 questions on storage replication and failover scenarios.

25 min read
Intermediate
Updated May 31, 2026

Azure Storage Failover: A Bank's Dual Vault System

Imagine a bank with two vaults: a primary vault in New York and a secondary vault in Chicago. Every deposit is recorded synchronously in several rooms of the New York vault, while the Chicago copy is updated asynchronously and usually trails by a few minutes. Under normal conditions customers make deposits only in New York, although they may inspect the Chicago copy. If a fire breaks out in New York, the bank manager can initiate a failover to Chicago: the vault doors in New York are sealed, and all access requests are redirected to Chicago. Because the Chicago copy was asynchronous, any deposits made in the last few minutes (typically fewer than 15) might be lost. After the switch, Chicago operates alone with no backup vault. To return to New York once it is repaired, the bank must first set up replication again, wait for the copy to complete, and then fail over a second time.

This mirrors Azure's customer-managed failover for RA-GZRS: you have a primary region with synchronous writes, a secondary region with asynchronous updates (RA-GZRS provides read access to the secondary), and a failover that promotes the secondary to become the new primary. Data loss is typically less than 15 minutes of writes, and after the failover the account is locally redundant (LRS) until you re-enable geo-redundancy.

How It Actually Works

What is RA-GZRS and Why Failover Exists

RA-GZRS (Read-Access Geo-Zone-Redundant Storage) is an Azure Storage replication option that combines zone-redundancy within a primary region with geo-redundancy to a secondary region. Data is written synchronously across three Azure availability zones in the primary region, then asynchronously replicated to a single data center in a paired secondary region. The 'Read-Access' prefix means you can read data from the secondary region (even during normal operations), but writes are only allowed to the primary endpoint. This is different from GZRS (Geo-Zone-Redundant Storage) where the secondary is not readable until a failover occurs.
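To make the read-only secondary concrete: a read-access account exposes its secondary under the account name with a `-secondary` suffix. The sketch below derives both hostnames for illustration. The function names are hypothetical helpers, not Azure CLI commands, and `mystorageaccount` is a placeholder.

```shell
# Illustrative helpers (hypothetical names, not part of the Azure CLI).
# $1 = storage account name, $2 = service (blob, table, queue, file)

# Primary endpoint: accepts reads and writes.
primary_endpoint() {
  printf 'https://%s.%s.core.windows.net\n' "$1" "$2"
}

# Secondary endpoint of an RA-GRS or RA-GZRS account: read-only,
# formed by appending "-secondary" to the account name.
secondary_endpoint() {
  printf 'https://%s-secondary.%s.core.windows.net\n' "$1" "$2"
}

primary_endpoint mystorageaccount blob
secondary_endpoint mystorageaccount blob
```

With GZRS (no read access) the secondary copy exists but this second hostname cannot be used.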

Azure offers two customer-managed failover types: a planned failover, intended for disaster recovery testing, and an unplanned failover, intended for real outages. This chapter focuses on the unplanned type. It lets you promote the secondary region to become the new primary, which is essential when the primary region experiences a prolonged outage. Without failover, your data remains available for reads only (and only if the account is RA-GRS or RA-GZRS), not for writes. After an unplanned failover, the original secondary becomes the new primary and the account is converted to LRS in that region, so there is no secondary at all until you re-enable geo-redundancy.

How Customer-Managed Failover Works Internally

When you initiate a failover via the Azure portal, PowerShell, or CLI, Azure performs the following steps:

1. Validation: Azure checks that the storage account is eligible for failover: it must use geo-redundant replication and must not rely on a feature that blocks failover. You do not have to prove that the primary region is down, but you must confirm that you accept potential data loss.

2. DNS Update: Azure updates the DNS records for the storage account's blob, table, queue, and file endpoints. The primary endpoint (e.g., mystorageaccount.blob.core.windows.net) is repointed to the secondary region's IP addresses. This is done gradually across Azure's DNS infrastructure, so propagation may take a few minutes.

3. Replication Stop: Azure stops the asynchronous replication from primary to secondary. Any write that had not been committed to the secondary by then is lost. This is the data loss window; it equals the replication lag at that moment, which is typically less than 15 minutes.

4. Promotion: The secondary region's storage stamps are promoted to become the new primary. The storage account's properties are updated to reflect the new primary location.

5. Read-Write Enablement: The new primary endpoint now accepts both read and write operations. The account is converted to LRS in the new primary region, so there is no secondary endpoint, and no geo-redundant copy, until you re-enable geo-redundancy.
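The data loss window in step 3 can be estimated before you fail over by comparing the account's Last Sync Time with the current time. The sketch below works on Unix epoch seconds so that it stays portable; converting the ISO 8601 timestamp that Azure reports into epoch seconds is left to the caller. The function names are hypothetical, and the 900-second threshold is simply the 15-minute figure used in this chapter, not a guarantee.

```shell
# Seconds of writes that are not yet guaranteed to be in the secondary.
# $1 = Last Sync Time (epoch seconds), $2 = current time (epoch seconds)
replication_lag_seconds() {
  echo $(( $2 - $1 ))
}

# Report whether the lag is within the 15-minute (900 s) figure.
within_typical_rpo() {
  if [ "$(replication_lag_seconds "$1" "$2")" -le 900 ]; then
    echo yes
  else
    echo no
  fi
}

# Example: "now" is 7 minutes 30 seconds after the last sync.
replication_lag_seconds 1000000000 1000000450   # prints 450
within_typical_rpo 1000000000 1000000450        # prints yes
```

Anything written during that lag is what a failover started right now would discard.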

Key Components, Values, Defaults, and Timers

Replication Type: Must be geo-redundant: GRS, RA-GRS, GZRS, or RA-GZRS. Failover is not supported for LRS or ZRS, which keep every copy in a single region. This chapter focuses on RA-GZRS.

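RPO (Recovery Point Objective): Typically less than 15 minutes. This figure reflects normal replication lag and is not a guarantee: Azure publishes no SLA for how long geo-replication takes. Check the account's Last Sync Time to see how far behind the secondary actually is.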

RTO (Recovery Time Objective): Typically under 1 hour from the time you initiate the failover. This includes DNS propagation and promotion.

Post-Failover Redundancy: After an unplanned failover the account runs as LRS in the new primary region. There is no fixed waiting period between failovers, but a second failover is impossible until you re-enable geo-redundancy and the initial copy to the new secondary completes.

Failback: Not automated. After the original primary region recovers, you re-enable geo-redundant replication, wait for the data to synchronize, and then initiate another customer-managed failover. No support request is required. This is a key exam point.

Unsupported Account Types: Storage accounts with hierarchical namespace (Azure Data Lake Storage Gen2) enabled do not support customer-managed failover. Premium accounts, including premium block blob accounts and the premium file shares that host NFSv4.1 shares, offer only LRS and ZRS, so they have no secondary region to fail over to.
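The replication requirement in the list above can be expressed as a check on the account's SKU name, which you can read with `az storage account show --query sku.name`. This is a minimal sketch with hypothetical function names; it looks only at redundancy and does not detect blocking features such as hierarchical namespace.

```shell
# $1 = SKU name, for example Standard_RAGZRS or Premium_LRS

# Geo-redundant SKUs have a secondary region and can be failed over.
supports_customer_managed_failover() {
  case "$1" in
    Standard_GRS|Standard_RAGRS|Standard_GZRS|Standard_RAGZRS) echo yes ;;
    *) echo no ;;   # LRS, ZRS and premium SKUs have no secondary region
  esac
}

# Only the read-access variants expose a readable secondary endpoint.
secondary_is_readable() {
  case "$1" in
    Standard_RAGRS|Standard_RAGZRS) echo yes ;;
    *) echo no ;;
  esac
}

supports_customer_managed_failover Standard_RAGZRS   # prints yes
secondary_is_readable Standard_GZRS                  # prints no
```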

Configuration and Verification Commands

To initiate a failover using Azure CLI:

az storage account failover --name <storage-account-name> --resource-group <resource-group>

To check the failover status:

az storage account show --name <storage-account-name> --resource-group <resource-group> --query "statusOfPrimary"

The output will show available or unavailable. After failover, the primary location changes to the secondary region.

Using PowerShell:

Invoke-AzStorageAccountFailover -ResourceGroupName <resource-group> -Name <storage-account-name>

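To verify replication status and Last Sync Time (the -IncludeGeoReplicationStats switch is required for the latter):

Get-AzStorageAccount -ResourceGroupName <resource-group> -Name <storage-account-name> -IncludeGeoReplicationStats | Select-Object -Property PrimaryLocation, SecondaryLocation, StatusOfPrimary, StatusOfSecondary, LastGeoFailoverTime, @{Name='LastSyncTime'; Expression={$_.GeoReplicationStats.LastSyncTime}}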
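In Azure CLI form, the check and the failover can be combined so that the Last Sync Time is always on screen before the switch. The sketch below is one possible arrangement, not a prescribed procedure. The function names are hypothetical; the `az` calls use `--expand geoReplicationStats` to surface the Last Sync Time.

```shell
# Print the account's Last Sync Time as reported by Azure.
# $1 = storage account name, $2 = resource group
last_sync_time() (
  az storage account show \
    --name "$1" --resource-group "$2" \
    --expand geoReplicationStats \
    --query geoReplicationStats.lastSyncTime \
    --output tsv
)

# Show the Last Sync Time, then start the failover.
# Writes made after the printed time may be lost.
failover_with_context() (
  account=$1
  group=$2
  printf 'Last Sync Time: %s\n' "$(last_sync_time "$account" "$group")"
  az storage account failover --name "$account" --resource-group "$group"
)

# Usage (placeholders, requires a signed-in Azure CLI):
#   failover_with_context <storage-account-name> <resource-group>
```

The failover command asks for confirmation itself, which gives you a moment to read the timestamp and decide whether the gap is acceptable.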

How It Interacts with Related Technologies

Azure Site Recovery: For Azure VMs, you would use Site Recovery for failover, not storage account failover. Storage account failover is for data stored in Azure Blob, Table, Queue, or Azure Files (excluding premium and NFS).

Azure Traffic Manager: You can use Traffic Manager to route traffic to the secondary endpoint before initiating failover, but the storage account failover itself changes the DNS of the primary endpoint.

Azure Front Door: Similar to Traffic Manager, can be used for global load balancing, but storage account failover is a separate mechanism.

Soft Delete and Immutable Storage: Soft delete and immutable policies are preserved after failover. However, versioning and change feed may have considerations.

Important Exam Details

The failover is customer-managed, not automatic. You must initiate it.

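After an unplanned failover the account becomes LRS. You must re-enable geo-redundancy before you can fail over again or fail back.

Data loss is typically less than 15 minutes of writes, with no SLA.

After failover, the old primary is no longer part of the account. Failing back means re-enabling geo-redundancy and initiating a second failover; no support request is needed.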

Failover is not supported for premium block blobs, Data Lake Gen2, or NFS shares.

The secondary region is determined by Azure's paired regions. You cannot choose an arbitrary region.

Common Misunderstandings

Some think failover is automatic. It is not; you must trigger it.

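Some think you can fail over back and forth quickly. You cannot: after an unplanned failover the account is LRS, and geo-redundancy must be re-enabled and fully synchronized before the next failover.

Some think that after failover you can read from the old primary. You cannot; the account is LRS in the new primary region and has no secondary endpoint until geo-redundancy is re-enabled.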

Some think RA-GZRS allows writes to secondary. It does not; only reads are allowed until failover.
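The read and write rules behind these misunderstandings fit in a small decision function. This is a sketch that assumes a read-access (RA-GRS or RA-GZRS) blob account; `pick_blob_host` is a hypothetical helper, and the status values mirror the `available` and `unavailable` strings shown by `statusOfPrimary`.

```shell
# $1 = operation (read|write)
# $2 = primary status (available|unavailable)
# $3 = storage account name
pick_blob_host() {
  case "$1:$2" in
    read:available|write:available)
      echo "$3.blob.core.windows.net" ;;
    read:unavailable)
      # Reads can fall back to the read-only secondary endpoint.
      echo "$3-secondary.blob.core.windows.net" ;;
    write:unavailable)
      # The secondary never accepts writes; only a failover restores them.
      echo "no writable endpoint: a failover is required" >&2
      return 1 ;;
    *)
      echo "usage: pick_blob_host read|write available|unavailable ACCOUNT" >&2
      return 2 ;;
  esac
}

pick_blob_host read unavailable mystorageaccount
```

The only branch without a usable host is a write during a primary outage, which is exactly the situation customer-managed failover exists for.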

Walk-Through

1. Validate Account and Region Health

Before initiating failover, Azure validates that the storage account is eligible (for example, the replication type is geo-redundant such as RA-GZRS or GZRS, hierarchical namespace is not enabled, and the account is not premium). You must also confirm that you accept potential data loss. If validation fails, the operation is rejected with an error message.

2. Stop Asynchronous Replication

Azure stops the asynchronous geo-replication from the primary to the secondary region. Any data that has been written to the primary but not yet replicated to the secondary is lost. This is the data loss window, typically less than 15 minutes. After this step, the secondary region holds the data as of the last successful replication (the account's Last Sync Time).

3. Promote Secondary to Primary

Azure promotes the secondary region's storage stamps to become the new primary. The storage account's metadata is updated to reflect the new primary location, and the account's redundancy is changed to LRS in that region. This promotion involves updating internal routing tables and storage stamp configurations.

4. Update DNS Endpoints

Azure updates the DNS records for the storage account's endpoints (blob, table, queue, file) to point to the new primary region's IP addresses. This propagation happens gradually across Azure's DNS infrastructure. Clients may experience a brief period where they are redirected to the old primary (which is now unavailable) until DNS caches expire. The TTL on these DNS records is typically 300 seconds (5 minutes).

5. Enable Read-Write on New Primary

The new primary endpoint is now set to accept both read and write operations. The storage account's status is updated to 'available' for the new primary location. The account now runs as LRS in that region: there is no secondary endpoint, and the data is not protected against a regional outage until you re-enable geo-redundancy.
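Because the walk-through takes time to complete, automation that depends on the account usually waits until the account reports its new primary region. A minimal polling sketch follows. The function name, attempt count, and interval are illustrative assumptions; the query reads the account's `primaryLocation` property.

```shell
# Poll until the account reports the expected primary region.
# $1 = account, $2 = resource group, $3 = expected region (e.g. westus),
# $4 = maximum number of checks, $5 = seconds to sleep between checks
wait_for_new_primary() (
  attempt=1
  current=unknown
  while [ "$attempt" -le "$4" ]; do
    current=$(az storage account show --name "$1" --resource-group "$2" \
      --query primaryLocation --output tsv)
    if [ "$current" = "$3" ]; then
      echo "primary is now $current"
      exit 0
    fi
    sleep "$5"
    attempt=$((attempt + 1))
  done
  echo "primary is still $current after $4 checks" >&2
  exit 1
)

# Usage (placeholders, requires a signed-in Azure CLI):
#   wait_for_new_primary <storage-account-name> <resource-group> westus 60 60
```

Clients may still need to wait for cached DNS entries to expire after the function returns.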

What This Looks Like on the Job

Scenario 1: Regional Outage for a SaaS Provider

A SaaS company hosts customer data in Azure Blob Storage using RA-GZRS. Their primary region is East US, with the secondary in West US. During a major hurricane, East US experiences a prolonged power outage. The company initiates a customer-managed failover to West US. They accept the loss of any writes made after the Last Sync Time, typically less than 15 minutes' worth (which they mitigate by logging writes locally). After failover, their application continues to serve customers from West US. Once East US is restored, they re-enable geo-redundancy and plan a second failover to return. Key considerations: the primary endpoint hostname does not change, so applications that use it follow the DNS update automatically, but any configuration that hardcoded the -secondary endpoint must be updated. They also need to ensure that any dependent services (e.g., Azure Functions, CDN) are also failed over or redirected.

Scenario 2: Planned Failover for Maintenance

A financial services firm uses RA-GZRS for audit logs. They need to perform maintenance on the workloads they run in their primary region (North Europe) and want to switch to the secondary (West Europe) with minimal downtime. They initiate the failover during a maintenance window. Because they control the timing, they stop all writes beforehand and wait until the account's Last Sync Time is later than the final write, which avoids data loss. After failover, they perform the maintenance. Failing back is not instant, however: the account is now LRS, so they must re-enable geo-redundancy, wait for the full re-replication (which incurs replication costs), and then fail over a second time. This overhead leads them to consider alternative strategies for regular maintenance, such as using Azure Traffic Manager to redirect traffic without changing storage endpoints.
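The precaution in this scenario comes down to one comparison: a controlled failover loses nothing only if the Last Sync Time is later than the final write. A minimal sketch, using epoch seconds and a hypothetical function name:

```shell
# Succeeds only when every write is already in the secondary.
# $1 = time of the last write (epoch seconds)
# $2 = Last Sync Time (epoch seconds)
# The comparison is strict: a write at exactly the Last Sync Time
# is treated as not yet replicated, to stay on the safe side.
writes_fully_replicated() {
  [ "$2" -gt "$1" ]
}

# Example: the last write happened 120 seconds before the last sync.
if writes_fully_replicated 1000000000 1000000120; then
  echo "safe: every write is already in the secondary"
else
  echo "wait: some writes are newer than the Last Sync Time"
fi
```

Waiting a fixed number of minutes is only a rule of thumb; checking the actual Last Sync Time is the more reliable test.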

Scenario 3: Misconfiguration Leading to Data Loss

A startup initiates a failover during a minor blip in the primary region, having mistaken a brief interruption for a regional outage. They lose 10 minutes of recent uploads because they did not check the Last Sync Time and wait for the replication lag to clear. Nothing stopped them: their account is a GPv2 account with RA-GZRS, which supports failover (an account with hierarchical namespace enabled would have been rejected at the validation step). After failover the account is LRS in the secondary region, and they cannot simply switch back: they must re-enable geo-redundancy, wait for the data to replicate, and fail over again. Until then they operate from a region farther from their users, which increases latency. This highlights the importance of checking replication status and planning failovers carefully.

How AZ-104 Actually Tests This

What AZ-104 Tests on This Topic (Objective 2.3)

Replication options: You must know the differences between LRS, ZRS, GRS, RA-GRS, GZRS, RA-GZRS. Specifically, which support read-access to secondary, which support failover, and the durability numbers.

Failover scenarios: When to use customer-managed failover vs. other DR methods. The exam presents scenarios where a primary region is down and asks what to do.

Limitations: The account becomes LRS after an unplanned failover, some account types are unsupported (premium, Data Lake, NFS), and failing back requires re-enabling geo-redundancy followed by a second failover.

RPO and RTO: RPO is typically less than 15 minutes (no SLA); RTO is typically under 1 hour.

Common Wrong Answers and Why Candidates Choose Them

1. 'Automatic failover': Many candidates think Azure automatically fails over when a region goes down. This is wrong; failover is customer-managed. The exam will have a distractor saying 'Automatic failover will occur within 1 hour.'

2. 'You can fail over and fail back immediately': Candidates may think you can fail over back and forth at will. After an unplanned failover the account is LRS, so failing back first requires re-enabling geo-redundancy and waiting for synchronization. The exam might say 'Perform a failover and then immediately fail back after the region recovers.' This is incorrect.

3. 'RA-GZRS allows writes to secondary': Read-access only. Candidates confuse RA-GZRS with active-active configurations. The exam may ask 'Can you write to the secondary endpoint?' Answer: No.

4. 'Failover is supported for all storage accounts': It is not supported for premium block blobs, Data Lake Gen2, or NFS shares. The exam may present a scenario with a premium account and ask for failover.

Specific Numbers and Terms on the Exam

15 minutes: Typical upper bound on data loss (RPO); not a guarantee.

LRS: Redundancy of the account immediately after an unplanned failover.

1 hour: Typical RTO.

Paired regions: Secondary is determined by Azure, not user-selectable.

Customer-managed failover: Term used in the exam.

Failback: Requires re-enabling geo-redundancy, then a second customer-managed failover.

Edge Cases the Exam Loves

Storage account with soft delete enabled: Failover preserves soft delete policies. Data that was soft-deleted before failover is still recoverable.

Storage account with versioning: Versions are preserved.

Failover during active geo-replication lag: Writes made after the Last Sync Time are lost, typically less than 15 minutes' worth.

Failover for Azure Files: Supported only for standard file shares, not premium.

How to Eliminate Wrong Answers

If the question mentions 'automatic' or 'unattended', it's likely wrong.

If the question mentions failing back quickly, remember that the account is LRS after failover and must be made geo-redundant again first.

If the account type is premium or Data Lake, failover is not an option.

If the question asks for RPO, look for 15 minutes.

If the question asks for write access to secondary before failover, the answer is no for RA-GZRS (read-only).

Key Takeaways

Customer-managed failover is only supported for storage accounts with GRS, RA-GRS, GZRS, or RA-GZRS replication.

The RPO (Recovery Point Objective) for failover is typically less than 15 minutes, with no SLA.

The RTO (Recovery Time Objective) is typically under 1 hour.

After an unplanned failover the account is converted to LRS in the new primary region.

To fail back, you re-enable geo-redundancy, wait for replication to complete, and initiate a second customer-managed failover.

Failover is not supported for premium block blobs, Data Lake Gen2, or NFS shares.

RA-GZRS provides read access to the secondary region before failover; GZRS does not.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

RA-GZRS

Allows read access to secondary region at all times.

RPO is up to 15 minutes.

Supports customer-managed failover.

Costs more than GZRS due to read-access.

Used when you need to read data from secondary during normal operations.

GZRS

No read access to secondary until failover occurs.

RPO is up to 15 minutes (same as RA-GZRS).

Supports customer-managed failover.

Lower cost than RA-GZRS.

Used when you only need failover capability without secondary reads.

Watch Out for These

Mistake

Customer-managed failover is automatic when Azure detects a region outage.

Correct

Failover is never automatic. You must manually initiate it via portal, CLI, or PowerShell. Azure does not automatically fail over storage accounts.

Mistake

After failover, you can immediately fail back to the original primary once it recovers.

Correct

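Not immediately. After an unplanned failover the account is LRS in the new primary region. To revert, you re-enable geo-redundancy, wait for the data to replicate to the new secondary, and then initiate a second failover. No support request is involved, but the re-replication takes time and incurs cost.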

Mistake

RA-GZRS allows both read and write access to the secondary region during normal operations.

Correct

RA-GZRS only allows read access to the secondary. Writes are only accepted by the primary endpoint until a failover is performed.

Mistake

Failover is supported for all Azure Storage account types, including premium block blobs.

Correct

Failover is not supported for premium block blobs, Azure Data Lake Storage Gen2 (hierarchical namespace), or NFSv4.1 shares. Only standard general-purpose v2 accounts with GRS/RA-GRS/GZRS/RA-GZRS support failover.

Mistake

The data loss during failover is zero if you use RA-GZRS.

Correct

RA-GZRS has an RPO of up to 15 minutes. Data written to the primary within the last 15 minutes may not have been replicated to the secondary and will be lost during failover.


Frequently Asked Questions

Can I perform a storage account failover for a premium block blob account?

No. Customer-managed failover is not supported for premium block blob accounts. It is only supported for standard general-purpose v2 accounts with geo-redundant replication (GRS, RA-GRS, GZRS, RA-GZRS). If you need disaster recovery for premium blobs, replicate the data yourself to an account in another region, for example with AzCopy or object replication. Azure Site Recovery protects VMs and does not replicate blob data.

What is the maximum data loss during a storage account failover?

Data loss is typically less than 15 minutes of writes, but this is an expectation rather than a guaranteed maximum: Azure provides no SLA for geo-replication lag. Anything written to the primary after the account's Last Sync Time may not have been replicated to the secondary and will be lost. You can minimize data loss by stopping writes to the primary and waiting for the Last Sync Time to catch up before initiating failover.

How long do I have to wait before I can perform another failover?

There is no fixed waiting period, but you cannot fail over again straight away. After an unplanned failover the account is LRS, so you must first re-enable geo-redundancy and wait for the data to replicate to the new secondary. Once that completes, you can initiate another customer-managed failover, for example to fail back.

Can I write to the secondary endpoint of an RA-GZRS storage account?

No. RA-GZRS allows read-only access to the secondary region. Writes are only accepted by the primary endpoint. To write to the secondary, you must perform a customer-managed failover to promote the secondary to primary.

Does storage account failover preserve soft delete and versioning settings?

Yes. Soft delete, versioning, change feed, and immutable storage policies are preserved after failover. Any data that was soft-deleted before failover remains recoverable as long as the retention period has not expired.

What happens to the old primary region after failover?

After an unplanned failover, the old primary region is no longer part of the account. The account is converted to LRS in the new primary region, so there is no secondary location and no secondary endpoint to read from. The old region comes back into use only after you re-enable geo-redundancy, which re-replicates the data to it as the new secondary.

Can I use Azure Site Recovery instead of storage account failover for blob data?

Azure Site Recovery is designed for VM and application failover, not for storage account failover. For blob data, you should use customer-managed storage account failover. However, you can combine both: use Site Recovery for VMs and storage account failover for data in blobs.
