This chapter covers Azure SQL Active Geo-Replication, a critical high-availability and disaster recovery feature for Azure SQL Database. For the DP-900 exam, understanding how active geo-replication works, its asynchronous nature, and its use cases is essential, as it appears in approximately 5-10% of questions related to data availability and business continuity. We will dissect the mechanics, configuration, and exam-specific nuances to ensure you can confidently answer any question on this topic.
Imagine you own a critical data vault in London that must never go offline. You decide to build a second, identical vault in Tokyo that is constantly kept in sync. The London vault is the primary: it accepts all writes and reads. The Tokyo vault is a replica: it receives a continuous stream of transaction log records from London, not copies of the raw data files. Each transaction committed in London is shipped as a log record to Tokyo and applied there, so Tokyo is always a few seconds behind. If London suffers a catastrophic fire, you can declare Tokyo the new primary with a single command and then point your clients at Tokyo. Only the last few seconds of transactions that had not yet reached Tokyo are at risk. You can also read from Tokyo at any time to offload read traffic, but writes go only to London. The key mechanism is that replication is asynchronous, meaning London does not wait for Tokyo to confirm receipt before committing. That trade-off keeps write latency low at the risk of minimal data loss (typically under 5 seconds). This is fundamentally different from a backup, which is a point-in-time copy that requires a restore; active geo-replication keeps the replica ready for immediate use.
What is Active Geo-Replication and Why Does It Exist?
Active Geo-Replication is a feature of Azure SQL Database that allows you to create up to four readable secondary replicas of your database in the same or different Azure regions. These replicas are continuously updated from the primary database via asynchronous replication. The primary purpose is to provide business continuity and disaster recovery (BCDR) by enabling you to fail over to a secondary region with minimal downtime. Unlike failover groups, which manage multiple databases and can fail over automatically, active geo-replication is configured at the individual database level and requires manual failover. The exam tests your understanding of its capabilities, limitations, and appropriate use cases.
How Active Geo-Replication Works Internally
Active geo-replication is built on SQL Server's Always On availability group technology but adapted for Azure SQL Database. When you configure a secondary replica in a different region, the primary database continuously streams transaction log records to the secondary. The mechanism is as follows:
Transaction Log Capture: Every transaction committed on the primary generates a log record. These records are written to the transaction log file (.ldf) on the primary.
Log Shipping: The primary database engine sends each log record to the secondary asynchronously. The secondary receives the log and writes it to its own transaction log.
Redo: The secondary applies the log records to its data files, bringing it up to date. This process is nearly continuous, with a typical lag of a few seconds (under 5 seconds in most cases, but can be higher under heavy load).
Readable Secondary: The secondary is online and readable. You can connect to it using a different connection string (e.g., Server=tcp:<secondary_server>.database.windows.net). However, the secondary does not accept write transactions.
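Failover: In a disaster, you can initiate a failover to the secondary. A planned failover first applies any remaining log records, so no data is lost; a forced failover promotes the secondary immediately and may lose transactions that had not yet been replicated. Either way, the secondary becomes the new primary. The old primary, if it comes back online, is automatically converted to a secondary (it will not accept writes).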
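The steps above can be sketched as a toy model. This is only an illustration of asynchronous shipping under simplifying assumptions (one secondary, an in-memory queue standing in for the network); the class `ToyGeoPair` and its method names are hypothetical and are not part of any Azure API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyGeoPair:
    """Toy model of one primary and one asynchronous geo-secondary."""
    primary_log: list = field(default_factory=list)    # committed on the primary
    in_flight: deque = field(default_factory=deque)    # committed, not yet applied remotely
    secondary_log: list = field(default_factory=list)  # redone on the secondary

    def commit(self, record: str) -> None:
        # The primary commits immediately; it never waits for the secondary.
        self.primary_log.append(record)
        self.in_flight.append(record)

    def ship(self, count: int = 1) -> None:
        # Background shipping and redo: the secondary applies records in order.
        for _ in range(min(count, len(self.in_flight))):
            self.secondary_log.append(self.in_flight.popleft())

    def forced_failover(self) -> list:
        # The primary is gone: whatever was still in flight is lost.
        lost = list(self.in_flight)
        self.in_flight.clear()
        self.primary_log = list(self.secondary_log)  # the secondary is the new primary
        return lost


pair = ToyGeoPair()
for txn in ["t1", "t2", "t3"]:
    pair.commit(txn)
pair.ship(2)                    # only t1 and t2 reach the secondary
lost = pair.forced_failover()
print(lost)                     # ['t3']
print(pair.primary_log)         # ['t1', 't2']
```

The lost record `t3` is the model's version of a non-zero RPO: it was committed on the primary but had not been shipped when the failover happened.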
Key Components, Values, Defaults, and Timers
Maximum Secondaries: Up to four readable secondaries per primary database. They can be in the same region or different regions, but for geo-replication, they are typically in different Azure regions.
Replication Mode: Asynchronous only. There is no synchronous option for active geo-replication. This means there is a potential for data loss (typically < 5 seconds) if the primary fails before log records are shipped.
RPO (Recovery Point Objective): Typically under 5 seconds, but not guaranteed. The actual RPO depends on network latency and workload. The exam may ask you to identify the RPO as "a few seconds" or "under 5 seconds."
RTO (Recovery Time Objective): The time to failover is typically under 1 minute once the failover command is issued, but network propagation and DNS cache TTL (300 seconds by default) can affect client connectivity. The exam may ask about RTO being "minutes" rather than seconds.
Connection Strings: After failover, clients must update their connection strings to point to the new primary. The server name changes. Use of a listener endpoint (via failover groups) avoids this.
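Pricing: You pay for compute and storage for each secondary replica. The secondary must be in the same service tier as the primary (for example, a Standard primary needs a Standard secondary), and the same compute size is strongly recommended (e.g., S3 with S3) so that the secondary can keep up with replication and carry the full workload after a failover.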
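The RPO and RTO figures above lend themselves to quick back-of-the-envelope arithmetic. The sketch below is a rough estimate only: the function names are hypothetical, and the commit rate of 200 transactions per second is an invented example value, while the 5-second lag, 60-second failover, and 300-second DNS TTL come from the list above.

```python
def transactions_at_risk(lag_s: float, commits_per_s: float) -> float:
    """Rough count of committed transactions not yet on the secondary."""
    return lag_s * commits_per_s


def worst_case_client_recovery_s(failover_s: float, dns_ttl_s: float) -> float:
    """Upper bound on client-visible outage: failover time plus one full DNS TTL."""
    return failover_s + dns_ttl_s


# 5 s of lag at an assumed 200 commits per second
print(transactions_at_risk(5, 200))            # 1000
# 60 s failover plus a 300 s DNS cache TTL
print(worst_case_client_recovery_s(60, 300))   # 360
```

This is why the exam describes RPO in seconds but RTO in minutes: the failover itself is fast, but clients may keep resolving the old address until their cached DNS entry expires.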
Configuration and Verification Commands
You can configure active geo-replication using the Azure portal, PowerShell, Azure CLI, or REST API. Here are key examples:
Azure CLI – Create a secondary in a different region:
az sql db replica create --resource-group myRG --server primaryServer --name myDB --partner-server secondaryServer --partner-resource-group myRG
PowerShell – Create a secondary:
New-AzSqlDatabaseSecondary -ResourceGroupName myRG -ServerName primaryServer -DatabaseName myDB -PartnerResourceGroupName myRG -PartnerServerName secondaryServer -AllowConnections All
Verify replication status – Check the ReplicationState property:
Get-AzSqlDatabaseReplicationLink -ResourceGroupName myRG -ServerName primaryServer -DatabaseName myDB -PartnerResourceGroupName myRG -PartnerServerName secondaryServer
The ReplicationState can be SEEDING, CATCH_UP, SUSPENDED, or PENDING. CATCH_UP means the secondary is transactionally consistent and is being continuously synchronized with the primary.
Failover – Initiate a planned or forced failover (run the command against the secondary server; add -AllowDataLoss to force the failover):
Set-AzSqlDatabaseSecondary -ResourceGroupName myRG -ServerName secondaryServer -DatabaseName myDB -PartnerResourceGroupName myRG -Failover
How Active Geo-Replication Interacts with Related Technologies
Failover Groups: Active geo-replication is the underlying technology for failover groups, which add a listener endpoint and support automatic failover. The exam distinguishes between the two: active geo-replication is manual per database; failover groups are automatic for a group of databases.
Backup and Restore: Active geo-replication is not a backup. It provides a live replica. Backups are still taken on the primary and are independent. The exam may ask you to compare geo-replication with geo-restore of backups.
Elastic Pools: You can replicate a database in an elastic pool to a secondary in another pool. The secondary pool must have sufficient resources.
SQL Managed Instance: Active geo-replication is not supported for Azure SQL Managed Instance; geo-failover for managed instances is provided by failover groups instead. The exam primarily focuses on SQL Database.
Exam Trap Patterns
Trap 1: Confusing active geo-replication with auto-failover groups. Active geo-replication requires manual failover; failover groups can be automatic. The exam may describe a scenario and ask which feature to use.
Trap 2: Assuming synchronous replication. Active geo-replication is always asynchronous. There is a possibility of data loss. The exam may ask about RPO and the correct answer is "a few seconds" not "zero."
Trap 3: Thinking the secondary is not readable. It is readable, but not writable. The exam may ask if you can query the secondary for reporting.
Trap 4: Believing you can have more than four secondaries. The limit is four.
Trap 5: Confusing geo-replication with failover groups when the question involves multiple databases. Failover groups manage multiple databases; active geo-replication is per database.
Configure Secondary Replica
In the Azure portal, PowerShell, or CLI, you specify the primary database and the target region and server for the secondary. The system begins by seeding the secondary: copying the full database from the primary to the secondary. During seeding, the secondary is not readable. Seeding can take hours for large databases. Once seeding completes, the secondary enters a continuous synchronization mode where transaction log records are streamed asynchronously. The secondary is now readable and can be used for read-only queries.
Continuous Log Shipping
After seeding, every transaction committed on the primary is written to the transaction log. The log records are compressed and sent to the secondary over HTTPS. The secondary receives the log and writes it to its own log file, then redoes the changes to its data files. This process is asynchronous: the primary does not wait for the secondary to acknowledge receipt before committing. The typical lag is under 5 seconds, but can increase under heavy load or network issues. The `ReplicationState` property shows `CATCH_UP` when the secondary is current.
Read from Secondary
Applications can connect directly to the secondary server using its fully qualified domain name (e.g., `secondaryserver.database.windows.net`) and the database name. The secondary supports snapshot isolation for read consistency. You can offload reporting or analytics workloads to the secondary. However, the secondary does not accept write transactions. If you attempt to write, you receive an error. The exam may ask about using the secondary for read-scale.
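One way an application might split traffic is to choose the server name per statement. The sketch below is deliberately naive and rests on assumptions: `primaryserver` is a hypothetical name, `pick_server` is an illustrative helper, and treating any statement that starts with `SELECT` as read-only ignores cases such as `SELECT ... INTO`, which writes.

```python
PRIMARY_FQDN = "primaryserver.database.windows.net"      # hypothetical primary name
SECONDARY_FQDN = "secondaryserver.database.windows.net"  # example name used in the text


def pick_server(sql: str) -> str:
    """Send plain SELECT statements to the readable secondary, everything else to the primary."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return SECONDARY_FQDN if is_read else PRIMARY_FQDN


print(pick_server("SELECT * FROM Orders WHERE CustomerId = 42"))
# secondaryserver.database.windows.net
print(pick_server("INSERT INTO Orders (CustomerId) VALUES (42)"))
# primaryserver.database.windows.net
```

Because the secondary lags the primary by a few seconds, reads routed this way can return slightly stale data, which is usually acceptable for reporting but not for read-your-own-write flows.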
Initiate Failover
When the primary fails or you need to perform maintenance, you initiate a failover to the secondary. This can be done via the portal, PowerShell, or CLI. A planned failover applies any remaining log records to the secondary before switching roles; a forced failover switches immediately and may lose transactions that had not yet been replicated. In both cases the secondary becomes the new primary. The old primary, if it comes back online, is automatically converted to a secondary (it will not accept writes). The failover typically completes within seconds to a minute. After failover, client connection strings must be updated to point to the new primary (unless using failover groups).
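The role swap and its effect on clients can be illustrated with a small model. This is a sketch under assumptions, not Azure code: `GeoLink` and its methods are hypothetical, the server names are placeholders, and the connection string is abbreviated (no credentials or encryption settings).

```python
from dataclasses import dataclass


@dataclass
class GeoLink:
    """Tracks which logical server currently holds the primary role."""
    primary: str
    secondary: str

    def fail_over(self) -> None:
        # Roles swap: the old secondary takes writes, the old primary becomes a secondary.
        self.primary, self.secondary = self.secondary, self.primary

    def write_connection_string(self, database: str) -> str:
        # Abbreviated connection string that targets whichever server is primary now.
        return f"Server=tcp:{self.primary}.database.windows.net,1433;Database={database};"


link = GeoLink(primary="primaryserver", secondary="secondaryserver")
before = link.write_connection_string("myDB")
link.fail_over()
after = link.write_connection_string("myDB")

print(before)  # Server=tcp:primaryserver.database.windows.net,1433;Database=myDB;
print(after)   # Server=tcp:secondaryserver.database.windows.net,1433;Database=myDB;
```

The server name in the string changes after the swap, which is exactly the manual step a failover group's listener endpoint removes.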
Monitor and Manage
You can monitor replication lag using dynamic management views (DMVs) like `sys.dm_geo_replication_link_status` on SQL Database. The `replication_lag_sec` column shows the lag in seconds. You can also view the status in the Azure portal under the database's Geo-Replication blade. If replication is suspended due to a network issue, you can resume it manually. The exam may ask about monitoring tools and the meaning of replication statuses.
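A monitoring script might query the DMV and flag links that look unhealthy. The sketch below keeps the database call out of scope: `LAG_QUERY` is the T-SQL you would run on the primary, and `links_needing_attention` is a hypothetical helper that works on rows already fetched as dictionaries. The sample rows and the 5-second threshold are invented for illustration.

```python
# T-SQL to run on the primary database; columns are from sys.dm_geo_replication_link_status.
LAG_QUERY = """
SELECT partner_server, partner_database, replication_lag_sec, replication_state_desc
FROM sys.dm_geo_replication_link_status;
"""


def links_needing_attention(rows, max_lag_sec=5):
    """Return partner servers that are not in CATCH_UP or whose lag exceeds the threshold."""
    flagged = []
    for row in rows:
        lag = row["replication_lag_sec"]
        not_synced = row["replication_state_desc"] != "CATCH_UP"
        if not_synced or lag is None or lag > max_lag_sec:
            flagged.append(row["partner_server"])
    return flagged


# Invented sample rows shaped like the query result.
sample = [
    {"partner_server": "secondary-asia", "replication_lag_sec": 2,
     "replication_state_desc": "CATCH_UP"},
    {"partner_server": "secondary-us", "replication_lag_sec": 42,
     "replication_state_desc": "CATCH_UP"},
    {"partner_server": "secondary-eu", "replication_lag_sec": None,
     "replication_state_desc": "SEEDING"},
]
print(links_needing_attention(sample))  # ['secondary-us', 'secondary-eu']
```

Treating a missing lag value as unhealthy is a conservative design choice here: a link that cannot report its lag should not be assumed safe to fail over to.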
Enterprise Scenario 1: Global Application with Read Scale-Out
A multinational e-commerce company has its primary database in West Europe. To serve customers in Asia with low-latency reads, they create a secondary replica in Southeast Asia. The secondary is used for product catalog queries and order history lookups. The primary handles all write operations (new orders, user registrations). The company configures the application to route read-only traffic to the secondary using a geographic traffic manager. During a regional outage in West Europe, they manually fail over to the Southeast Asia secondary, which becomes the new primary. The application's write path is redirected to the new primary. The key consideration is that the secondary must have sufficient compute and storage to handle the read load; otherwise, lag can increase. The company monitors replication lag and scales up the secondary during peak seasons.
Enterprise Scenario 2: Disaster Recovery with Compliance Requirements
A financial services firm must comply with regulatory requirements for data residency and disaster recovery. They have their primary database in US East and must have a replica in US West for DR. They use active geo-replication to maintain a continuous replica. The RPO is defined as 5 seconds, and the RTO is 1 minute for failover. They conduct quarterly failover drills to ensure the process works. During a drill, they fail over to the secondary, run integrity checks, and then fail back. They also use geo-replication to support a hot standby for reporting without impacting the primary. The main challenge is managing the cost of the secondary, which must be in the same service tier as the primary. They can reduce cost by giving the secondary a smaller compute size while read load is low, but only if it still keeps up with log apply and is scaled up to handle the write load after failover.
Common Pitfalls in Production
Underprovisioning the secondary: If the secondary has fewer DTUs/vCores than the primary, it may not keep up with the log apply rate, causing increasing lag. Keep the secondary in the same service tier and, ideally, at the same compute size as the primary.
Ignoring network latency: Geo-replication over long distances can have higher lag. For critical DR, consider pairing regions with low latency (e.g., US East and US West).
Forgetting to update connection strings after failover: Without failover groups, clients must be updated to point to the new server. This can cause extended downtime.
Not monitoring replication health: Lag can increase unnoticed, leading to data loss if failover is triggered. Use Azure Monitor alerts on replication_lag_sec.
What DP-900 Tests on Active Geo-Replication
The DP-900 exam objective that covers this topic is "Describe high availability and disaster recovery options" (part of domain 2.2). Specifically, you need to understand the purpose and behavior of active geo-replication. The exam will ask about:
The asynchronous nature of replication
Readable secondaries
Manual failover
Maximum number of secondaries (four)
Typical RPO (a few seconds) and RTO (minutes)
Comparison with failover groups and backup restore
Common Wrong Answers and Why Candidates Choose Them
Wrong: "Active geo-replication provides synchronous replication with zero data loss." – Candidates confuse it with SQL Server Always On availability groups which can be synchronous. But Azure SQL active geo-replication is always asynchronous. The exam may include "zero data loss" as a distractor.
Wrong: "The secondary replica cannot be used for read queries." – Candidates assume it's like a passive standby. But the secondary is readable. The exam may ask "Can you run SELECT statements on the secondary?" Answer: Yes.
Wrong: "Failover is automatic." – This is true for failover groups, not for active geo-replication itself. The exam may present a scenario and ask which feature supports automatic failover.
Wrong: "You can have unlimited secondaries." – The limit is four. The exam may test this exact number.
Wrong: "Geo-replication replicates the entire database instantly." – It is asynchronous with lag; not instant.
Specific Numbers and Terms to Memorize
Maximum secondaries: 4
Replication mode: Asynchronous
RPO: Typically < 5 seconds
RTO: Minutes (usually < 1 minute for failover, plus DNS propagation)
Seeding: Initial copy required before continuous sync
Replication states: SEEDING, CATCH_UP, SUSPENDED, PENDING
Edge Cases the Exam Loves
What if the primary fails before log records are shipped? – Some data loss occurs (the last few seconds of transactions). The exam may ask about the risk of data loss.
Can you have multiple secondaries in the same region? – Yes, but a same-region secondary does not protect against a regional outage, so it adds read capacity rather than geo-redundancy. The exam may ask about geo-redundancy.
What happens to the old primary after failover? – It becomes a secondary automatically. It does not accept writes.
How to Eliminate Wrong Answers
If a question mentions "automatic failover" or "zero data loss," it is likely referring to failover groups or a different technology. Read the question carefully: if it says "active geo-replication" specifically, the answer must align with asynchronous, manual failover, and readable secondaries. Eliminate any answer that implies synchronous or automatic.
Active geo-replication is asynchronous; RPO is typically under 5 seconds.
You can have up to 4 readable secondaries per primary database.
Failover is manual; automatic failover requires failover groups.
Secondary replicas are readable and can offload read workloads.
The secondary must be in the same service tier as the primary; the same compute size is recommended.
After failover, the old primary becomes a secondary automatically.
Use DMV sys.dm_geo_replication_link_status to monitor lag.
Geo-replication is not a backup; it is a live replica.
These come up on the exam all the time. Here's how to tell them apart.
Active Geo-Replication
Per-database configuration
Manual failover only
No listener endpoint; clients must update connection strings
Supports up to 4 secondaries
Readable secondary
Failover Groups
Group of databases managed together
Supports automatic and manual failover
Provides a listener endpoint (read/write and read-only)
Uses active geo-replication under the hood
Readable secondary (same as geo-replication)
Mistake
Active geo-replication replicates the entire database instantly with zero data loss.
Correct
Replication is asynchronous, so there is a small lag (typically <5 seconds). If the primary fails, transactions not yet shipped to the secondary are lost.
Mistake
The secondary replica is a passive standby that cannot be queried.
Correct
The secondary is fully readable and can be used for read-only workloads like reporting.
Mistake
Failover is automatic when the primary fails.
Correct
Active geo-replication requires manual failover. Automatic failover is a feature of failover groups.
Mistake
You can have as many secondaries as you want.
Correct
The maximum is four secondaries per primary database.
Mistake
Active geo-replication is the same as geo-restore from backup.
Correct
Geo-replication maintains a continuous, up-to-date replica. Geo-restore creates a new database from a backup stored in a paired region, which may be hours old.
Active geo-replication is a per-database feature that allows you to create up to four readable secondaries in different regions. Failover groups are a higher-level abstraction that manages a group of databases and provides a listener endpoint for seamless failover. Failover groups support automatic failover, while active geo-replication requires manual failover. Both use the same underlying replication mechanism.
Yes, the secondary is fully readable. You can connect to it using its server name and database name. It is ideal for offloading read-only workloads such as reporting or analytics. However, it does not accept write transactions.
The RPO (Recovery Point Objective) is typically under 5 seconds, but can be higher under heavy load or network latency. The RTO (Recovery Time Objective) for failover is usually under 1 minute, plus DNS propagation time (up to 5 minutes). The exam expects you to know these approximate values.
You can have up to four secondaries per primary database. This includes secondaries in the same region or different regions. The limit is strict; you cannot exceed four.
Yes, because replication is asynchronous. If the primary fails before log records are shipped to the secondary, those transactions are lost. The amount of data loss is typically under 5 seconds, but it is not zero.
After a failover, the old primary becomes a secondary replica automatically. It will not accept write transactions. If you want to fail back, you can initiate another failover to make it the primary again.
No. Active geo-replication is a feature of Azure SQL Database and is not supported for Azure SQL Managed Instance, which uses failover groups for geo-failover instead. The exam primarily focuses on SQL Database.