This chapter covers Azure Cosmos DB's global distribution and failover capabilities, which are critical for building globally distributed, highly available applications. For the DP-900 exam, Cosmos DB belongs to the 'Describe considerations for working with non-relational data on Azure' domain, which carries roughly 15-20% of the exam weight; questions in that domain can touch on global distribution, failover priorities, and consistency levels. You will need to understand how multi-region writes work, the difference between manual and automatic failover, and how consistency models affect replication. This chapter provides the depth needed to answer those questions confidently.
Imagine an international company with a main post office in London and backup post offices in New York, Tokyo, and Sydney. The company has a single mailing address that customers use worldwide. When a letter arrives at the London post office, it is immediately replicated and sent to each backup post office, so all locations have identical mail. If the London post office suffers a power outage, a central routing service automatically updates the global address system to point the company's mailing address to the New York post office. Incoming letters are then delivered to New York instead of London. The New York post office already has all the mail because of the earlier replication, so no letters are lost. Once London is restored, the central routing service can switch back to London or keep New York as primary. This system ensures continuous mail delivery even during disasters, but it requires careful planning of how quickly mail is copied between post offices (replication latency) and how fast the address switch happens (failover time). In Azure Cosmos DB, this is exactly how global distribution works: one write region (primary) replicates data to multiple read regions (secondaries), and you can trigger a manual or automatic failover to change the write region if the primary becomes unavailable.
What is Global Distribution in Cosmos DB?
Azure Cosmos DB is a globally distributed, multi-model database service. Unlike traditional databases that run in a single region, Cosmos DB can replicate your data across any number of Azure regions worldwide. This is not a simple backup; it is a fully managed, turnkey global distribution system. You can add or remove regions at any time with a single Azure CLI command or a few clicks in the portal. The service automatically replicates data to all configured regions. Its latency guarantee covers reads and writes served within a region: less than 10 milliseconds at the 99th percentile. Cross-region replication latency is not part of that guarantee and depends mainly on the distance between regions.
Why Use Global Distribution?
The primary reasons are:
- Low-latency access: Place data close to your users worldwide. For example, a gaming company with users in Europe, Asia, and North America can have a Cosmos DB account with regions in each continent so that reads and writes are served from the nearest region.
- High availability (HA): If one region goes down, the database can still serve reads and writes from other regions. Cosmos DB offers a 99.999% read and write availability SLA when you configure your account for multi-region writes.
- Disaster recovery (DR): With automatic failover, if the primary write region becomes unavailable, Cosmos DB automatically promotes the next highest-priority region as the new write region, typically within minutes (the stated SLA for automatic failover is 1 hour, but it is usually much faster).
- Global scale: You can elastically scale throughput (RU/s) and storage across all regions. The RU/s you provision on a database or container are provisioned in every region associated with the account, so the total is the per-region RU/s multiplied by the number of regions.
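The global-scale point can be made concrete with a little arithmetic. The sketch below assumes the same RU/s value is provisioned in each region; the function name and the numbers are hypothetical and only illustrate the calculation.

```python
def total_provisioned_rus(rus_per_region: int, region_count: int) -> int:
    """Total RU/s across an account when R RU/s are provisioned in each of N regions."""
    if rus_per_region <= 0 or region_count <= 0:
        raise ValueError("RU/s and region count must both be positive")
    return rus_per_region * region_count


# Hypothetical account: 10,000 RU/s on a container, replicated to 3 regions.
print(total_provisioned_rus(10_000, 3))  # 30000
```

Adding a region therefore raises the total provisioned (and billed) throughput even though the per-container setting does not change.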
How It Works Internally
Cosmos DB uses a multi-master replication protocol (if multi-region writes are enabled) or a single-master protocol with multiple read replicas. The core mechanism is based on a set of replicas that use a consensus protocol (similar to Paxos but optimized for Cosmos DB). Here is the step-by-step internal flow:
Account Configuration: When you create a Cosmos DB account, you select one or more Azure regions. You designate one region as the write region (if not using multi-region writes). You can also set a failover priority list for automatic failover.
Replication: Every write operation is first committed in the write region. The write region then asynchronously replicates the data change to all other regions (read regions). The replication is near real-time, typically within a few hundred milliseconds, but can be slower under heavy load or long distances. Cosmos DB guarantees that all reads from a given region are eventually consistent with the write region, but the exact staleness depends on the consistency level chosen.
Consistency Levels: Cosmos DB offers five consistency levels: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual. These affect replication behavior:
- Strong: Reads are guaranteed to see the most recent write. This requires synchronous replication to all regions, which increases latency and reduces availability. Cannot be used with multi-region writes.
- Bounded Staleness: Reads may lag behind writes by at most K versions or T time (for example, 100,000 operations or 300 seconds, the minimum bounds in a multi-region account). This is a good compromise for globally distributed apps.
- Session: Guarantees monotonic reads, monotonic writes, and read-your-writes within a single client session. This is the default and works well for most applications.
- Consistent Prefix: Guarantees that reads never see out-of-order writes. For example, if writes A, B, C occur in order, a read will never see A, C, B.
- Eventual: No ordering guarantee. Reads may see stale data. Lowest latency and highest availability.
Failover: If the write region becomes unavailable (e.g., due to a regional outage), Cosmos DB can automatically fail over to the next region in the priority list. The failover process involves:
The service detects the outage via health probes.
It promotes the highest-priority available region to be the new write region.
DNS records are updated so that clients using the Cosmos DB endpoint (e.g., https://myaccount.documents.azure.com) are redirected to the new write region.
The failover typically completes within minutes, though the SLA is 1 hour for automatic failover.
Manual Failover: You can also trigger a manual failover at any time, for example, to test your DR plan or to move the write region closer to users. Manual failover requires that you have at least two regions and that you have not enabled automatic failover (or you must disable it first).
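The automatic failover step in the flow above boils down to "pick the healthy region with the lowest priority number". The following is a minimal sketch of that selection rule only; the `Region` class, the `healthy` flag, and the function name are hypothetical and do not represent how the service is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    failover_priority: int  # 0 is the highest priority
    healthy: bool = True    # stand-in for the result of the service's health probes


def next_write_region(regions: list[Region]) -> str:
    """Return the name of the healthy region with the lowest priority number."""
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region is available to promote")
    return min(candidates, key=lambda r: r.failover_priority).name


# Hypothetical account: the current write region (priority 0) has gone down.
account = [
    Region("westus", 0, healthy=False),
    Region("eastus", 1),
    Region("westeurope", 2),
]
print(next_write_region(account))  # eastus
```

If `eastus` were also unhealthy, the same rule would skip it and promote `westeurope`.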
Key Components, Values, and Defaults
Regions: You can add up to 30 regions per Cosmos DB account (soft limit, can be increased by support).
Write Region: The region where all writes are accepted. If multi-region writes are enabled, all regions accept writes.
Failover Priority: An ordered list of regions numbered from 0 to N-1. Priority 0 is the highest and marks the write region. If the current write region fails, Cosmos DB promotes the available region with the lowest priority number (i.e., highest priority).
Automatic Failover: Not enabled by default; you turn it on for the account once it has more than one region. If it is disabled, you must manually fail over if the write region goes down.
Multi-Region Writes: When enabled, all regions can accept writes. This increases write availability but restricts the consistency model (Strong consistency cannot be used). It also costs more, because provisioned throughput on a multi-region write account is billed at a higher rate.
Consistency Default: Session consistency is the default for new accounts.
Latency: The latency guarantee is < 10 ms for reads and writes at the 99th percentile within a region. Cross-region replication latency is not covered by that figure and depends on the distance between regions.
Failover Time: Automatic failover typically completes within 1-2 minutes, but the SLA is 1 hour. Manual failover is near-instantaneous (a few seconds).
Configuration and Verification Commands
You can manage global distribution using Azure CLI, PowerShell, or the portal.
Azure CLI examples:
Create a Cosmos DB account with multiple regions:
az cosmosdb create \
    --name mycosmosaccount \
    --resource-group myrg \
    --locations regionName=westus failoverPriority=0 isZoneRedundant=False \
    --locations regionName=eastus failoverPriority=1 isZoneRedundant=False \
    --default-consistency-level Session
Add a new region:
az cosmosdb update \
    --name mycosmosaccount \
    --resource-group myrg \
    --locations regionName=westus failoverPriority=0 \
    --locations regionName=eastus failoverPriority=1 \
    --locations regionName=westeurope failoverPriority=2
Trigger a manual failover:
# Policies use the regionName=failoverPriority format; giving eastus priority 0 makes it the new write region.
az cosmosdb failover-priority-change \
    --name mycosmosaccount \
    --resource-group myrg \
    --failover-policies eastus=0 westus=1 westeurope=2
Enable multi-region writes:
az cosmosdb update \
    --name mycosmosaccount \
    --resource-group myrg \
    --enable-multiple-write-locations true
Verification:
az cosmosdb show --name mycosmosaccount --resource-group myrg --query "{writeLocations: writeLocations, readLocations: readLocations, enableMultipleWriteLocations: enableMultipleWriteLocations}"
Interaction with Related Technologies
Azure Traffic Manager: You can place Cosmos DB behind Traffic Manager to route users to the nearest region based on geographic location or performance. However, Cosmos DB's native global distribution already provides a single endpoint that automatically routes to the nearest region for reads (if using SDK with preferred locations).
Azure Front Door: Similar to Traffic Manager but with more advanced routing rules and WAF capabilities.
Azure Functions: You can use Azure Functions to process change feed events from Cosmos DB in a globally distributed manner. The change feed is available in each region independently.
Azure Synapse Link: Enables near-real-time analytics on Cosmos DB data without impacting transactional workloads, and works across regions.
Enable Multi-Region Writes
In the Azure portal, navigate to your Cosmos DB account, select 'Replicate data globally' under 'Settings', and toggle 'Enable multi-region writes' to On. Alternatively, use the Azure CLI command `az cosmosdb update --enable-multiple-write-locations true`. This allows each region to accept writes independently. Internally, Cosmos DB uses a multi-master replication protocol where each region commits writes locally and then asynchronously replicates to all other regions. Conflict resolution policies (last-writer-wins or custom) handle concurrent writes to the same document in different regions. This configuration increases write availability to 99.999% but disables strong consistency.
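To illustrate the last-writer-wins idea, here is a minimal sketch of how two conflicting versions of the same document could be compared. It is a conceptual model, not the service's implementation: the `DocumentVersion` class, the numeric `ts` field (standing in for a last-modified timestamp), and the region-name tie-break are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentVersion:
    doc_id: str
    region: str   # region that accepted the write
    ts: int       # hypothetical last-modified value; higher means later
    body: dict = field(default_factory=dict)


def resolve_last_writer_wins(a: DocumentVersion, b: DocumentVersion) -> DocumentVersion:
    """Keep the version with the higher timestamp; break ties deterministically."""
    if a.doc_id != b.doc_id:
        raise ValueError("versions belong to different documents")
    if a.ts != b.ts:
        return a if a.ts > b.ts else b
    # Assumed tie-break for this sketch only: alphabetical region name.
    return a if a.region <= b.region else b


# Two regions accepted concurrent writes to the same (hypothetical) cart document.
west = DocumentVersion("cart-42", "West US", 1700000005, {"items": 2})
europe = DocumentVersion("cart-42", "West Europe", 1700000009, {"items": 3})
winner = resolve_last_writer_wins(west, europe)
print(winner.region, winner.body)  # West Europe {'items': 3}
```

The important property is that every region applies the same deterministic rule, so all replicas converge on the same winner regardless of the order in which they see the two versions.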
Configure Failover Priorities
In the 'Replicate data globally' blade, you can reorder the regions by dragging them. The region at the top has priority 0 (highest), the next has priority 1, and so on. This list determines the order of failover if automatic failover is enabled. For automatic failover, you must have one region with priority 0 (the current write region) and at least one other region. If you disable automatic failover, the priority list is ignored and you must trigger failover manually. The priority list can be updated at any time, and changes take effect immediately for future failovers. A common mistake is to set the same priority for two regions; this is not allowed.
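The rules in this section (unique priorities, one region at priority 0, at least two regions for automatic failover) can be expressed as a small validation routine. This is a hedged sketch: the function name and the dictionary shape are hypothetical, and it checks only the rules stated above rather than everything the service may enforce.

```python
from __future__ import annotations


def failover_order(priorities: dict[str, int]) -> list[str]:
    """Validate a region -> priority mapping and return regions in failover order."""
    if len(priorities) < 2:
        raise ValueError("automatic failover needs at least two regions")
    values = list(priorities.values())
    if len(set(values)) != len(values):
        raise ValueError("two regions share the same failover priority")
    if 0 not in values:
        raise ValueError("one region must hold priority 0 (the write region)")
    # Lowest number first: index 0 is the write region, index 1 is promoted next.
    return sorted(priorities, key=priorities.__getitem__)


order = failover_order({"eastus": 1, "westus": 0, "westeurope": 2})
print(order)  # ['westus', 'eastus', 'westeurope']
```

Running a check like this before applying a change catches the duplicate-priority mistake described above before the request is rejected.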
Test Manual Failover
To test disaster recovery, you can perform a manual failover. First, ensure automatic failover is disabled (or temporarily disable it). Then, in the portal, click 'Manual Failover' and select the region you want to become the new write region. The service then switches the write region to that region. During the transition, there is a brief period (a few seconds) in which writes may be rejected while the DNS records update; the SDK retries automatically. After failover, verify that the new write region is active by checking the 'Write Region' in the account overview. A manual failover completes within seconds and does not cause data loss, because it is a planned operation performed while both regions are available, so outstanding changes are replicated before the write region switches.
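If you script the verification instead of checking the portal, one option is to parse the JSON printed by the `az cosmosdb show` query from the verification step earlier in this chapter. The sketch below assumes that output has been captured as a string; the sample JSON is a trimmed, hypothetical example of that shape, and the helper name is made up for illustration.

```python
from __future__ import annotations

import json

# Hypothetical, trimmed output of the `az cosmosdb show --query ...` command.
SAMPLE_SHOW_OUTPUT = """
{
  "writeLocations": [
    {"locationName": "East US", "failoverPriority": 0}
  ],
  "readLocations": [
    {"locationName": "East US", "failoverPriority": 0},
    {"locationName": "West US", "failoverPriority": 1}
  ],
  "enableMultipleWriteLocations": false
}
"""


def write_regions(show_output: str) -> list[str]:
    """Return the region names listed under writeLocations."""
    account = json.loads(show_output)
    return [loc["locationName"] for loc in account.get("writeLocations", [])]


regions = write_regions(SAMPLE_SHOW_OUTPUT)
print(regions)  # ['East US']
if regions != ["East US"]:
    raise SystemExit("failover did not move the write region to East US")
```

In a real drill you would replace the sample string with the captured CLI output and compare against the region you failed over to.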
Configure SDK Preferred Locations
To optimize read latency, you can set the preferred locations in the Cosmos DB SDK. For example, in the older .NET SDK (v2): `new ConnectionPolicy { PreferredLocations = { "West US", "East US" } };`. In the .NET SDK v3, the equivalent setting is `CosmosClientOptions.ApplicationPreferredRegions`. The SDK routes read requests to the first available region in the list and falls back to the next one if that region is down. Write requests are always sent to the current write region (unless multi-region writes are enabled, in which case they go to the nearest region). This setting does not affect failover; it only controls client-side routing. A common exam trap is the claim that preferred locations can force writes to a specific region even without multi-region writes. This is false; writes always go to the write region.
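The routing rules described here can be summarized in a few lines. This is a simplified model of the behavior, not the SDK's code: the function, its parameters, and the fallback to the write region when no preferred region is available are assumptions made for the sketch, and "nearest region" is approximated by the order of the preferred list.

```python
from __future__ import annotations


def route_request(
    operation: str,
    preferred_locations: list[str],
    available_regions: set[str],
    write_region: str,
    multi_region_writes: bool = False,
) -> str:
    """Pick the region a request is sent to under the rules described above."""
    if operation not in ("read", "write"):
        raise ValueError("operation must be 'read' or 'write'")
    # Without multi-region writes, preferred locations never affect writes.
    if operation == "write" and not multi_region_writes:
        return write_region
    for region in preferred_locations:
        if region in available_regions:
            return region
    # Assumed fallback for this sketch when no preferred region is reachable.
    return write_region


preferred = ["West US", "East US"]
available = {"East US", "West Europe"}  # West US is down in this example
print(route_request("read", preferred, available, write_region="West Europe"))   # East US
print(route_request("write", preferred, available, write_region="West Europe"))  # West Europe
```

The second call shows the exam trap in action: even with a preferred list, the write still goes to the write region.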
Monitor Replication Latency
You can monitor replication lag using Azure Monitor metrics. Key metrics include 'Replication Latency' (the time difference between the write region and each read region) and 'Normalized RU Consumption' across regions. High replication latency may indicate network congestion or high write throughput. You can also use the Cosmos DB diagnostics logs to track replication events. The SLA for replication is not explicitly defined, but the service guarantees that data is replicated to all regions within the configured consistency bounds. For example, with Bounded Staleness and a 300-second window (the minimum time bound for a multi-region account), reads in any region lag the write region by no more than 300 seconds.
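The "within the configured consistency bounds" idea for Bounded Staleness can be written as a simple check. The sketch assumes the bound is breached as soon as either limit (K versions or T seconds) is exceeded; the function name and the sample values are hypothetical.

```python
def within_bounded_staleness(
    lag_versions: int,
    lag_seconds: float,
    max_versions: int,
    max_seconds: float,
) -> bool:
    """True if a read region's lag respects both the version bound K and the time bound T."""
    return lag_versions <= max_versions and lag_seconds <= max_seconds


# Hypothetical bounds: K = 100,000 versions, T = 300 seconds.
print(within_bounded_staleness(1_200, 2.5, 100_000, 300))    # True
print(within_bounded_staleness(1_200, 310.0, 100_000, 300))  # False
```

A monitoring job could feed observed lag values into a check like this and raise an alert well before either limit is reached.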
Enterprise Scenario 1: Global E-Commerce Platform
A multinational e-commerce company with customers in North America, Europe, and Asia uses Cosmos DB to store product catalogs, user profiles, and shopping cart data. They deploy a single Cosmos DB account with three write regions (multi-region writes enabled) in West US, West Europe, and Southeast Asia. This allows users in each region to write to their nearest region with low latency. The company uses Session consistency because it ensures users see their own writes immediately. They also set a failover priority list: West US (priority 0), West Europe (priority 1), Southeast Asia (priority 2). Because every region already accepts writes, a regional outage in West US does not require promoting a new write region: the SDKs, configured with preferred locations, redirect reads and writes to West Europe and Southeast Asia. The company monitors replication latency using Azure Monitor and sets alerts for when it exceeds a threshold based on their normal cross-region baseline. They also use the change feed in each region to trigger Azure Functions for real-time inventory updates. A common misconfiguration to avoid is planning for Strong consistency, which is not available with multi-region writes and would add write latency in a single-write-region design; they correctly use Session.
Enterprise Scenario 2: IoT Telemetry Ingestion
An IoT company ingests telemetry from millions of devices worldwide into Cosmos DB. They use a single write region in East US (to centralize data) and multiple read regions in West Europe, East Asia, and Brazil for low-latency analytics. They do not enable multi-region writes because all device data must be sequentially ordered for accurate time-series analysis. They use Bounded Staleness consistency with a 300-second staleness window (the minimum time bound for a multi-region account) to balance consistency and performance. Automatic failover is enabled with failover priority: East US (priority 0), West Europe (priority 1), East Asia (priority 2). During planned maintenance in East US, they temporarily disable automatic failover and perform a manual failover to West Europe. They use the Azure CLI to script the failover and verify the new write region. A pitfall they encountered: they initially set the consistency to Strong, which caused write throttling because Strong requires synchronous replication to all regions; they changed to Bounded Staleness. They also learned that during a failover, the change feed in the old write region stops emitting new changes until the region is restored; they had to handle this in their downstream processing.
Enterprise Scenario 3: Mobile Gaming Leaderboard
A mobile gaming company uses Cosmos DB for real-time leaderboards with global players. They need low write latency for score updates and high read throughput for leaderboard queries. They enable multi-region writes with three regions: East US, West Europe, and Japan West. They use Eventual consistency because leaderboard accuracy can tolerate a few seconds of staleness. They configure automatic failover and order the priority list to match player concentration. During a regional outage in East US, the failover happens automatically and the SDK retries writes. They also use multi-region writes to distribute write load; each region handles writes for its local players. A common mistake they avoided: they did not plan for Strong consistency, because it is not possible with multi-region writes. They also ensured that concurrent updates to the same score are resolved with the last-writer-wins policy based on timestamps.
DP-900 Exam Focus: Cosmos DB Global Distribution and Failover
Objective Code: 2.4 – Describe key components of Azure Cosmos DB (including global distribution, consistency levels, and failover). This objective is tested in approximately 5-8 questions on the exam.
What the Exam Tests:
- Understanding that Cosmos DB can replicate data to multiple Azure regions with a single click or command.
- Knowing the difference between multi-region writes (all regions accept writes) and single-write region (only one region accepts writes).
- Identifying that automatic failover requires a priority list and that manual failover is available when automatic failover is disabled.
- Recognizing that consistency levels affect replication behavior: Strong requires synchronous replication and cannot be used with multi-region writes.
- Knowing the default consistency level is Session.
- Understanding that failover can be triggered manually or automatically, and that automatic failover has an SLA of 1 hour (but typically completes in minutes).
Common Wrong Answers and Why Candidates Choose Them:
1. "You can only have one write region." Candidates think of traditional databases. Cosmos DB supports multi-region writes (multi-master). The exam will test that this is an option.
2. "Automatic failover happens instantly." Candidates assume it's like a DNS change. In reality, it takes up to 1 hour (SLA), though usually faster. The exam may ask about the SLA.
3. "Strong consistency is required for global distribution." Candidates confuse consistency with replication. Strong consistency is actually the least compatible with global distribution because it requires synchronous replication, which increases latency.
4. "You can set any consistency level with multi-region writes." Strong consistency is not allowed with multi-region writes. The exam will test this restriction.
Specific Numbers and Terms to Memorize:
- Maximum regions per account: 30 (soft limit).
- Default consistency: Session.
- Automatic failover SLA: 1 hour.
- Multi-region writes: can be enabled/disabled.
- Failover priority: 0 = highest priority.
- Consistency levels: Strong, Bounded Staleness, Session, Consistent Prefix, Eventual.
Edge Cases and Exceptions:
- If you have only one region, failover is not applicable.
- If automatic failover is disabled and the write region goes down, your application will experience write failures until you manually fail over.
- You cannot set the consistency level to Strong while multi-region writes are enabled; you must first disable multi-region writes.
- In a single-write-region account, the write region is the region that holds failover priority 0. Reordering the other priorities changes only the order of future failovers, while assigning priority 0 to a different region is how a manual failover is triggered.
How to Eliminate Wrong Answers:
- If a question mentions "lowest latency writes globally," look for multi-region writes or preferred locations.
- If a question mentions "strong consistency," remember that it cannot be used with multi-region writes and may not be suitable for global distribution.
- If a question asks about failover time, the SLA is 1 hour; do not choose "instantaneous" unless it's manual failover.
- For questions about replication, remember that replication is asynchronous (except for Strong consistency).
Cosmos DB can replicate data to up to 30 regions globally.
Multi-region writes allow every region to accept writes, but Strong consistency is not supported.
Automatic failover uses a priority list and has an SLA of 1 hour.
Manual failover is instantaneous and requires automatic failover to be disabled.
The default consistency level is Session.
Replication is asynchronous except for Strong consistency, which is synchronous.
Preferred locations in the SDK control client-side read routing, not writes.
These come up on the exam all the time. Here's how to tell them apart.
Single-Write Region
Only one region accepts writes.
Simpler conflict resolution (no conflicts).
Supports Strong consistency.
Lower cost: provisioned throughput is billed at the single-region write rate.
Write latency is higher for users far from the write region.
Multi-Region Writes
All regions accept writes.
Requires conflict resolution (last-writer-wins or custom).
Cannot use Strong consistency.
Higher cost: provisioned throughput on a multi-region write account is billed at a higher rate.
Write latency is low for all users (write to nearest region).
Mistake
Cosmos DB global distribution requires multi-region writes.
Correct
Multi-region writes are optional. You can have a single write region with multiple read regions. Multi-region writes increase write availability but are not required for global distribution.
Mistake
Automatic failover is instantaneous.
Correct
Automatic failover has an SLA of 1 hour, though it typically completes within minutes. Manual failover is near-instantaneous.
Mistake
You can use Strong consistency with multi-region writes.
Correct
Strong consistency is not supported when multi-region writes are enabled. Strong requires synchronous replication to all regions, which conflicts with the multi-master protocol.
Mistake
Reordering any part of the failover priority list moves the write region.
Correct
Only the region that holds priority 0 is the write region. Reordering the lower priorities changes the order of future failovers without moving the write region; the write region changes only when a failover (manual or automatic) puts a different region at priority 0.
Mistake
You can have only one read region per Cosmos DB account.
Correct
You can have up to 30 regions per account, and all non-write regions are read regions (unless multi-region writes are enabled, in which case all regions are both read and write).
Yes, you can change the write region by performing a manual failover. If automatic failover is enabled, disable it first, then fail over to the target region from the portal, CLI, or PowerShell. Reordering the lower-priority entries of the failover priority list does not change the current write region; it only changes the order used by future failovers.
Automatic failover is triggered by Cosmos DB when the current write region becomes unavailable. It uses the failover priority list to promote the next available region. Manual failover is initiated by you (via portal, CLI, or PowerShell) to switch the write region at any time, for example, during planned maintenance. Manual failover is instantaneous, while automatic failover can take up to 1 hour (SLA).
Yes, Cosmos DB supports multi-region writes, also known as multi-master. When enabled, all regions can accept writes. This increases write availability but requires conflict resolution and disables Strong consistency. It is ideal for globally distributed applications that need low write latency everywhere.
Session consistency is the default and works well for most applications. It provides read-your-writes, monotonic reads, and monotonic writes within a session. For applications that can tolerate some staleness, Eventual or Consistent Prefix offer lower latency. Bounded Staleness is a good compromise for global distribution if you need a guaranteed upper limit on how stale reads can be. Strong consistency is not recommended for global distribution because it requires synchronous replication and increases latency.
During a failover, there is a brief period (a few seconds to minutes) where writes may be rejected as the new write region is promoted and DNS records update. The Cosmos DB SDK automatically retries operations with exponential backoff. Reads are unaffected if you have configured preferred locations. After failover, your application will connect to the new write region. It is important to test failover scenarios to ensure your application handles the transition gracefully.
No, Strong consistency is not supported when multi-region writes are enabled. Strong consistency requires synchronous replication to all regions, which is incompatible with the multi-master protocol used for multi-region writes. If you need Strong consistency, you must use a single write region.
Writes that have already been replicated to the new write region are not lost during a failover. However, because replication is asynchronous (for every consistency level except Strong), writes that were committed in the old write region but not yet replicated when an unplanned failover occurs may be missing from the new write region, and they may be lost if the old write region goes down permanently. This window of potential data loss is usually a few seconds and is bounded by the configured consistency level.
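The size of that data-loss window can be reasoned about with a toy model. The sketch below assumes every write reaches the secondary exactly `replication_lag_ms` after it is committed, which is a deliberate simplification; the function name and all timings are hypothetical.

```python
from __future__ import annotations


def unreplicated_writes(
    commit_times_ms: list[int],
    failover_time_ms: int,
    replication_lag_ms: int,
) -> list[int]:
    """Commit times of writes that had not reached the secondary when failover happened."""
    return [
        t
        for t in commit_times_ms
        if t <= failover_time_ms and t + replication_lag_ms > failover_time_ms
    ]


# Hypothetical timeline: writes every 100 ms, 200 ms replication lag, outage at t = 450 ms.
commits = [0, 100, 200, 300, 400]
print(unreplicated_writes(commits, failover_time_ms=450, replication_lag_ms=200))  # [300, 400]
```

In this model the number of writes at risk grows with both the replication lag and the write rate, which is why a tighter consistency bound narrows the window.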