This chapter covers multi-region application architecture on AWS, a critical pattern for building highly available, fault-tolerant, and low-latency applications. For the DVA-C02 exam, approximately 10-15% of questions touch on multi-region concepts, particularly around disaster recovery, global services, and data replication. You will learn how to design and implement multi-region architectures using Route 53, DynamoDB Global Tables, S3 Cross-Region Replication, and RDS cross-region read replicas, as well as how to handle failover and traffic routing.
Imagine a global theater company that puts on the same show simultaneously in multiple cities. Each city has its own stage, cast, and crew—this is a region. The show's script and set designs are stored in a central repository (a global service like Amazon S3 with cross-region replication or DynamoDB global tables). When an audience member in Tokyo buys a ticket, they are routed to the Tokyo stage (Route 53 latency-based routing). The Tokyo crew performs the show using local props and costumes (regional resources like EC2 instances and RDS databases). If the Tokyo stage catches fire, the router automatically redirects Tokyo ticket holders to the nearest available stage, say Seoul (Route 53 failover routing). The show must go on. Meanwhile, the central repository ensures that any changes to the script made by the New York crew are replicated to all stages within seconds (global tables). This way, every audience member sees the same show, with minimal latency, and the system survives the loss of any single stage.
What is Multi-Region Application Architecture?
Multi-region application architecture involves deploying application resources (compute, storage, databases) across two or more AWS Regions to achieve higher availability, fault tolerance, and lower latency for global users. On the DVA-C02 exam, you must understand the trade-offs between active-passive and active-active strategies, the use of global services (Route 53, CloudFront, IAM, etc.), and data replication mechanisms.
Why Multi-Region Architecture?
Disaster Recovery (DR): If an entire AWS Region becomes unavailable (e.g., due to natural disaster or power outage), traffic is routed to a secondary region.
Low Latency: Users access the nearest region, reducing network round-trip time.
Regulatory Compliance: Data residency requirements may mandate storing data in specific geographic locations.
Scalability: Distribute load across regions to handle global traffic spikes.
Key Components and How They Work
#### 1. Global Services vs. Regional Services
Global Services: Operate independently of any single region. Examples: Amazon Route 53, Amazon CloudFront, AWS WAF (when attached to CloudFront), AWS Shield, AWS IAM, AWS Organizations. You do not deploy these services per region, and they are designed to keep serving requests when a single region is lost.
Regional Services: Operate within a single region. Examples: EC2, ELB, RDS, Lambda, DynamoDB (Standard table). To make them multi-region, you must deploy them in each region and configure replication or failover.
#### 2. Route 53 – Traffic Routing
Route 53 is the DNS service that routes users to the appropriate regional endpoint. DVA-C02 tests the following routing policies:
Latency-based: Routes to the region with the lowest latency for the user. Requires latency records (e.g., app.example.com with a latency record for each region).
Geolocation: Routes based on the user's geographic location. Useful for content restriction or regional compliance.
Geoproximity: Routes based on geographic distance, with optional bias to shift traffic.
Failover: Routes traffic to a primary resource; if health check fails, routes to secondary. Used in active-passive DR.
Weighted: Distributes traffic across multiple regions based on assigned weights. Used for A/B testing or gradual migration.
Health Checks: Route 53 can monitor the health of endpoints via HTTP/HTTPS/TCP health checks. If a primary region fails health check, failover routing automatically redirects traffic to the secondary region.
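TTL: You set the TTL on each standard record; 300 seconds (5 minutes) is the commonly used starting value. A lower TTL (e.g., 60 seconds) allows faster failover but increases DNS query volume and cost. Alias records take their TTL from the target resource, so you cannot set it yourself.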
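The failover and latency-based policies above can be modeled in a few lines. This is a minimal sketch of the decision logic only, not of how Route 53 is implemented; the function names and the sample latency figures are hypothetical.

```python
from typing import Dict, Optional


def resolve_failover(primary: str, secondary: str, primary_healthy: bool) -> str:
    """Failover policy: answer with the primary unless its health check fails."""
    return primary if primary_healthy else secondary


def resolve_latency(latency_ms: Dict[str, float],
                    healthy: Dict[str, bool]) -> Optional[str]:
    """Latency policy: pick the lowest-latency region that passes its health check."""
    candidates = {r: ms for r, ms in latency_ms.items() if healthy.get(r, False)}
    if not candidates:
        return None  # simplification made by this sketch
    return min(candidates, key=candidates.get)


# Hypothetical latencies measured from one user's resolver.
latencies = {"us-east-1": 20.0, "eu-west-1": 95.0}
print(resolve_latency(latencies, {"us-east-1": True, "eu-west-1": True}))   # us-east-1
print(resolve_latency(latencies, {"us-east-1": False, "eu-west-1": True}))  # eu-west-1
print(resolve_failover("us-east-1", "us-west-2", primary_healthy=False))    # us-west-2
```

Note that latency-based routing with health checks behaves like failover for the users of an unhealthy region, which is why it suits active-active designs.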
#### 3. Data Replication Strategies
##### Amazon DynamoDB Global Tables
How it works: DynamoDB Global Tables replicate data across multiple regions with multi-master (active-active) support. Each region can read and write to its local replica. Conflicts are resolved using "last writer wins" (LWW) based on timestamp.
Replication latency: Typically under 1 second after the last update.
Exam tip: Global Tables require DynamoDB Streams enabled on the table. The stream captures changes and replicates them.
Limitations: All replicas share the same key schema, and transactions are ACID-compliant only within the region where the write originates. TTL is supported with Global Tables, and TTL deletes are replicated to the other replicas.
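To make "last writer wins" concrete, here is a minimal sketch of the reconciliation rule. `ItemVersion` and its fields are hypothetical names chosen for illustration; DynamoDB performs this reconciliation internally and does not expose such a type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemVersion:
    value: str
    written_at: float  # write timestamp in seconds (hypothetical field)
    region: str


def last_writer_wins(a: ItemVersion, b: ItemVersion) -> ItemVersion:
    """Keep the version with the later timestamp; ties keep `a` (arbitrary in this sketch)."""
    return b if b.written_at > a.written_at else a


# Two regions update the same item within the replication window.
tokyo = ItemVersion("seat=12A", written_at=100.0, region="ap-northeast-1")
new_york = ItemVersion("seat=14C", written_at=100.4, region="us-east-1")

winner = last_writer_wins(tokyo, new_york)
print(winner.region, winner.value)  # us-east-1 seat=14C
```

The earlier write is discarded without an error, so applications that cannot tolerate lost updates should avoid concurrent writes to the same item from different regions.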
##### Amazon S3 Cross-Region Replication (CRR)
How it works: Automatically replicates objects from a source bucket in one region to a destination bucket in another region. Requires versioning enabled on both buckets.
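Replication time: Most objects replicate within 15 minutes, but standard CRR carries no SLA. If you need a predictable replication time, enable S3 Replication Time Control (RTC), which targets replication within 15 minutes and is backed by an SLA.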
Use cases: Compliance, lower latency access, DR.
Exam tip: CRR does not replicate existing objects by default; you must use S3 Batch Operations or copy manually. Also, CRR does not replicate delete markers unless you configure it to.
Same-Region Replication (SRR): Replicates within the same region, useful for aggregating logs or separating test/prod.
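The replication rule described above is expressed as a JSON document passed to `put-bucket-replication`. The sketch below builds one in Python; the helper name, role ARN, account ID, and bucket name are placeholders, and only a minimal set of fields is shown.

```python
import json


def build_replication_config(role_arn: str, dest_bucket_arn: str,
                             replicate_delete_markers: bool = False,
                             prefix: str = "") -> dict:
    """Build a single-rule replication configuration (hypothetical helper)."""
    marker_status = "Enabled" if replicate_delete_markers else "Disabled"
    return {
        "Role": role_arn,
        "Rules": [{
            "Status": "Enabled",
            "Priority": 1,
            # Delete markers are not replicated unless this is switched on.
            "DeleteMarkerReplication": {"Status": marker_status},
            # An empty prefix matches every object in the source bucket.
            "Filter": {"Prefix": prefix},
            "Destination": {"Bucket": dest_bucket_arn},
        }],
    }


config = build_replication_config(
    "arn:aws:iam::123456789012:role/replication-role",  # placeholder account ID
    "arn:aws:s3:::dest-bucket",
)
print(json.dumps(config, indent=2))
```

Writing the output to a file gives you a document you can reference with `file://` when calling the CLI.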
##### Amazon RDS Cross-Region Read Replicas
How it works: An RDS instance in one region can have a read replica in another region. The replica is asynchronously updated.
Failover: For DR, promote the read replica to a standalone instance. This is manual; there is no automatic failover.
Exam tip: Cross-region read replicas are available for MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server. Aurora also supports cross-region replication via Aurora Global Database.
##### Amazon Aurora Global Database
How it works: Consists of a primary region (writer) and up to 5 secondary regions (readers). Replication is dedicated, low-latency (typically <1 second).
Failover: In a disaster, promote a secondary region to primary in about 1 minute.
Exam tip: Aurora Global Database is designed for low-latency global reads and fast DR.
#### 4. Compute and Application Layer
##### AWS Lambda – Multi-Region
Lambda functions are regional. To run in multiple regions, deploy the same function code and configuration in each region. Use Route 53 to route API Gateway requests to the appropriate region. For stateful functions, use an external data store (e.g., DynamoDB Global Tables).
##### Amazon ECS/EKS – Multi-Region
Deploy containerized applications using the same task definitions and images across regions. Use a service mesh or Route 53 for traffic routing. For stateful services, use persistent storage like Amazon EFS (which is regional, but can be replicated via EFS replication) or RDS cross-region replicas.
Failover Strategies: Active-Passive vs. Active-Active
Active-Passive: One region handles all traffic (active), the other is on standby (passive). Failover occurs when active region fails. Data replication is one-way (active to passive). Simpler but wastes resources.
Active-Active: Both regions handle traffic simultaneously. Requires data replication both ways (multi-master) or a global data store. More complex but provides better utilization and lower latency for global users.
Disaster Recovery Tiers
Backup and Restore: Data backed up to another region; restore takes hours.
Pilot Light: Minimal resources running in DR region; scale up during failover.
Warm Standby: Scaled-down but functional copy of production in DR region.
Multi-Site Active-Active: Full production in both regions.
Configuration and Verification Commands
#### Route 53 Failover Routing (AWS CLI)
# Create a health check for the primary endpoint
aws route53 create-health-check --caller-reference "2023-01-01" --health-check-config Type=HTTP,FullyQualifiedDomainName=primary.example.com,Port=80,ResourcePath="/health"
# Create failover DNS records (primary and secondary)
aws route53 change-resource-record-sets --hosted-zone-id ZONEID --change-batch '{
"Changes": [
{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "app.example.com",
"Type": "A",
"SetIdentifier": "primary",
"Failover": "PRIMARY",
"HealthCheckId": "CHECKID",
"TTL": 60,
"ResourceRecords": [{"Value": "1.2.3.4"}]
}
},
{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "app.example.com",
"Type": "A",
"SetIdentifier": "secondary",
"Failover": "SECONDARY",
"TTL": 60,
"ResourceRecords": [{"Value": "5.6.7.8"}]
}
}
]
}'
#### Enable S3 Cross-Region Replication
# Enable versioning on source and destination buckets
aws s3api put-bucket-versioning --bucket source-bucket --versioning-configuration Status=Enabled
aws s3api put-bucket-versioning --bucket dest-bucket --versioning-configuration Status=Enabled
# Create replication role
aws iam create-role --role-name replication-role --assume-role-policy-document file://trust-policy.json
# Create replication configuration
aws s3api put-bucket-replication --bucket source-bucket --replication-configuration file://replication-config.json
Example replication-config.json:
{
"Role": "arn:aws:iam::ACCOUNTID:role/replication-role",
"Rules": [
{
"Status": "Enabled",
"Priority": 1,
"DeleteMarkerReplication": { "Status": "Disabled" },
"Filter": { "Prefix": "" },
"Destination": {
"Bucket": "arn:aws:s3:::dest-bucket",
"StorageClass": "STANDARD"
}
}
]
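}
#### Verify DynamoDB Global Table status
# describe-global-table applies to legacy (2017.11.29) global tables; for the current version,
# run describe-table in a replica region and inspect the Replicas list instead
aws dynamodb describe-global-table --global-table-name my-table
Interaction with Related Technologies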
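CloudFront + Lambda@Edge: CloudFront selects an origin by cache behavior (path pattern) and can fail over between the origins in an origin group. Lambda@Edge can run code at edge locations to customize content or to choose a regional origin per request, for example based on the viewer's country.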
AWS Global Accelerator: Uses anycast IP to route traffic to the nearest healthy endpoint. Provides static IP addresses and improves performance.
AWS Transit Gateway: Can interconnect VPCs across regions using inter-region peering.
VPC Peering: Allows direct connectivity between VPCs in different regions (inter-region VPC peering).
Define Recovery Objectives
First, determine the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) for the application. RTO is the maximum acceptable downtime after a disaster; RPO is the maximum acceptable data loss measured in time. For example, an e-commerce site might have an RTO of 15 minutes and RPO of 1 minute. These objectives drive the choice of replication strategy: synchronous replication (low RPO but high latency) vs. asynchronous (higher RPO but lower latency). For multi-region, asynchronous is typical due to geographic distance.
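A quick way to reason about these objectives is to compare measured values against the targets. The sketch below is a minimal illustration; the function name and the measured figures are hypothetical, and the targets are the e-commerce example above (RTO 15 minutes, RPO 1 minute).

```python
def meets_objectives(estimated_downtime_s: float, replication_lag_s: float,
                     rto_s: float, rpo_s: float) -> dict:
    """Compare an estimated outage and replication lag against the RTO and RPO targets."""
    return {
        "rto_met": estimated_downtime_s <= rto_s,
        # With asynchronous replication, data written during the lag window can be lost.
        "rpo_met": replication_lag_s <= rpo_s,
    }


# Targets: RTO 15 minutes, RPO 1 minute.
# Hypothetical drill results: 10 minutes of downtime, 90 seconds of replication lag.
result = meets_objectives(estimated_downtime_s=600, replication_lag_s=90,
                          rto_s=15 * 60, rpo_s=60)
print(result)  # {'rto_met': True, 'rpo_met': False}
```

In this made-up drill the failover is fast enough, but the replication lag exceeds the RPO, which would push the design toward a lower-lag replication mechanism.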
Choose Active-Passive or Active-Active
Based on RTO/RPO and cost, decide the failover strategy. Active-passive is simpler: one region handles traffic, the other is standby. Data flows one way (active to passive). Failover involves updating DNS records or using Route 53 failover routing. Active-active requires both regions to serve traffic, needing multi-master replication or a global data store. This is more complex but provides lower latency for global users and better resource utilization.
Deploy Regional Resources
Deploy compute, storage, and database resources in each region. For compute, use identical AMIs/containers/Lambda functions. For databases, set up replication: for RDS, create cross-region read replicas; for DynamoDB, enable Global Tables; for S3, configure CRR. Ensure security groups, IAM roles, and VPC configurations are consistent across regions. Use infrastructure as code (CloudFormation, Terraform) to maintain consistency.
Configure DNS and Traffic Routing
Use Route 53 to route traffic to the appropriate regional endpoints. For active-passive, create failover records: primary points to the active region, secondary to the standby. For active-active, use latency-based or geolocation routing. Set health checks on endpoints to detect failures. Configure TTL appropriately: lower TTL (e.g., 60 seconds) for fast failover, but higher TTL (e.g., 300 seconds) reduces DNS query costs.
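The TTL trade-off can be estimated with simple arithmetic: the time to detect the failure plus the time cached DNS answers take to expire. This is a rough sketch with illustrative inputs; it ignores how Route 53 aggregates results from multiple health checkers and assumes resolvers honor the TTL.

```python
def estimated_dns_failover_seconds(check_interval_s: int, failure_threshold: int,
                                   ttl_s: int) -> int:
    """Rough upper bound: detection time plus the time for cached records to expire."""
    detection_s = check_interval_s * failure_threshold
    return detection_s + ttl_s


# Illustrative inputs: 30-second checks, 3 consecutive failures to mark unhealthy.
print(estimated_dns_failover_seconds(30, 3, ttl_s=60))   # 150
print(estimated_dns_failover_seconds(30, 3, ttl_s=300))  # 390
```

With these inputs, lowering the TTL from 300 to 60 seconds cuts the estimate from about six and a half minutes to two and a half.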
Test Failover and Monitor
Regularly test failover by simulating a region failure. For active-passive, verify that Route 53 health check fails and traffic routes to the secondary region. For active-active, test that traffic is correctly distributed. Monitor replication lag using CloudWatch metrics (e.g., DynamoDB ReplicationLatency, S3 Replication metrics). Ensure application can handle eventual consistency if using asynchronous replication.
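Replication-lag monitoring usually means alerting when lag stays above a threshold for several consecutive periods. The following is a simplified model of that evaluation, not the CloudWatch alarm algorithm itself; the function name, threshold, and sample values are hypothetical.

```python
from typing import Sequence


def lag_alarm(samples_ms: Sequence[float], threshold_ms: float, periods: int) -> bool:
    """Return True when the most recent `periods` samples all exceed the threshold."""
    if periods <= 0 or len(samples_ms) < periods:
        return False  # not enough data to evaluate
    return all(sample > threshold_ms for sample in samples_ms[-periods:])


# Hypothetical replication latency samples in milliseconds, one per minute.
print(lag_alarm([400, 1500, 1800, 2100], threshold_ms=1000, periods=3))  # True
print(lag_alarm([400, 1500, 800, 2100], threshold_ms=1000, periods=3))   # False
```

Requiring several consecutive breaches avoids alerting on a single slow datapoint.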
Scenario 1: Global SaaS Application
A SaaS company provides a project management tool to customers worldwide. They deploy an active-active architecture across us-east-1 (N. Virginia) and eu-west-1 (Ireland). Compute runs on ECS Fargate behind Application Load Balancers. User session data is stored in DynamoDB Global Tables, so writes in either region are replicated within seconds. File attachments are stored in S3 with CRR. Route 53 latency-based routing with health checks directs users to the nearest healthy region. During a major storm on the US East Coast, us-east-1 becomes degraded. Its health checks fail, and Route 53 routes all traffic to eu-west-1. Users experience slightly higher latency, and data loss is limited to writes that had not yet replicated, because DynamoDB and S3 replicate asynchronously. The company meets its RTO of 5 minutes and, for session data, its RPO of 1 second.
Scenario 2: Financial Services Disaster Recovery
A bank requires strict regulatory compliance: data must remain in the US, but they need DR. They choose an active-passive architecture with primary in us-east-1 and standby in us-west-2. They use RDS for MySQL with cross-region read replicas. The standby replica is promoted manually during a disaster. S3 CRR replicates transaction logs. Route 53 failover routing with health checks monitors the primary ALB endpoint. The bank conducts quarterly failover drills, promoting the read replica to a standalone instance and updating the application configuration. They observe an RTO of 30 minutes (manual promotion time) and RPO of 5 minutes (replication lag). Misconfiguration of security groups in the standby region once caused connectivity issues during a drill, highlighting the need for consistent infrastructure.
Scenario 3: Media Streaming Platform
A video streaming service uses CloudFront as a CDN with multiple origins in different regions. The backend is a serverless application using API Gateway and Lambda, deployed in us-east-1, eu-west-1, and ap-southeast-1. User profiles are stored in DynamoDB Global Tables. Video metadata is stored in S3 with CRR. They use Route 53 geolocation routing to ensure users are directed to the region that contains their licensed content. During a regional outage, CloudFront origin failover sends requests to a healthy origin. The main challenge is managing replication of large video files; they enable S3 Replication Time Control (RTC) for predictable replication times, because S3 Transfer Acceleration speeds up client uploads rather than CRR. They monitor replication metrics and have alerts for lag.
DVA-C02 Objective 1.4: Design and implement multi-region application architecture.
What the exam tests:
- Ability to choose appropriate global services (Route 53, CloudFront) and regional services.
- Understanding of Route 53 routing policies: latency, geolocation, geoproximity, failover, weighted.
- Knowledge of data replication: DynamoDB Global Tables, S3 CRR, RDS cross-region read replicas, Aurora Global Database.
- Familiarity with DR strategies: backup & restore, pilot light, warm standby, multi-site.
- Understanding of RTO and RPO and how they influence architecture.
- Ability to identify the correct replication mechanism for different data stores.
Common wrong answers and why:
1. Choosing S3 Same-Region Replication (SRR) for DR: Candidates often confuse SRR with CRR. SRR replicates within the same region, so it does not protect against region failure. The exam expects CRR for DR.
2. Assuming DynamoDB Global Tables provide synchronous replication: Global Tables use asynchronous replication (eventual consistency). A common trap is to think writes are immediately visible in all regions. The exam tests that there is a replication lag (typically <1 second).
3. Selecting Route 53 simple routing for failover: Simple routing only maps a domain to one resource; it does not support health checks or automatic failover. The correct policy is failover routing.
4. Thinking RDS cross-region read replicas support automatic failover: They do not. You must manually promote the replica. The exam may present a scenario requiring automatic failover, where the correct answer is Aurora Global Database (which supports faster promotion) or a custom solution.
Specific numbers and terms:
- Route 53 health checks: request interval of 30 seconds (standard) or 10 seconds (fast), with a default failure threshold of 3 consecutive failures, so an outage is detected in roughly 90 seconds at default settings.
- DynamoDB Global Tables: requires DynamoDB Streams, uses LWW conflict resolution.
- S3 CRR: requires versioning, does not replicate existing objects or delete markers by default.
- Aurora Global Database: up to 5 secondary regions, typical replication lag <1 second, failover promotion in ~1 minute.
- RDS cross-region read replicas: available for MySQL, MariaDB, PostgreSQL, Oracle, SQL Server.
Edge cases:
- If an S3 bucket has CRR enabled but the destination bucket is in a different account, you must configure a bucket policy on the destination bucket that grants replication permissions to the source account's replication role.
- DynamoDB Global Tables: with the current version (2019.11.21), you can add replica regions to an existing table. Only the legacy version (2017.11.29) required starting from empty tables.
- Route 53 latency routing requires records for each region; if a region has no record, users near it may be routed elsewhere.
How to eliminate wrong answers:
- If the question involves automatic failover for a database, look for Aurora Global Database or a custom health check + Lambda solution. RDS read replicas are manual.
- If the question involves global read and write scalability, DynamoDB Global Tables is the answer. For read-only replicas, RDS cross-region read replicas or Aurora replicas are correct.
- For static content DR, S3 CRR is the mechanism. If the question mentions existing objects, note that CRR does not replicate them by default.
Route 53 failover routing requires a health check on the primary record; the default request interval is 30 seconds, and the default failure threshold is 3 consecutive failures (about 90 seconds to mark an endpoint unhealthy).
DynamoDB Global Tables require DynamoDB Streams enabled; conflict resolution is last writer wins based on timestamp.
S3 Cross-Region Replication requires versioning on both source and destination buckets; existing objects are not replicated automatically.
RDS cross-region read replicas do not support automatic failover; promotion is manual.
Aurora Global Database supports up to 5 secondary regions and can promote a secondary to primary in about 1 minute.
CloudFront can be used to route traffic to multiple regional origins for failover and low latency.
For disaster recovery, choose the strategy (backup & restore, pilot light, warm standby, multi-site) based on RTO and RPO.
Route 53 latency-based routing directs users to the region with the lowest latency based on latency measurements.
Geolocation routing routes based on the user's location; useful for content restriction or regional compliance.
Global services (Route 53, CloudFront, IAM) are inherently resilient across regions; regional services require explicit multi-region deployment.
These come up on the exam all the time. Here's how to tell them apart.
Active-Passive
One region handles all traffic; other is standby.
Simpler to implement and test.
Lower cost (fewer resources running full-time).
Failover time depends on DNS propagation and resource startup.
Data replication is one-way (active to passive).
Active-Active
Both regions handle traffic simultaneously.
More complex, requires multi-master replication or global data store.
Higher cost (both regions run full production).
Lower latency for global users due to regional access.
Data replication is bidirectional (multi-master).
DynamoDB Global Tables
Multi-master: all regions can read and write.
Replication latency typically <1 second.
Conflict resolution: last writer wins.
No automatic failover; all regions are active.
Supports eventual consistency only cross-region.
RDS Cross-Region Read Replicas
Single master: only primary region can write.
Asynchronous replication; lag depends on network.
No conflict resolution (read-only replicas).
Manual promotion for failover.
Can provide read scalability in multiple regions.
Mistake
After a Route 53 failover, you must manually update DNS records to send traffic back to the primary.
Correct
Route 53 failover routing fails back automatically. While the primary's health check is failing, Route 53 answers with the secondary record; once the health check passes again, it resumes answering with the primary, with no record changes required. What does not revert on its own is the data tier: if you promoted a read replica or secondary cluster during the outage, you must re-establish replication before the original region can safely take writes again.
Mistake
DynamoDB Global Tables provide strongly consistent reads across all regions.
Correct
Global Tables use eventual consistency for cross-region reads. Only the local region can provide strong consistency for reads. Writes are replicated asynchronously. A read in another region may return stale data until replication completes.
Mistake
S3 Cross-Region Replication replicates all objects, including existing ones, automatically.
Correct
CRR only replicates objects created after the replication configuration is enabled. To replicate existing objects, you must use S3 Batch Operations or copy them manually. The exam often tests this distinction.
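The rules for what CRR copies can be summarized as a small predicate. This is a simplified sketch with hypothetical names; it covers only the three conditions discussed in this chapter (object age relative to the rule, prefix filter, delete markers) and ignores other exclusions.

```python
def would_replicate(key: str, created_after_rule: bool, rule_prefix: str = "",
                    is_delete_marker: bool = False,
                    delete_marker_replication: bool = False) -> bool:
    """Decide whether a rule would replicate an object, under this sketch's simplified model."""
    if not created_after_rule:
        return False  # existing objects need a batch job or a manual copy
    if not key.startswith(rule_prefix):
        return False  # outside the rule's prefix filter
    if is_delete_marker and not delete_marker_replication:
        return False  # delete markers are skipped unless explicitly enabled
    return True


print(would_replicate("reports/2023.csv", created_after_rule=False))  # False
print(would_replicate("reports/2024.csv", created_after_rule=True))   # True
print(would_replicate("reports/2024.csv", created_after_rule=True,
                      is_delete_marker=True))                          # False
```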
Mistake
RDS cross-region read replicas can be promoted to a standalone instance automatically.
Correct
Promotion is a manual action. There is no built-in automatic failover. You must use a custom solution (e.g., Lambda with Route 53 health checks) to automate promotion and DNS update.
Mistake
Using CloudFront with multiple origins provides automatic multi-region failover without Route 53.
Correct
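CloudFront origin groups let you define a primary and a secondary origin, and CloudFront retries the secondary when the primary returns one of the HTTP status codes you configure or cannot be reached. Origin failover applies only to GET, HEAD, and OPTIONS requests, so writes (POST, PUT, DELETE) are not failed over. For failover that covers all request types, combine CloudFront with Route 53 health checks and failover records on the origin's domain name. The exam may test the combination of both.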
Failover routing is designed for active-passive DR: you designate a primary and secondary resource. If the primary fails health check, Route 53 routes traffic to the secondary. Latency-based routing is for active-active: it directs each user to the region with the lowest latency, distributing traffic across all regions. For DR, failover routing is typically used because it provides clear primary/secondary behavior. However, latency-based routing can also be used with health checks to remove unhealthy regions.
Yes, S3 CRR supports cross-account replication. You need to grant the source bucket's IAM role permissions to write to the destination bucket, and the destination bucket must have a bucket policy that allows replication from the source account. Additionally, the destination bucket must have versioning enabled. The exam may test that cross-account replication requires additional configuration.
RDS does not natively support automatic cross-region failover. You can use a custom solution: create a Lambda function that monitors the primary RDS instance's health (via CloudWatch alarms), and on failure, promote the cross-region read replica to a standalone instance and update Route 53 DNS records. Alternatively, use Aurora Global Database, which supports promotion in about 1 minute but still requires manual or automated trigger.
DynamoDB Global Tables typically replicate changes within one second of the last update, but there is no SLA. The replication is asynchronous and eventual consistency. The exam may refer to this as 'sub-second' latency. You should also know that Global Tables use DynamoDB Streams to capture changes.
Yes, CloudFront supports origin failover. You can configure an origin group with a primary and secondary origin (e.g., an ALB in us-east-1 and another in eu-west-1). If the primary origin returns one of the configured failover status codes (such as 500, 502, 503, or 504), or the connection fails or times out, CloudFront retries the request against the secondary origin. This applies to GET, HEAD, and OPTIONS requests only. For failover that covers other methods, use Route 53 failover records for the origin domain names.
S3 Cross-Region Replication (CRR) replicates objects to a bucket in a different AWS Region, used for DR, compliance, or latency reduction. S3 Same-Region Replication (SRR) replicates objects to another bucket in the same region, used for log aggregation, test/prod separation, or data protection within the same region. The exam may ask which to use for disaster recovery; the answer is CRR.
Route 53 health checkers run on the public internet and cannot reach private endpoints (e.g., EC2 instances in a private subnet). You have two main options: point the health check at a publicly reachable endpoint, such as an internet-facing Application Load Balancer, or create a health check that monitors a CloudWatch alarm driven by metrics from the private resource. You can also run a Lambda function inside the VPC that checks the resource and publishes a custom metric for that alarm.