RDS Multi-AZ and Read Replicas are two of the most commonly confused concepts on the SAA-C03 exam. Both create additional copies of your database, but they exist for completely different reasons and behave differently during failure and normal operation.
RDS Multi-AZ
Multi-AZ creates a standby replica in a different Availability Zone within the same region. The primary and standby instances stay synchronised using synchronous replication — every write to the primary is committed to the standby before being acknowledged to the application.
Key characteristics:
- Purpose: high availability and automatic failover, not performance
- Failover: automatic. If the primary instance fails, RDS automatically promotes the standby. The DNS endpoint is updated; applications reconnect without manual intervention. Failover typically takes 1–2 minutes.
- Standby not readable: the standby instance cannot serve read traffic. It only exists for failover.
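- Maintenance: operating-system patches are applied to the standby first, then a failover is performed, which minimises downtime. DB engine version upgrades are applied to the primary and standby at the same time, so they still cause an outage.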
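Turning on Multi-AZ for an existing instance comes down to one API call. The following is a minimal sketch using boto3's `modify_db_instance`; the instance name `orders-db` is hypothetical, and `client` is assumed to be a `boto3.client("rds")` object created by the caller.

```python
def multi_az_request(db_instance_id: str, apply_immediately: bool = False) -> dict:
    """Build the keyword arguments for an RDS modify_db_instance call."""
    return {
        "DBInstanceIdentifier": db_instance_id,
        "MultiAZ": True,  # ask RDS to provision a standby in another AZ
        "ApplyImmediately": apply_immediately,  # False waits for the maintenance window
    }


def enable_multi_az(client, db_instance_id: str, apply_immediately: bool = False) -> dict:
    """Send the request. `client` is assumed to be boto3.client("rds")."""
    return client.modify_db_instance(**multi_az_request(db_instance_id, apply_immediately))
```

Usage would look like `enable_multi_az(boto3.client("rds"), "orders-db")`. Keeping the request builder separate from the call makes the parameters easy to inspect without touching AWS.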
Exam trap: candidates assume Multi-AZ improves read performance. It does not. The standby is never used for queries during normal operation.
"A company needs their RDS database to automatically recover if the primary instance fails." — Enable Multi-AZ.
"A company wants to offload reporting queries from the primary database." — Multi-AZ does not help. Use Read Replicas.
RDS Read Replicas
Read Replicas create asynchronous copies of the primary database. The primary sends write operations to the replica asynchronously — there is a small replication lag.
Key characteristics:
- Purpose: read scaling. Applications direct read-heavy queries to the replica.
- Readable: unlike the Multi-AZ standby, Read Replicas are fully accessible for read queries.
- Not automatic failover: Read Replicas do not automatically become the primary if the primary fails. You must manually promote a Read Replica.
- Cross-region: Read Replicas can be in different AWS regions — useful for disaster recovery and reducing latency for global users.
- Multiple replicas: up to 15 read replicas per RDS instance (MySQL, MariaDB, PostgreSQL).
Exam trap: candidates choose Read Replicas for high availability. Read Replicas do not provide automatic failover — they are for read performance. For automatic failover, use Multi-AZ.
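Read scaling only works if the application actually sends reads to the replicas. Below is a rough sketch of read/write splitting, assuming hypothetical endpoint hostnames and a deliberately naive rule (statements starting with `SELECT` count as reads). Real applications usually rely on their driver or ORM for this, and reads that must see the latest write should still go to the primary because of replication lag.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Endpoints:
    primary: str    # writer endpoint (hypothetical hostname)
    replicas: list  # Read Replica endpoints (hypothetical hostnames)


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, endpoints: Endpoints):
        self._primary = endpoints.primary
        # Fall back to the primary when no replica is configured.
        self._readers = cycle(endpoints.replicas or [endpoints.primary])

    def endpoint_for(self, sql: str) -> str:
        # Naive classification: anything that is not a SELECT is treated as a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self._primary
```

Round-robin is used here only because it is the simplest policy to show; it ignores replica lag and load.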
Combining Multi-AZ and Read Replicas
You can use both simultaneously:
- Multi-AZ for automatic failover of the primary
- Read Replicas for read scaling
This is the common architecture for production databases that need both availability and performance.
Multi-AZ vs Read Replica — Quick Reference
| Feature | Multi-AZ | Read Replica |
|---|---|---|
| Purpose | High availability | Read performance |
| Replication | Synchronous | Asynchronous |
| Readable? | No (standby only) | Yes |
| Automatic failover? | Yes | No (manual promotion) |
| Cross-region? | No (same region) | Yes |
| Replication lag | None | Small lag |
Aurora Multi-AZ
Amazon Aurora behaves differently from RDS MySQL/PostgreSQL. Aurora uses a shared distributed storage layer across all instances, and all Aurora Replicas can serve read traffic. Aurora automatically fails over to an existing read replica, making failover faster than standard RDS Multi-AZ (typically 30 seconds or less).
For Aurora, the term "Multi-AZ" means having replicas in multiple AZs, and all replicas are readable — unlike standard RDS where the Multi-AZ standby is not readable.
Failover Trigger Conditions — What Actually Causes It
Multi-AZ failover is automatic, but it only triggers for specific conditions. Knowing what does and doesn't trigger failover prevents wrong answers on troubleshooting scenarios.
Conditions that DO trigger automatic Multi-AZ failover:
- AZ outage (the entire AZ becomes unavailable)
- Primary DB instance failure (hardware failure, OS crash)
- Primary DB instance storage failure
- Loss of network connectivity to the primary
- AWS-initiated OS patching on the primary (Multi-AZ allows patching without downtime — AWS patches the standby, promotes it, then patches the old primary)
Conditions that do NOT trigger automatic failover:
- High CPU utilization on the primary (the database is slow, not failed)
- Long-running queries blocking other queries
- Application connection timeouts (could be a network/SG issue, not DB failure)
- Read Replica replication lag
The exam scenario: "An RDS Multi-AZ instance is experiencing 95% CPU utilization and application response times are degraded. Failover has not occurred. Why?" Because high CPU does not trigger failover — the database is still running, just slowly. The fix is either a larger instance type, query optimization, or adding Read Replicas to offload read traffic. Failover doesn't help performance issues.
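As a memory aid, the two lists above can be encoded as a small lookup. This is only a study sketch: the enum names are made up for illustration, and the classification simply mirrors the lists in this section.

```python
from enum import Enum


class Condition(Enum):
    AZ_OUTAGE = "AZ outage"
    INSTANCE_FAILURE = "primary instance failure"
    STORAGE_FAILURE = "primary storage failure"
    NETWORK_LOSS = "loss of network connectivity to the primary"
    OS_PATCHING = "AWS-initiated OS patching"
    HIGH_CPU = "high CPU utilization"
    LONG_RUNNING_QUERY = "long-running queries"
    APP_TIMEOUT = "application connection timeouts"
    REPLICA_LAG = "Read Replica replication lag"


# Conditions listed above as triggering automatic Multi-AZ failover.
FAILOVER_TRIGGERS = frozenset({
    Condition.AZ_OUTAGE,
    Condition.INSTANCE_FAILURE,
    Condition.STORAGE_FAILURE,
    Condition.NETWORK_LOSS,
    Condition.OS_PATCHING,
})


def triggers_failover(condition: Condition) -> bool:
    return condition in FAILOVER_TRIGGERS


def explain(condition: Condition) -> str:
    if triggers_failover(condition):
        return f"{condition.value}: RDS fails over to the standby automatically"
    return f"{condition.value}: no failover, the primary is still running"
```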
Multi-AZ Failover Time — The Real Number
AWS documentation states that Multi-AZ failover typically completes in 60–120 seconds. The process:
- RDS detects the primary failure
- RDS promotes the standby to primary (this is near-instant — the standby is already fully in sync)
- RDS updates the DNS CNAME for the endpoint to point to the standby's IP
- DNS propagation allows applications to reconnect
The DNS update is the bottleneck for application reconnection. The RDS endpoint's DNS record typically has a TTL of 5 seconds, but applications and their connection pools often cache DNS results for longer than the TTL. A Java application using a JDBC connection pool might hold connections to the old primary for minutes after the DNS update, until the pool validates or recycles them.
The exam tests this application-side consideration: "How can an application minimize downtime during a Multi-AZ failover?" The answers involve client configuration: setting a reasonable connection timeout, enabling connection validation (checking a connection's health before using it), and keeping any client-side DNS cache (for example the JVM's DNS cache TTL) short enough that the updated endpoint record is picked up promptly. Applications that don't handle reconnection can experience outages longer than the 60-120 second base failover time.
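The reconnection behaviour described here can be sketched as a retry loop that looks the endpoint up again on every attempt and validates the connection before handing it back. This is an illustrative sketch, not a driver feature: `connect` and `validate` are hypothetical callables supplied by the caller (for example a driver's connect function and a `SELECT 1` check), and the backoff values are arbitrary.

```python
import socket
import time


def resolve(hostname: str) -> str:
    """Look the hostname up on every call instead of reusing a cached address."""
    return socket.gethostbyname(hostname)


def connect_with_retry(hostname, connect, validate, attempts=5, base_delay=1.0,
                       resolver=resolve, sleep=time.sleep):
    """Return a validated connection, re-resolving DNS before each attempt.

    connect(ip) -> connection and validate(connection) -> bool are provided
    by the caller; both are assumptions of this sketch.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            conn = connect(resolver(hostname))
            if validate(conn):
                return conn
            last_error = ConnectionError("connection failed validation")
        except OSError as exc:  # refused, timed out, or DNS lookup failure
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

The `resolver` and `sleep` parameters exist so the loop can be exercised without a real network or real waiting.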
Read Replica Promotion to Primary — The Manual Step
Read Replicas are NOT a substitute for Multi-AZ for high availability. The critical difference: Multi-AZ failover is automatic. Read Replica promotion is a manual action.
If you have an RDS primary in us-east-1 and a Read Replica in us-west-2 for disaster recovery, and the primary fails, no automatic failover occurs. You must:
- Detect that the primary has failed (via CloudWatch alarms, monitoring)
- Decide to promote the Read Replica
- Execute the promotion (console, or the CLI command aws rds promote-read-replica with its --db-instance-identifier option)
- Update your application's database endpoint to the promoted instance
- Update Route 53 or connection strings to point to the new primary
This takes minutes of manual intervention. During this time, write operations to the database fail. For an RTO of seconds, this is too slow. Multi-AZ is the answer for automatic failover within a region.
Read Replica promotion is the answer for cross-region DR where the RPO is minutes and the RTO is 5-15 minutes (manual but fast). The promotion itself takes 2-5 minutes; the promotion plus DNS failover takes longer.
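The promotion step itself can be scripted so that only the decision stays manual. Here is a minimal sketch using boto3's `promote_read_replica` call and the `db_instance_available` waiter; `rds` is assumed to be a `boto3.client("rds")` created in the replica's region, and the 7-day backup retention is an arbitrary example value.

```python
def promote_replica(rds, replica_id: str, backup_retention_days: int = 7) -> str:
    """Promote a Read Replica and return the endpoint address of the new primary.

    `rds` is assumed to be boto3.client("rds") for the replica's region.
    """
    # Promotion stops replication and turns the replica into a standalone,
    # writable instance.
    rds.promote_read_replica(
        DBInstanceIdentifier=replica_id,
        BackupRetentionPeriod=backup_retention_days,  # turn on automated backups
    )
    # Wait until the instance reports "available". Caveat: the waiter can return
    # early if the status has not changed yet; this sketch does not handle that.
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=replica_id)

    described = rds.describe_db_instances(DBInstanceIdentifier=replica_id)
    return described["DBInstances"][0]["Endpoint"]["Address"]
```

The returned address is what the application's connection string or Route 53 record would then be pointed at.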
Multi-AZ DB Cluster vs Multi-AZ DB Instance
AWS introduced Multi-AZ DB Cluster as a new deployment option, distinct from the original Multi-AZ DB Instance. The exam distinguishes them for scenarios that need both HA and read scaling.
Multi-AZ DB Instance (the original):
- 1 primary writer instance
- 1 standby replica in another AZ — not readable by applications
- Synchronous replication to standby
- Failover to standby in 60-120 seconds
- Use when you need HA but don't need read scaling
Multi-AZ DB Cluster (newer):
- 1 primary writer instance
- 2 readable standby instances in different AZs
- Synchronous replication to both standbys
- The standby instances serve reads, reducing load on the primary
- Faster failover (typically under 35 seconds)
- Use when you need HA AND want to offload read traffic without separate Read Replicas
The exam scenario: "A company needs high availability for their RDS MySQL database and wants to serve read traffic from the standby instances to reduce costs." Multi-AZ DB Cluster is the answer — the original Multi-AZ DB Instance has a non-readable standby, so reads would still all go to the primary.
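The choice between the two deployment options can be reduced to two questions: does the standby need to serve reads, and how long can failover take? The sketch below encodes that reasoning. The class and function names are made up for illustration, and the failover figures are simply the typical values quoted in this section (120 seconds is the upper end of the 60-120 second range).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    name: str
    readable_standbys: int
    typical_failover_seconds: int  # typical figure quoted above, not a guarantee


MULTI_AZ_INSTANCE = Deployment("Multi-AZ DB Instance", 0, 120)
MULTI_AZ_CLUSTER = Deployment("Multi-AZ DB Cluster", 2, 35)


def choose_deployment(needs_readable_standby: bool,
                      max_failover_seconds: int = 120) -> Deployment:
    """Pick the first option that satisfies both requirements."""
    for option in (MULTI_AZ_INSTANCE, MULTI_AZ_CLUSTER):
        if needs_readable_standby and option.readable_standbys == 0:
            continue
        if option.typical_failover_seconds > max_failover_seconds:
            continue
        return option
    raise ValueError("neither Multi-AZ option meets these requirements")
```

For the exam scenario above, `choose_deployment(needs_readable_standby=True)` selects the Multi-AZ DB Cluster.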
Automated Backups and Multi-AZ
With Multi-AZ enabled, RDS takes automated backups from the standby instance rather than the primary. This is an important operational benefit: backup I/O doesn't impact primary instance performance.
Automated backup retention period is configurable from 1 to 35 days. Point-in-time recovery (PITR) allows you to restore to any point within the retention period with 1-second granularity.
The exam scenario: "A company is concerned that automated backups impact their production RDS database performance. How do they prevent this?" Enable Multi-AZ — backups will then be taken from the standby. The primary continues serving production traffic without backup I/O impact.
Automated backups are stored in S3 (you don't see the bucket; it's managed by RDS). By default they are deleted along with the DB instance. If you choose to retain automated backups at deletion, they are kept only until the retention period expires, so take a final snapshot before deletion if you want to preserve the data indefinitely.
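A point-in-time restore always creates a new instance from the backups. As a hedged sketch, the function below checks a requested restore time against the retention window and builds the arguments for boto3's `restore_db_instance_to_point_in_time`. The instance names are hypothetical, and the window check is an approximation: in practice the latest restorable time trails the present by a few minutes.

```python
from datetime import datetime, timedelta, timezone


def restore_request(source_id: str, target_id: str, restore_time: datetime,
                    now: datetime, retention_days: int) -> dict:
    """Build kwargs for restore_db_instance_to_point_in_time after a window check."""
    if not 1 <= retention_days <= 35:
        raise ValueError("retention period must be between 1 and 35 days")
    earliest = now - timedelta(days=retention_days)
    if not earliest <= restore_time <= now:
        raise ValueError("restore time is outside the retention window")
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,  # the restore creates a new instance
        "RestoreTime": restore_time.replace(microsecond=0),  # one-second granularity
    }
```

The resulting dictionary would be passed as `client.restore_db_instance_to_point_in_time(**request)`.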
The Synchronous vs Asynchronous Distinction
This is the most important technical distinction between Multi-AZ and Read Replicas, and the exam tests it to determine which feature fits a given scenario.
Multi-AZ: synchronous replication. When an application commits a write to the primary, RDS simultaneously writes to the standby and waits for acknowledgment from both before returning success to the application. This guarantees zero data loss on failover — the standby always has everything the primary had. The tradeoff is slightly higher write latency (the round-trip to the standby AZ adds a few milliseconds).
Read Replica: asynchronous replication. The primary commits the write and returns success to the application immediately. The changes are then replicated to the Read Replica(s) in the background. There is always some replication lag — typically milliseconds to seconds, but potentially more under heavy write load. You can monitor this with the ReplicaLag CloudWatch metric.
The implication: if you promote a Read Replica after a primary failure, you may lose the writes that were committed on the primary but not yet replicated to the replica. The amount of data loss is the current replication lag. This is why Read Replica promotion is a DR strategy (where some data loss is acceptable) rather than an HA strategy (where no data loss is acceptable).
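To make the data-loss reasoning concrete, here is a back-of-the-envelope sketch. It builds the arguments for a CloudWatch `get_metric_statistics` query on `ReplicaLag` and multiplies a lag reading by a write rate. The replica name and the write rate are hypothetical inputs, and treating the write rate as constant is a simplifying assumption.

```python
from datetime import datetime, timedelta


def replica_lag_query(replica_id: str, end: datetime, minutes: int = 5) -> dict:
    """Build kwargs for CloudWatch get_metric_statistics on the ReplicaLag metric."""
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "ReplicaLag",  # reported in seconds
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": replica_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,
        "Statistics": ["Maximum"],  # worst case within each one-minute period
    }


def estimated_lost_writes(replica_lag_seconds: float, writes_per_second: float) -> float:
    """Rough count of committed writes that had not reached the replica."""
    if replica_lag_seconds < 0 or writes_per_second < 0:
        raise ValueError("lag and write rate must be non-negative")
    return replica_lag_seconds * writes_per_second
```

For example, a 2.5-second lag at 400 writes per second puts roughly 1,000 committed writes at risk if the replica is promoted at that moment.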
Encryption and Multi-AZ
Encryption for an RDS Multi-AZ deployment uses a single KMS key for both the primary and standby instances. The replication traffic between primary and standby is encrypted in transit over AWS's internal network.
A common scenario on the exam: "A company has an unencrypted RDS instance they want to encrypt. What is the process?"
You cannot enable encryption on a running unencrypted instance in place. The process is:
- Take a snapshot of the unencrypted instance
- Copy the snapshot with encryption enabled (specify the KMS key during the copy)
- Restore the encrypted snapshot to a new instance
- Update the application connection string to the new instance
- Optionally enable Multi-AZ on the new encrypted instance
The key point is that there's no in-place encryption option. You must go through the snapshot-copy-restore flow, and the cutover to the new instance causes downtime. Creating an encrypted Read Replica and promoting it is not a shortcut: RDS does not allow an encrypted Read Replica of an unencrypted DB instance.
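The snapshot-copy-restore steps can be sketched with boto3 as follows. This is an outline under stated assumptions rather than a production runbook: `rds` is assumed to be a `boto3.client("rds")`, the snapshot names derived from the source identifier are hypothetical, and error handling and cleanup of the intermediate snapshots are left out.

```python
def encrypt_via_snapshot(rds, source_id: str, new_id: str, kms_key_id: str,
                         multi_az: bool = False) -> str:
    """Create an encrypted copy of an unencrypted instance; return its endpoint."""
    plain = f"{source_id}-plain"          # hypothetical snapshot names
    encrypted = f"{source_id}-encrypted"

    # 1. Snapshot the unencrypted instance.
    rds.create_db_snapshot(DBInstanceIdentifier=source_id, DBSnapshotIdentifier=plain)
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=plain)

    # 2. Copy the snapshot; supplying a KMS key makes the copy encrypted.
    rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=plain,
        TargetDBSnapshotIdentifier=encrypted,
        KmsKeyId=kms_key_id,
    )
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=encrypted)

    # 3. Restore the encrypted snapshot to a new instance, optionally Multi-AZ.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_id,
        DBSnapshotIdentifier=encrypted,
        MultiAZ=multi_az,
    )
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=new_id)

    # 4. The caller points the application at this endpoint.
    described = rds.describe_db_instances(DBInstanceIdentifier=new_id)
    return described["DBInstances"][0]["Endpoint"]["Address"]
```

Writes made to the old instance after the snapshot is taken are not carried over, which is why the cutover is normally done with writes stopped.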