SAA-C03 · Chapter 19 of 189 · Objective 2.3

RDS and Aurora: Multi-AZ and Read Replicas

This chapter covers Amazon RDS Multi-AZ deployments and Read Replicas, two features for building resilient and scalable database architectures on AWS. On the SAA-C03 exam, these topics come up frequently, usually in the context of disaster recovery, high availability, and read scaling. You will learn the failover behavior, replication modes, and configuration options that the exam tests, along with the key limits and default values.

25 min read
Intermediate
Updated May 31, 2026

The Synchronous Replication Lifeguard

Imagine a busy swimming pool with two lifeguard towers: the primary and the standby. The primary lifeguard is actively watching the pool. Every time a swimmer enters the water, the primary lifeguard immediately radios the standby with the exact details: who entered, where, and at what time. The standby acknowledges the radio call before the primary blows the whistle to confirm the swimmer's entry. If the primary lifeguard ever collapses (fails), the standby already knows where every swimmer is and takes over the tower after a short handover. The swimmers (clients) notice only a brief pause before a different lifeguard is in the tower, and no swimmer goes untracked. This synchronous communication is what guarantees zero data loss during failover. Read Replicas work differently: picture a trainee in the stands who copies down the primary's observations as they come in, without the primary ever waiting for the trainee to catch up. The trainee can answer visitors' questions about the pool, which takes load off the primary, but may be a few observations behind. If the primary collapses, the trainee must first be formally appointed (promoted) and may be missing the latest entries. That is asynchronous replication.

How It Actually Works

What is Multi-AZ and Why It Exists

Amazon RDS Multi-AZ is a high-availability feature that automatically creates and manages a synchronous standby replica in a different Availability Zone (AZ). Its purpose is automatic failover with zero data loss (RPO = 0) and a typical RTO of 1-2 minutes. It is not a read scaling solution: in a standard Multi-AZ DB instance deployment, the standby cannot serve reads or writes. The exam tests this distinction heavily.

How Multi-AZ Works Internally

When you enable Multi-AZ on an RDS instance (supported engines include MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server), RDS provisions a primary instance in one AZ and a standby instance in another AZ. Replication is synchronous: every write must be persisted on both the primary and the standby before the primary acknowledges it to the client. As a result, the standby always holds every committed transaction.
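To make the acknowledgement rule concrete, here is a minimal Python sketch of synchronous replication. It is a toy model, not how RDS is implemented: the `Node` and `SyncPair` classes are hypothetical, and real Multi-AZ replication for most engines happens at the storage layer. What it shows is that a write is reported as successful only after both copies hold it, so every acknowledged write survives the loss of the primary.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical database node that keeps committed records in order."""
    name: str
    log: list = field(default_factory=list)
    alive: bool = True

    def persist(self, record: str) -> bool:
        # A failed node cannot store or acknowledge anything.
        if not self.alive:
            return False
        self.log.append(record)
        return True


class SyncPair:
    """Hypothetical primary/standby pair with synchronous acknowledgement."""

    def __init__(self) -> None:
        self.primary = Node("primary")
        self.standby = Node("standby")
        self.acknowledged = []  # writes the client was told succeeded

    def write(self, record: str) -> bool:
        # 1. Persist on the primary.
        if not self.primary.persist(record):
            return False
        # 2. Wait for the standby to persist the same record.
        if not self.standby.persist(record):
            return False  # the client never sees success for this write
        # 3. Only now acknowledge the write to the client.
        self.acknowledged.append(record)
        return True

    def fail_over(self) -> Node:
        # The primary is lost; the standby is promoted and the roles swap.
        self.primary.alive = False
        self.primary, self.standby = self.standby, self.primary
        return self.primary


pair = SyncPair()
for i in range(3):
    pair.write(f"order-{i}")
new_primary = pair.fail_over()
# Every acknowledged write exists on the promoted standby: RPO = 0.
print(all(r in new_primary.log for r in pair.acknowledged))  # True
```

In this model, a write that reaches the primary but not the standby is simply never acknowledged, which is the trade-off synchronous replication makes: a little extra write latency in exchange for zero loss of confirmed data.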

Replication traffic between the primary and the standby travels over AWS's internal network, and data transferred for Multi-AZ replication is not charged (data transferred out to the internet or to other regions is billed normally). The replication mechanism depends on the engine:

For MySQL, MariaDB, PostgreSQL, and Oracle: RDS uses Amazon's own failover technology, which synchronously replicates the instance's storage at the block level. It does not rely on the engine's native replication (for example, MySQL semisynchronous replication or Oracle Data Guard).

For SQL Server: SQL Server Database Mirroring (DBM) or Always On Availability Groups (AGs), depending on version and edition.

During a failover, RDS automatically detects the primary failure and promotes the standby to become the new primary. It then updates the DNS record behind the instance endpoint (a CNAME) to point to the new primary. Applications that connect through the endpoint experience a brief outage (typically 60-120 seconds) while the DNS change propagates and the new primary starts accepting connections.
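A toy model of the endpoint behavior shows why clients should connect by name. The `Endpoint` class, host name, and IP addresses below are hypothetical; in RDS the swap is a DNS record update that AWS performs for you. A client that resolves the name after failover reaches the new primary, while one that saved the old IP address keeps pointing at the failed instance.

```python
class Endpoint:
    """Hypothetical stand-in for an RDS instance endpoint (a DNS CNAME)."""

    def __init__(self, name: str, primary_ip: str, standby_ip: str) -> None:
        self.name = name
        self.primary_ip = primary_ip
        self.standby_ip = standby_ip

    def resolve(self) -> str:
        # The name always points at whichever host is currently primary.
        return self.primary_ip

    def fail_over(self) -> None:
        # Promote the standby and repoint the name. The failed host, once
        # recovered or replaced, becomes the new standby.
        self.primary_ip, self.standby_ip = self.standby_ip, self.primary_ip


ep = Endpoint("mydb.example.internal", "10.0.1.10", "10.0.2.10")
hardcoded_ip = ep.resolve()  # an application that saved the IP once
ep.fail_over()
print(ep.resolve())          # 10.0.2.10 (new primary)
print(hardcoded_ip)          # 10.0.1.10 (still the failed instance)
```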

Key Components, Values, Defaults, and Timers

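Failover triggers: RDS fails over automatically when it detects an Availability Zone outage, a failure of the primary instance (including loss of network connectivity, compute failure, or storage failure), or during planned operations such as changing the instance's server type, OS patching, or a reboot with failover. AWS does not publish the internal health-check interval or threshold.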

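Failover time: Typically 60-120 seconds. Contributing factors include crash recovery on the new primary and how quickly clients pick up the DNS change. The endpoint's DNS record uses a short TTL, but clients (for example, some JVM configurations) may cache the old address longer.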

RPO: 0 (zero data loss) because replication is synchronous.

RTO: ~1-2 minutes.

Backup: Automated backups are taken from the standby to avoid I/O suspension on the primary.

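Maintenance: For OS-level maintenance, RDS patches the standby first, fails over to it, and then patches the old primary (now the standby), so downtime is roughly the length of one failover. Database engine version upgrades are different: the primary and standby are upgraded at the same time, which causes an outage.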

Cost: You pay for both the primary and standby instances (double the compute and storage).
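The failover-time factors above can be combined into a rough worst-case outage estimate. This is a back-of-the-envelope sketch, not an AWS formula: all inputs are hypothetical values you would measure or assume for your own setup. It shows why client-side DNS caching can dominate the outage even when RDS itself fails over quickly.

```python
def worst_case_outage_s(detection_s, promotion_and_recovery_s,
                        dns_record_ttl_s, client_dns_cache_s):
    """Rough worst-case seconds a client cannot reach the new primary.

    All arguments are hypothetical, measured or assumed, durations in seconds.
    """
    # A client keeps using the stale address until its cached answer expires.
    # If the client honours the record's TTL, that is the TTL; if it caches
    # longer (some runtimes do), the longer client setting wins.
    stale_address_s = max(dns_record_ttl_s, client_dns_cache_s)
    return detection_s + promotion_and_recovery_s + stale_address_s


# A client that honours a short TTL vs. one that caches for 5 minutes.
print(worst_case_outage_s(10, 60, 5, 5))    # 75
print(worst_case_outage_s(10, 60, 5, 300))  # 370
```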

Read Replicas: What They Are and Why They Exist

Read Replicas are separate RDS instances that receive asynchronous replication from a source RDS instance. They are designed for read scaling, not high availability. You can have up to 15 Read Replicas per source (5 for SQL Server). Replicas can be in the same region as the source or in a different region, but always in the same AWS account.

How Read Replicas Work Internally

Replication is asynchronous: the source streams changes to the replica using the engine's native replication (for example, MySQL binary log replication or PostgreSQL streaming replication). The source does not wait for the replica to acknowledge, so the replica can lag behind. Replication lag is reported in seconds by the CloudWatch metric ReplicaLag. The exam often tests that a Read Replica can be promoted to a standalone instance (which ends replication), but promotion is never automatic, and any transactions the replica had not yet received are lost.
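Here is a minimal sketch of the asynchronous case, using a hypothetical `AsyncReplica` class. Lag here is counted in pending changes rather than seconds (CloudWatch's ReplicaLag reports seconds), and the model ignores how each engine actually ships changes. It shows why promoting a lagging replica after losing the source loses data: whatever was still in flight never arrives.

```python
from collections import deque


class AsyncReplica:
    """Hypothetical source/replica pair with asynchronous replication."""

    def __init__(self):
        self.source_log = []       # committed and acknowledged on the source
        self.in_flight = deque()   # sent toward the replica, not yet applied
        self.replica_log = []      # applied on the replica

    def write(self, record):
        # The source commits and acknowledges without waiting for the replica.
        self.source_log.append(record)
        self.in_flight.append(record)

    def apply(self, n):
        # The replica applies up to n pending changes, oldest first.
        for _ in range(min(n, len(self.in_flight))):
            self.replica_log.append(self.in_flight.popleft())

    def lag(self):
        # Number of acknowledged changes the replica has not applied yet.
        return len(self.in_flight)

    def promote_after_source_loss(self):
        # Manual promotion: replication stops and in-flight changes are lost.
        lost = list(self.in_flight)
        self.in_flight.clear()
        return lost


replica = AsyncReplica()
for i in range(5):
    replica.write(f"order-{i}")
replica.apply(3)
print(replica.lag())                        # 2
print(replica.promote_after_source_loss())  # ['order-3', 'order-4']
print(replica.replica_log)                  # ['order-0', 'order-1', 'order-2']
```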

Key Components, Values, Defaults, and Timers

Maximum replicas: 15 per source (except SQL Server: 5).

Replication lag: As low as sub-second in optimal conditions, but can be minutes if the source is heavily loaded or network issues exist.

Promotion: You can manually promote a Read Replica to a standalone instance. This stops replication permanently.

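Cross-region replicas: Supported for MySQL, MariaDB, PostgreSQL, and Oracle; SQL Server support is more limited and depends on edition and version. As with any Read Replica, the source must have automated backups enabled (a backup retention period greater than 0).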

Cross-account copies: RDS does not create Read Replicas in another account. For ongoing replication into a different account, use AWS DMS; for a point-in-time copy, share a snapshot.

Monitoring: On MySQL, run SHOW REPLICA STATUS on the replica (SHOW SLAVE STATUS before MySQL 8.0.22); on PostgreSQL, query pg_stat_replication on the source.
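Lag monitoring usually means alerting on the CloudWatch ReplicaLag metric. The function below is a hypothetical, self-contained sketch of an alert rule (it does not call CloudWatch): it fires only when lag exceeds a threshold for several consecutive datapoints, so one brief spike does not page anyone. The 5-second default mirrors the threshold in this chapter's e-commerce example. Treating missing datapoints as breaching is an assumed design choice, since a replica that stops reporting deserves attention.

```python
from typing import Iterable, Optional


def lag_alarm(samples: Iterable[Optional[float]],
              threshold_s: float = 5.0,
              consecutive: int = 3) -> bool:
    """Return True if ReplicaLag-style samples (seconds) breach the threshold
    for `consecutive` datapoints in a row. None means a missing datapoint."""
    run = 0
    for value in samples:
        # Missing data counts as breaching (a design choice).
        breaching = value is None or value > threshold_s
        run = run + 1 if breaching else 0
        if run >= consecutive:
            return True
    return False


print(lag_alarm([0.4, 12.0, 0.6, 0.5]))  # False: a single spike
print(lag_alarm([2.0, 6.5, 8.0, 9.1]))   # True: three breaches in a row
```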

Interaction with Related Technologies

RDS Proxy: Can be used with Multi-AZ to pool connections and reduce failover times by preserving connections during failover.

ElastiCache: Often used alongside Read Replicas to cache frequent reads and reduce load on the primary.

DMS: AWS Database Migration Service can replicate data into a database in another account or region, or migrate data between engines.

CloudWatch: Key metrics: DatabaseConnections, ReadLatency, WriteLatency, ReplicaLag, FreeStorageSpace.

Configuration and Verification Commands

To create a Multi-AZ DB instance via CLI:

aws rds create-db-instance \
    --db-instance-identifier mydb-multi-az \
    --db-instance-class db.r5.large \
    --engine mysql \
    --master-username admin \
    --master-user-password password \
    --multi-az \
    --allocated-storage 100

To create a Read Replica:

aws rds create-db-instance-read-replica \
    --db-instance-identifier mydb-replica \
    --source-db-instance-identifier mydb-multi-az \
    --db-instance-class db.r5.large

To verify Multi-AZ status:

aws rds describe-db-instances --db-instance-identifier mydb-multi-az --query 'DBInstances[0].MultiAZ'
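The `--query` above returns a single flag. To see the standby's AZ and any replicas at the same time, you can parse the full JSON from `aws rds describe-db-instances --output json`. Here is a minimal Python sketch. The field names (`MultiAZ`, `AvailabilityZone`, `SecondaryAvailabilityZone`, `ReadReplicaDBInstanceIdentifiers`, `ReadReplicaSourceDBInstanceIdentifier`) come from the RDS `DBInstance` structure, while the sample output is a trimmed, hypothetical example and the helper name is made up.

```python
import json


def summarize_instance(describe_output: str) -> dict:
    """Pull HA and replica fields out of describe-db-instances JSON output."""
    db = json.loads(describe_output)["DBInstances"][0]
    return {
        "id": db["DBInstanceIdentifier"],
        "multi_az": db.get("MultiAZ", False),
        "primary_az": db.get("AvailabilityZone"),
        # Only present when the instance is Multi-AZ.
        "standby_az": db.get("SecondaryAvailabilityZone"),
        # Replicas created from this instance (empty list if none).
        "replicas": db.get("ReadReplicaDBInstanceIdentifiers", []),
        # Only present when this instance is itself a Read Replica.
        "replica_of": db.get("ReadReplicaSourceDBInstanceIdentifier"),
    }


# Trimmed, hypothetical output for the Multi-AZ instance created above.
sample = """{"DBInstances": [{
  "DBInstanceIdentifier": "mydb-multi-az",
  "MultiAZ": true,
  "AvailabilityZone": "us-east-1a",
  "SecondaryAvailabilityZone": "us-east-1b",
  "ReadReplicaDBInstanceIdentifiers": ["mydb-replica"]
}]}"""
print(summarize_instance(sample))
```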

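To check replication lag (MySQL), run this on the replica:

SHOW REPLICA STATUS\G

Look at Seconds_Behind_Source. On MySQL versions before 8.0.22, use SHOW SLAVE STATUS\G and look at Seconds_Behind_Master.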
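If you collect this output from a script, the lag value can be pulled out of the vertical-format text directly. Here is a minimal Python sketch, assuming the usual `Key: value` line layout of that output. It accepts both the newer `Seconds_Behind_Source` and the older `Seconds_Behind_Master` field, and returns `None` when MySQL reports `NULL` (replication is not running, so the lag is unknown rather than zero). The function name is hypothetical.

```python
from typing import Optional


def seconds_behind(status_text: str) -> Optional[int]:
    """Extract replication lag from SHOW REPLICA/SLAVE STATUS vertical output."""
    fields = {}
    for line in status_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and the "*** 1. row ***" banner.
        if not stripped or stripped.startswith("*"):
            continue
        # Split on the first colon only; values (e.g. error text) may contain colons.
        key, sep, value = stripped.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raw = fields.get("Seconds_Behind_Source", fields.get("Seconds_Behind_Master"))
    if raw is None:
        raise ValueError("no Seconds_Behind_Source/Seconds_Behind_Master field")
    # NULL means replication is not running, so the lag is unknown.
    return None if raw == "NULL" else int(raw)


status_sample = """*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
        Seconds_Behind_Source: 3
"""
print(seconds_behind(status_sample))  # 3
```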

Walk-Through

1

Enable Multi-AZ on RDS

When creating or modifying an RDS instance, enable Multi-AZ (the `--multi-az` option in the CLI, or `MultiAZ` set to `true` in the API). RDS automatically provisions a standby instance in a different AZ and sets up synchronous replication to it. The standby is not accessible for reads or writes; it only receives replication traffic. The instance endpoint stays the same.

2

Synchronous replication in progress

Every write on the primary is replicated synchronously to the standby (storage-level replication for MySQL, MariaDB, PostgreSQL, and Oracle; Database Mirroring or Always On AGs for SQL Server). The primary waits for the standby to acknowledge the write before confirming it to the client, which is what guarantees zero data loss. Data transferred for this replication is not charged, even though it crosses AZs. The standby stays continuously up to date.

3

Primary failure detection

RDS continuously monitors the primary instance, its network connectivity, and its storage. When it detects a failure (for example, the primary becomes unreachable or its AZ has an outage), it initiates failover. AWS does not publish the exact health-check interval. From the moment the primary fails, the database is unavailable until failover completes.

4

Automatic failover to standby

RDS promotes the standby to become the new primary and updates the DNS CNAME record for the DB instance endpoint to point to it. The new primary performs crash recovery if needed. RDS then recovers or replaces the failed instance, which becomes the new standby. Failover typically completes in 60-120 seconds.

5

Application reconnection

Applications using the RDS endpoint experience a brief outage. After the DNS record is updated, clients must reconnect and resolve the endpoint to the new IP address, so connection logic should retry. RDS Proxy can shorten this by holding client connections and routing them to the new primary. The exam tests that you should connect through the endpoint (CNAME), never a hardcoded IP address, so that reconnection works automatically.
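The reconnection advice can be sketched as a retry loop that re-resolves the endpoint on every attempt and backs off exponentially between failures. This is a minimal, hypothetical helper rather than any driver's API: `resolve` and `connect` are functions you would supply (for example, wrappers around `socket.getaddrinfo` and your database driver's connect call), and the attempt count and delays are assumptions to tune.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(resolve: Callable[[], str],
                       connect: Callable[[str], T],
                       attempts: int = 6,
                       base_delay_s: float = 1.0,
                       max_delay_s: float = 16.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Reconnect to a database endpoint, re-resolving DNS on each attempt."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        # Resolve every time: after failover the name points to a new host.
        address = resolve()
        try:
            return connect(address)
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff: 1 s, 2 s, 4 s, ... capped at max_delay_s.
                sleep(min(max_delay_s, base_delay_s * 2 ** attempt))
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Passing `sleep` in makes the loop testable without real waiting; in production the default `time.sleep` applies. Adding random jitter to each delay is a common refinement when many clients reconnect at the same moment.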

What This Looks Like on the Job

Enterprise E-Commerce Platform

A large e-commerce company runs its transactional database on RDS MySQL with Multi-AZ enabled. The database handles thousands of writes per second during peak hours. The Multi-AZ setup ensures that if the primary fails, the standby takes over with zero data loss, preventing lost orders. The RTO of ~90 seconds is acceptable for their business. They also use three Read Replicas to offload read traffic for product catalog queries. The replicas are in the same region but different AZs. They monitor ReplicaLag and have alerts if lag exceeds 5 seconds. During flash sales, they temporarily add more replicas (up to 15) to handle the read spike. A common misconfiguration is setting the Read Replica's instance class too small, causing replication lag to grow under heavy write load on the source. They learned to ensure the replica has at least as much compute and storage as the source.

Global SaaS Application

A SaaS provider serves customers worldwide. They use RDS PostgreSQL with cross-region Read Replicas to provide low-latency reads in different AWS regions. The primary is in us-east-1, with replicas in eu-west-1 and ap-southeast-1. Applications in those regions connect to the local replica for reads, cutting read latency from about 200 ms to 10 ms. They also run the primary as Multi-AZ in us-east-1 for high availability. One challenge was cross-region data transfer cost: each GB of replication traffic from us-east-1 to eu-west-1 costs $0.02. Because a Read Replica always copies the entire instance, they moved tables that only some regions needed to PostgreSQL logical replication (publication/subscription), which can replicate selected tables. And because cross-region Read Replicas are never promoted automatically, their disaster recovery runbook includes a scripted promotion step (aws rds promote-read-replica) followed by an endpoint switch in the application.
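The replication cost in this scenario scales with the volume of changes shipped and the number of cross-region replicas. Here is a back-of-the-envelope sketch: the $0.02/GB rate is the figure quoted in the scenario (check current pricing for your regions), and the daily change volume is a hypothetical input, since what crosses the wire is the replicated change stream rather than the size of the database.

```python
def monthly_cross_region_cost(change_gb_per_day: float,
                              cross_region_replicas: int,
                              usd_per_gb: float = 0.02,
                              days: int = 30) -> float:
    """Estimated monthly USD for shipping replication traffic to other regions."""
    # Each cross-region replica receives its own copy of the change stream.
    total_gb = change_gb_per_day * days * cross_region_replicas
    return round(total_gb * usd_per_gb, 2)


# Hypothetical: 50 GB of changes per day, replicas in two other regions.
print(monthly_cross_region_cost(50, 2))  # 60.0
```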

Financial Services with Strict Compliance

A bank uses RDS Oracle with Multi-AZ to meet regulatory requirements for zero data loss. They also configure automated backups with a retention period of 35 days (the maximum). Their compliance team requires that all database changes be auditable, so they enable detailed monitoring and publish database logs to CloudWatch Logs. They test failover quarterly by forcing a failover from the AWS Console or the CLI (aws rds reboot-db-instance with the --force-failover option). During one test, they discovered that their application's connection pool was caching the old IP address, causing prolonged downtime. They fixed it by having the application always connect through the RDS endpoint with a short client-side DNS cache TTL, and by adding RDS Proxy. This scenario underscores the exam's emphasis on using the CNAME endpoint and not hardcoding IPs.

How SAA-C03 Actually Tests This

Exactly What SAA-C03 Tests

Objective 2.3: Design a resilient database architecture. The exam expects you to know when to use Multi-AZ vs. Read Replicas. Key differentiators: Multi-AZ for high availability (automatic failover, zero data loss), Read Replicas for read scaling and cross-region disaster recovery (manual promotion, potential data loss).

Common scenario: An application needs to survive an AZ failure with minimal downtime. Answer: Multi-AZ. If the question mentions 'read-heavy workload' or 'offload read traffic', the answer is Read Replicas.

Cross-region DR: Use cross-region Read Replicas (for supported engines) or snapshot copy + restore. The exam may ask about RPO/RTO trade-offs.

Common Wrong Answers

1.

Using Multi-AZ for read scaling: Candidates see 'replica' and think it can serve reads. Wrong: Multi-AZ standby is not accessible.

2.

Promoting a Read Replica automatically during failure: Read Replicas require manual promotion. Automatic failover is only for Multi-AZ.

3.

Assuming every engine supports cross-region Read Replicas the same way: support and restrictions differ by engine. SQL Server in particular is limited by edition and version; where it is not supported, use AWS DMS or log shipping.

4.

Confusing Multi-AZ with Aurora Replicas: Aurora replicas can serve reads and have automatic failover (typically <30 seconds). RDS Multi-AZ standby cannot serve reads.

Specific Numbers and Terms

Maximum Read Replicas: 15 (MySQL, MariaDB, PostgreSQL, Oracle), 5 (SQL Server).

Failover time: 60-120 seconds.

RPO for Multi-AZ: 0.

Replication type: Synchronous for Multi-AZ, asynchronous for Read Replicas.

Backup source: Multi-AZ backups are taken from the standby.

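DNS caching: The endpoint's DNS record uses a short TTL, but clients may cache it longer. Keep client-side DNS caching short (for example, a JVM DNS cache TTL of no more than 60 seconds) for faster failover.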

Edge Cases and Exceptions

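Multi-AZ for SQL Server: Uses SQL Server Database Mirroring or Always On Availability Groups, depending on version and edition. Multi-AZ is supported on Standard and Enterprise editions, not on Express or Web.

Oracle: RDS for Oracle Multi-AZ uses Amazon's storage-level failover technology, not Data Guard. Oracle Read Replicas, on the other hand, use Oracle Data Guard and require Enterprise Edition.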

Read Replicas in the same region: Can be in different AZs or same AZ. The exam may ask about cross-AZ vs same-AZ cost implications.

Encryption: Read Replicas of an encrypted source are encrypted, and you cannot create an encrypted replica of an unencrypted source. For a cross-region replica of an encrypted source, you must specify a KMS key in the destination region.

How to Eliminate Wrong Answers

If the question asks about 'automatic failover' and 'zero data loss', eliminate any answer that mentions 'asynchronous' or 'manual promotion'.

If the question asks about 'offloading read traffic', eliminate any answer that mentions 'standby' or 'Multi-AZ'.

If the question asks about 'cross-region disaster recovery', look for 'Read Replica' or 'cross-region snapshot copy'. Eliminate 'Multi-AZ' because it is only within a region.

If the question mentions 'RTO of 1-2 minutes', it's likely Multi-AZ. If it mentions 'RPO of 5 minutes', it's likely a Read Replica or automated backup restore.
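The elimination rules above can be written down as a small lookup. This hypothetical helper simply encodes this chapter's heuristics; it returns a list because a scenario can call for more than one feature (for example, Multi-AZ plus Read Replicas).

```python
def pick_rds_features(automatic_failover: bool = False,
                      zero_data_loss: bool = False,
                      offload_reads: bool = False,
                      cross_region_dr: bool = False) -> list:
    """Map exam-style requirements to the RDS features this chapter recommends."""
    picks = []
    # Synchronous standby with automatic failover: Multi-AZ.
    if automatic_failover or zero_data_loss:
        picks.append("Multi-AZ")
    # Read scaling: Read Replicas (the Multi-AZ standby cannot serve reads).
    if offload_reads:
        picks.append("Read Replicas")
    # Multi-AZ never leaves the region, so cross-region DR needs something else.
    if cross_region_dr:
        picks.append("Cross-region Read Replica or cross-region snapshot copy")
    return picks


print(pick_rds_features(automatic_failover=True, offload_reads=True))
# ['Multi-AZ', 'Read Replicas']
```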

Key Takeaways

Multi-AZ provides synchronous replication with zero data loss (RPO=0) and automatic failover (RTO ~1-2 min).

Read Replicas use asynchronous replication; they are for read scaling, not high availability.

You can have up to 15 Read Replicas per source (5 for SQL Server).

Cross-region Read Replicas are supported for MySQL, MariaDB, PostgreSQL, and Oracle; SQL Server support is more limited (edition- and version-dependent), with AWS DMS or log shipping as alternatives.

Multi-AZ backups are taken from the standby to avoid I/O impact on the primary.

Always use the RDS CNAME endpoint, not the IP address, to allow automatic failover reconnection.

Read Replicas can be promoted to standalone instances, but this is a manual action and breaks replication.

For Aurora, use Aurora Replicas (up to 15) which serve reads and provide automatic failover with RTO <30 seconds.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Multi-AZ

Synchronous replication (RPO=0)

Automatic failover (RTO ~1-2 min)

Standby not accessible for reads

Only within the same region

Doubles compute and storage cost

Read Replicas

Asynchronous replication (possible data loss)

Manual promotion required for failover

Accessible for read queries

Can be cross-region (within the same account)

Additional cost for replica instances

Watch Out for These

Mistake

Multi-AZ standby can be used for read queries.

Correct

The Multi-AZ standby is not accessible for reads or writes. It exists solely for failover. For read scaling, use Read Replicas.

Mistake

Read Replicas provide automatic failover.

Correct

Read Replicas require manual promotion to become the primary. There is no automatic failover. For automatic failover, use Multi-AZ or Aurora.

Mistake

Multi-AZ and Read Replicas are mutually exclusive.

Correct

You can have both: a Multi-AZ primary with Read Replicas. The Read Replicas replicate from the primary. (Aurora has no separate standby: Aurora Replicas read from the shared cluster storage and also act as failover targets.)

Mistake

Cross-region Read Replicas are supported for all RDS engines.

Correct

Support varies by engine. Cross-region Read Replicas are supported for MySQL, MariaDB, PostgreSQL, and Oracle; SQL Server support is more limited (it depends on edition and version), and where it is unavailable you need AWS DMS or log shipping instead.

Mistake

Failover is instantaneous with Multi-AZ.

Correct

Failover typically takes 60-120 seconds due to DNS propagation and crash recovery. It is not instant.


Frequently Asked Questions

What is the difference between Multi-AZ and Read Replicas in RDS?

Multi-AZ provides high availability with synchronous replication and automatic failover; the standby is not accessible for reads. Read Replicas provide read scaling with asynchronous replication; they can be promoted manually but do not provide automatic failover. For the exam, if the question asks about 'automatic failover' or 'zero data loss', choose Multi-AZ. If it asks about 'offloading read traffic' or 'cross-region DR', choose Read Replicas.

Can I use a Read Replica as a high availability solution?

No, Read Replicas are not designed for high availability. They require manual promotion, which takes time and may result in data loss if the replica is behind. For HA, use Multi-AZ (RDS) or Aurora Replicas (Aurora). The exam will test this distinction: Read Replicas are for scaling reads, not for failover.

How long does Multi-AZ failover take?

Typically 60-120 seconds. The total includes failure detection, promotion of the standby and any crash recovery, and the time clients need to pick up the updated DNS record. Using RDS Proxy can reduce the impact by holding client connections during failover. The exam expects you to know that failover is not instant and that applications should be designed to retry connections.

What happens to Read Replicas if the primary fails?

Read Replicas continue to function as read-only copies, but they will stop receiving updates. You can manually promote a Read Replica to become the new primary. However, there may be data loss depending on replication lag. The exam may ask you to design a DR plan using cross-region Read Replicas with manual promotion.

Can I have Multi-AZ and Read Replicas on the same RDS instance?

Yes, you can have a Multi-AZ primary and create Read Replicas from it. The replicas replicate from the primary, which gives you both high availability and read scaling. (Aurora is different: there is no separate standby, and Aurora Replicas serve both purposes.) The exam may present a scenario where you need both features.

Does Multi-AZ replication incur data transfer costs?

No, data transfer between AZs for replication is free. However, data transfer out to the internet or cross-region does incur costs. For cross-region Read Replicas, you pay for data transfer out from the source region to the destination region.

What is the maximum number of Read Replicas for RDS MySQL?

Up to 15 Read Replicas per source for MySQL, MariaDB, PostgreSQL, and Oracle. For SQL Server, the limit is 5. The exam may ask for these limits, especially the 15 number.
