SAA-C03 | Chapter 90 of 189 | Objective 2.3

Aurora Auto-Scaling Read Replicas

This chapter covers Amazon Aurora Auto-Scaling Read Replicas, a critical feature for building resilient, scalable database architectures on AWS. For the SAA-C03 exam, understanding how Aurora automatically scales read capacity is essential, as questions on this topic appear in approximately 5-10% of exams, often integrated with questions on high availability, disaster recovery, and performance optimization. You will learn the internal mechanisms, configuration options, and common pitfalls to confidently answer exam questions.

Updated May 31, 2026

The Auto-Scaling Conveyor Belt Factory

Imagine a factory that processes customer orders. The factory has a main assembly line (the primary DB instance) that handles all incoming orders (writes and reads). As demand spikes, orders pile up. To keep up, the factory can automatically add parallel processing lines (read replicas) that handle only order inspection (read traffic). A supervisor (the Aurora Auto-Scaling service) monitors the queue length (CPU utilization or connections) at each inspection station. When the queue exceeds a threshold (e.g., 70% CPU for 3 minutes), the supervisor activates a new inspection station from a pool of pre-configured equipment (the replica instance). The new station takes 2-3 minutes to power up and connect to the conveyor belt (the shared cluster volume). Once online, orders are automatically routed to the new station via a dynamic load balancer (the reader endpoint). When demand drops and queues remain below a lower threshold (e.g., 30% CPU for 5 minutes), the supervisor deactivates one station, so that no capacity sits idle. The factory never adds more stations than the ceiling (max replicas) and never goes below the floor (min replicas). This automated scaling lets the factory handle peak loads without manual intervention, while minimizing cost during quiet periods.

How It Actually Works

What is Aurora Auto-Scaling Read Replicas?

Amazon Aurora Auto-Scaling Read Replicas is a feature that automatically adjusts the number of Aurora Replicas (read replicas) in an Aurora DB cluster based on changes in read workload. It is part of the Aurora DB cluster architecture, which consists of a primary instance (for writes and reads) and up to 15 Aurora Replicas (for read scaling and failover). Auto-scaling eliminates the need for manual intervention to add or remove replicas in response to traffic patterns.

Why It Exists

In traditional database deployments, handling variable read traffic requires either over-provisioning (wasting cost) or manual scaling (risking performance degradation). Aurora Auto-Scaling solves this by dynamically matching read capacity to demand. For the SAA-C03 exam, this is a key pattern for achieving cost-efficient, resilient architectures under Objective 2.3 (Design resilient workloads).

How It Works Internally

Aurora Auto-Scaling uses a target tracking scaling policy based on a predefined metric. The default metric is the average CPU utilization of the Aurora Replicas (RDSReaderAverageCPUUtilization), but you can also use the average number of database connections to the replicas (RDSReaderAverageDatabaseConnections). The scaling policy aims to keep the average CPU utilization (or connections) across all replicas at a target value (default: 75%).

When the actual metric deviates from the target, the auto-scaling service triggers a scale-out or scale-in action. The scaling is performed by adding or removing Aurora Replicas from the cluster. Each replica is an independent DB instance that shares the same underlying storage volume (the cluster volume). This means replicas have minimal replication lag (typically less than 100ms) and do not incur additional storage costs beyond the primary.

Key Components, Values, Defaults, and Timers

Target Metric: RDSReaderAverageCPUUtilization (default) or RDSReaderAverageDatabaseConnections. The target value is 75% for CPU utilization and a user-defined value for connections.

Scale-Out Cooldown: 300 seconds (5 minutes) by default. After a scale-out event, the system waits this long before initiating another scale-out, to avoid rapid oscillations.

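Scale-In Cooldown: 300 seconds (5 minutes) by default. After a scale-in activity, the system waits this long before starting another scale-in. A scale-out can still start during this period if load rises.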

Minimum Replicas: Default is 0. You can set a minimum number of replicas to ensure baseline read capacity.

Maximum Replicas: Default is 15 (the max Aurora Replicas per cluster). You can lower this limit.

Scaling Adjustment: With target tracking, you do not define step sizes. When scaling out, the number of replicas added is determined by the estimated capacity needed to bring the metric back to target. This is often 1 replica at a time, but it can be more if the deviation is large.

Metric Granularity: CloudWatch metrics are sampled every 1 minute. The auto-scaling algorithm evaluates the average over the last 5 minutes (the evaluation period) before triggering a scaling action.
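The floor, ceiling, and cooldown settings above interact in a simple way, which the following minimal sketch tries to show. It assumes a hypothetical ReplicaScalingConfig container (not an AWS type), times expressed in seconds, and a purely illustrative cooldown check; the real service's internal logic is not published in this chapter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaScalingConfig:
    """Hypothetical holder for the settings above; not an AWS data type."""
    min_replicas: int = 1
    max_replicas: int = 15  # Aurora allows at most 15 Aurora Replicas per cluster
    scale_out_cooldown_s: int = 300
    scale_in_cooldown_s: int = 300


def clamp_replicas(desired: int, cfg: ReplicaScalingConfig) -> int:
    """Force a requested replica count into the [min, max] range."""
    return max(cfg.min_replicas, min(cfg.max_replicas, desired))


def cooldown_elapsed(now_s: float, last_activity_s: Optional[float], cooldown_s: int) -> bool:
    """True when no earlier activity exists or the cooldown has fully passed."""
    if last_activity_s is None:
        return True
    return now_s - last_activity_s >= cooldown_s


cfg = ReplicaScalingConfig(min_replicas=1, max_replicas=10)
print(clamp_replicas(14, cfg))                                 # request above the ceiling
print(cooldown_elapsed(120.0, 0.0, cfg.scale_out_cooldown_s))  # 120 s after a scale-out
```

Setting min_replicas equal to max_replicas in this sketch pins the count, which mirrors the "min=2, max=2" exam scenario discussed later in the chapter.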

Configuration and Verification

You can enable auto-scaling via the AWS Management Console, AWS CLI, or CloudFormation. In the console, you create a scaling policy for the Aurora DB cluster under the "Auto Scaling" tab. Using the CLI:

aws application-autoscaling register-scalable-target \
    --service-namespace rds \
    --resource-id cluster:my-aurora-cluster \
    --scalable-dimension rds:cluster:ReadReplicaCount \
    --min-capacity 1 \
    --max-capacity 10

Then create a target tracking scaling policy:

aws application-autoscaling put-scaling-policy \
    --service-namespace rds \
    --resource-id cluster:my-aurora-cluster \
    --scalable-dimension rds:cluster:ReadReplicaCount \
    --policy-name cpu-target-tracking \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration \
        TargetValue=75.0,PredefinedMetricSpecification='{PredefinedMetricType=RDSReaderAverageCPUUtilization}',ScaleOutCooldown=300,ScaleInCooldown=300

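To verify, review the scaling activities in the Application Auto Scaling console (or with aws application-autoscaling describe-scaling-activities --service-namespace rds) and confirm that the number of Aurora Replicas in the cluster changes as expected.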
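The same policy can be expressed as a request dictionary for an SDK call. The sketch below only builds and validates that dictionary; the helper name build_target_tracking_request and its argument names are illustrative assumptions, while the keys mirror the CLI options shown above. Actually sending it would require boto3 and AWS credentials, so that step is left as a comment.

```python
ALLOWED_METRICS = {
    "RDSReaderAverageCPUUtilization",
    "RDSReaderAverageDatabaseConnections",
}


def build_target_tracking_request(cluster_id, policy_name, target_value,
                                  metric="RDSReaderAverageCPUUtilization",
                                  cooldown_s=300):
    """Build keyword arguments for a target tracking policy (hypothetical helper)."""
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"unsupported predefined metric: {metric}")
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": "rds",
        "ResourceId": f"cluster:{cluster_id}",
        "ScalableDimension": "rds:cluster:ReadReplicaCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_value),
            "PredefinedMetricSpecification": {"PredefinedMetricType": metric},
            "ScaleOutCooldown": cooldown_s,
            "ScaleInCooldown": cooldown_s,
        },
    }


request = build_target_tracking_request("my-aurora-cluster", "cpu-target-tracking", 75.0)
# With boto3 installed and credentials configured, this could be sent with:
#   boto3.client("application-autoscaling").put_scaling_policy(**request)
print(request["ResourceId"])
```

Validating the metric name up front is a design choice of this sketch: it catches a mistyped metric before any API call is made.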

Interaction with Related Technologies

Aurora Replicas: Auto-scaling manages the count of Aurora Replicas. These are distinct from the primary instance and can be promoted to primary in a failover.

Cluster Endpoint: The writer endpoint always points to the primary instance. The reader endpoint (load-balanced across all replicas) is used to distribute read traffic. Auto-scaling automatically registers new replicas with the reader endpoint.

Custom Endpoints: You can create custom endpoints that point to specific subsets of replicas (e.g., for reporting). Auto-scaling does not manage custom endpoints; you must manually update them.

Application Auto Scaling: Aurora Auto-Scaling is built on Application Auto Scaling (not EC2 Auto Scaling groups), which creates and manages the CloudWatch alarms behind the target tracking policy.

Storage Auto-Expansion: Aurora's storage automatically grows up to 128 TB per cluster, independent of compute scaling. This is separate from read replica auto-scaling.

Limitations and Edge Cases

Auto-scaling only adjusts the number of replicas, not the instance size. To scale compute capacity, you must manually modify the DB instance class or use Aurora Serverless.

Scaling actions are not instantaneous. Adding a replica takes 2-3 minutes (provisioning and attaching to the cluster). Removing a replica takes a few minutes as well.

If the cluster has custom endpoints defined with a static member list, newly added replicas are not automatically included in those endpoints. You must update those custom endpoints manually.

If you set the minimum capacity above 0, auto-scaling never removes replicas below that floor. If you set the minimum to 0, all replicas can be removed, and the primary then handles all read traffic.

The feature is available for Aurora MySQL and Aurora PostgreSQL, but not for Aurora Serverless v1 (v2 supports it).
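Because custom endpoints need manual upkeep, teams often script the update. The sketch below covers only the bookkeeping step: merging newly created replica IDs into a custom endpoint's member list. The function name and the instance identifiers are hypothetical; applying the result would be a separate RDS API call, shown only as a comment.

```python
def merge_static_members(current_members, new_replica_ids):
    """Return the current members followed by any replica IDs not already listed."""
    merged = list(current_members)
    seen = set(merged)
    for replica_id in new_replica_ids:
        if replica_id not in seen:
            merged.append(replica_id)
            seen.add(replica_id)
    return merged


# Hypothetical identifiers, for illustration only.
members = merge_static_members(
    ["reporting-replica-1", "reporting-replica-2"],
    ["reporting-replica-2", "autoscaled-replica-3"],
)
# With boto3 and credentials, the list could then be applied with:
#   boto3.client("rds").modify_db_cluster_endpoint(
#       DBClusterEndpointIdentifier="reporting", StaticMembers=members)
print(members)
```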

Exam Relevance

The SAA-C03 exam tests this feature in the context of designing scalable and resilient architectures. Typical questions involve choosing auto-scaling over manual scaling, understanding when to scale on average CPU utilization versus average database connections, and recognizing that auto-scaling does not affect the primary instance or storage.

Walk-Through

1. Monitor Replica Load Metric

The auto-scaling service continuously monitors the average CPU utilization (or average database connections) across all Aurora Replicas in the cluster. CloudWatch collects these metrics every minute. The service evaluates the metric over a 5-minute evaluation period to smooth out short spikes. If the average exceeds the target value (default 75% CPU), a scale-out alarm fires. If it stays below the target for a sustained period, a scale-in alarm fires, subject to the scale-in cooldown.
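The smoothing effect of the evaluation period can be sketched as a windowed average over one-minute datapoints. This is an illustrative model using the chapter's figures (5 datapoints, 75% target), not the exact alarm logic the service uses; the function names are assumptions.

```python
from typing import Sequence


def window_average(datapoints: Sequence[float], window: int = 5) -> float:
    """Average of the most recent `window` one-minute datapoints."""
    recent = datapoints[-window:]
    if not recent:
        raise ValueError("no datapoints to average")
    return sum(recent) / len(recent)


def breaches_target(datapoints: Sequence[float], target: float = 75.0, window: int = 5) -> bool:
    """True only when a full window of data averages above the target."""
    if len(datapoints) < window:
        return False  # not enough history yet to evaluate
    return window_average(datapoints, window) > target


sustained_load = [60, 70, 80, 90, 95]   # steadily climbing CPU
single_spike = [50, 50, 50, 50, 100]    # one brief spike
print(breaches_target(sustained_load), breaches_target(single_spike))
```

In this model a single one-minute spike does not push the window average over the target, which is the point of evaluating over several datapoints.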

2. Evaluate Scaling Policy

When the alarm fires, the Application Auto Scaling service evaluates the target tracking policy. It calculates the desired replica count needed to bring the metric back to the target. For example, if CPU is at 90% on 2 replicas with a 75% target, the same load spread over 3 replicas would average about 60%, so the desired count is 3 (assuming load scales linearly). In practice this often means adding 1 replica at a time. The service also checks the cooldown timer: if a scale-out occurred less than 300 seconds ago, it does not start another one yet.
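The arithmetic behind the "90% on 2 replicas" example can be written as a proportional estimate. This is a simplified model that assumes load spreads evenly and scales linearly with replica count; it is not the service's actual algorithm, and desired_replicas is a hypothetical name.

```python
import math


def desired_replicas(current_replicas: int, observed_metric: float, target_metric: float) -> int:
    """Smallest replica count that would bring the average metric to or below target."""
    if current_replicas < 1:
        raise ValueError("need at least one replica to measure a reader metric")
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    total_load = current_replicas * observed_metric  # load summed over all replicas
    return math.ceil(total_load / target_metric)


print(desired_replicas(2, 90.0, 75.0))  # the scale-out example from the text
print(desired_replicas(4, 30.0, 75.0))  # a scale-in case under the same model
```

The result would still be limited by the configured minimum and maximum replica counts before any action is taken.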

3. Execute Scale-Out Action

If cooldown has expired, the service calls the RDS API to create a new Aurora Replica. Because the new replica attaches to the cluster's existing shared storage volume, no data copy is needed, and it typically becomes available in 2-3 minutes. During this time, the replica is in the 'creating' state. Once available, it is automatically registered with the cluster's reader endpoint, which uses DNS load balancing to distribute connections across all available replicas, and it can accept read traffic right away.

4. Update Reader Endpoint

The reader endpoint is a DNS name that resolves to the available replicas in rotation. When a new replica becomes available, it is added to that rotation. Existing connections to other replicas are not affected; new connections are load-balanced across all replicas, including the new one. This spreads read traffic without connection draining, and the update typically propagates within seconds. Because balancing happens per connection, clients that hold long-lived connections or cache DNS results do not shift load to the new replica until they reconnect.
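A small simulation makes the "existing connections stay put" point concrete. It assumes strict round-robin assignment of new connections, which is an idealization of DNS-based rotation; the replica names and connection counts are made up for illustration.

```python
from itertools import cycle


def distribute_new_connections(existing, new_connections):
    """Assign new connections round-robin; existing connections never move."""
    if not existing:
        raise ValueError("no replicas behind the reader endpoint")
    totals = dict(existing)
    rotation = cycle(sorted(totals))  # fixed rotation order over replica names
    for _ in range(new_connections):
        totals[next(rotation)] += 1
    return totals


# Two busy replicas plus one that has just been added with no connections.
before = {"replica-1": 100, "replica-2": 100, "replica-3": 0}
after = distribute_new_connections(before, 30)
print(after)
```

Under this model the new replica receives only its share of new connections, so the imbalance with older replicas shrinks gradually rather than immediately.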

5. Scale-In on Low Load

When CPU utilization stays well below the target (e.g., around 30% against a 75% target) for a sustained evaluation period, and the scale-in cooldown (300 seconds) has expired, the service selects a replica to remove. It chooses one arbitrarily (not based on load), and only from replicas that auto-scaling itself created. The replica is deleted after draining connections, and the reader endpoint is updated to remove it. Scale-in continues until the metric stabilizes or the minimum replica count is reached. The scale-in cooldown prevents rapid oscillations.
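The scale-in choice can be sketched as a filter over the cluster's replicas. The sketch assumes each replica is described by a hypothetical dictionary with "id" and "created_by_autoscaling" keys, and it relies on the documented behavior that Aurora Auto Scaling removes only replicas it created; picking the first eligible one stands in for the arbitrary choice.

```python
def pick_scale_in_candidate(replicas, min_replicas):
    """Return the ID of a replica that may be removed, or None if none qualifies."""
    if len(replicas) <= min_replicas:
        return None  # already at the configured floor
    for replica in replicas:
        if replica["created_by_autoscaling"]:
            return replica["id"]  # arbitrary pick: first eligible replica
    return None  # only manually created replicas remain


cluster_replicas = [
    {"id": "manual-replica-1", "created_by_autoscaling": False},
    {"id": "autoscaled-replica-1", "created_by_autoscaling": True},
]
print(pick_scale_in_candidate(cluster_replicas, min_replicas=1))
```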

What This Looks Like on the Job

In a production e-commerce platform, read traffic surges during flash sales. Without auto-scaling, the operations team would need to manually add replicas before each sale, risking either over-provisioning (cost) or under-provisioning (performance). With Aurora Auto-Scaling, the cluster automatically adds replicas when CPU utilization hits 75% for 5 minutes, and removes them when load drops. The team sets a minimum of 2 replicas for baseline traffic and a maximum of 10 to avoid runaway costs.

Another scenario is a SaaS analytics application where customers run ad-hoc queries. The read workload is unpredictable. Auto-scaling with average database connections as the metric (target 100 connections per replica) ensures that as more users connect, more replicas are added. The team configures a custom endpoint for reporting queries that points only to a specific subset of replicas, but they must manually update this endpoint when auto-scaling adds replicas, which is a common pain point.

A common misconfiguration is setting the scale-in cooldown too low (e.g., 60 seconds), causing the system to oscillate: as soon as a replica is added, load drops, triggering immediate scale-in, then load spikes again. The default 300 seconds prevents this. Another issue is forgetting to set minimum replicas: if set to 0, during low traffic all replicas may be removed, and the primary must handle all reads, potentially causing performance issues if write-heavy. In production, a minimum of 1-2 replicas is recommended.

Performance considerations: Adding replicas takes 2-3 minutes, so sudden spikes may still cause temporary latency. To mitigate, you can pre-warm replicas or use predictive scaling (not yet available for Aurora). Also, auto-scaling does not change instance size—if the primary is undersized, read replicas won't help with write throughput. The team must monitor primary CPU and connections separately.

How SAA-C03 Actually Tests This

For the SAA-C03 exam, this topic falls under Domain 2: Resilient Architectures, Objective 2.3: Design resilient workloads. The exam tests your ability to choose the right scaling solution for read-heavy workloads. Key points to remember:

1. Metric choice: The default metric is average reader CPU utilization (RDSReaderAverageCPUUtilization, target 75%). You can also use average reader database connections (RDSReaderAverageDatabaseConnections). The exam may ask which metric to use for connection-based scaling. Correct: the average database connections metric. Wrong: MemoryUtilization, which is not a predefined metric for this feature.

2. Cooldown values: Scale-out cooldown = 300 seconds (5 minutes). Scale-in cooldown = 300 seconds. The exam might present a scenario where a candidate sets a 60-second cooldown and asks why scaling is unstable. Answer: Too short a cooldown causes oscillation.

3. Minimum and maximum replicas: Default min=0, max=15. You can set these limits. The exam might ask: 'What happens if you set min=2 and max=2?' Answer: Auto-scaling is effectively disabled; the replica count stays at 2.

4. Reader endpoint vs. custom endpoint: Auto-scaling automatically updates the reader endpoint. Custom endpoints are NOT automatically updated. This is a common exam trap: a candidate assumes custom endpoints update automatically.

5. Storage vs. compute scaling: Auto-scaling only scales the number of replicas (compute), not storage. Storage auto-expands separately. The exam may ask: 'How do you scale storage for an Aurora cluster?' Answer: It scales automatically up to 128 TB; no manual action needed.

6. Failover behavior: If the primary fails, an Aurora Replica is promoted. Auto-scaling will then add a new replica to maintain the desired count. The exam tests that auto-scaling works even after failover.

7. Wrong answer patterns:

Choosing to use Auto Scaling groups for RDS (incorrect; Aurora uses Application Auto Scaling).

Suggesting that auto-scaling applies to the primary instance (false; only to replicas).

Assuming auto-scaling works with Aurora Serverless v1 (false; only provisioned Aurora and Aurora Serverless v2).

Thinking that scaling out adds storage capacity (false; storage is shared and auto-expands).

8. Eliminate wrong answers: If a question asks about scaling read capacity, eliminate options that mention modifying the primary instance, using Multi-AZ (which is for high availability, not read scaling), or using DynamoDB. Focus on the reader endpoint and replica count.

Key Takeaways

Aurora Auto-Scaling Read Replicas adjusts the number of Aurora Replicas based on average reader CPU utilization (default target 75%) or average reader database connections.

Scale-out and scale-in cooldowns are both 300 seconds (5 minutes) by default.

The feature only scales the replica count, not the instance size or storage.

New replicas are automatically registered with the cluster's reader endpoint but not with custom endpoints.

Minimum and maximum replica limits can be set (default min=0, max=15).

Auto-scaling is available for provisioned Aurora (MySQL and PostgreSQL) and Aurora Serverless v2, but not Serverless v1.

Storage auto-expands up to 128 TB independently of replica auto-scaling.

After a failover, auto-scaling will create a new replica to maintain the desired count.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Aurora Auto-Scaling Read Replicas

Automatically adds/removes replicas based on load.

Uses a target tracking policy with RDSReaderAverageCPUUtilization or RDSReaderAverageDatabaseConnections.

Reduces operational overhead; no manual monitoring needed.

Includes cooldown timers (300s) to prevent oscillation.

Cannot scale instance size; only replica count.

Manual Replica Management

Requires manual creation/deletion of replicas via console or CLI.

No automated metric analysis; relies on human judgment.

Higher risk of over-provisioning (cost) or under-provisioning (performance).

No built-in cooldown; human can make rapid changes.

Allows manual selection of instance class for each replica.

Aurora Auto-Scaling (Replica Count)

Scales number of replicas (read capacity) only.

Requires provisioned instances; you pay for allocated compute.

Supports both Aurora MySQL and PostgreSQL.

Replicas share cluster volume; minimal replication lag.

Best for predictable read-heavy workloads.

Aurora Serverless (Compute Scaling)

Scales compute capacity of the primary instance (and replicas) automatically.

Pay per ACU (Aurora Capacity Unit) based on usage; can scale to zero.

Available for Aurora MySQL and PostgreSQL (v2).

Storage is separate; compute scales independently.

Best for variable or intermittent workloads with low traffic.

Watch Out for These

Mistake

Aurora Auto-Scaling Read Replicas can scale the primary instance.

Correct

False. Auto-scaling only adjusts the number of read replicas (Aurora Replicas). The primary instance must be scaled manually or by using Aurora Serverless.

Mistake

Auto-scaling works immediately when load spikes.

Correct

No. There is a 2-3 minute delay to provision a new replica, plus cooldown timers (300 seconds) between scaling actions. The evaluation period is 5 minutes, so scaling is not instant.

Mistake

Custom endpoints are automatically updated when auto-scaling adds replicas.

Correct

False. Only the default reader endpoint is automatically updated. Custom endpoints must be manually modified to include new replicas.

Mistake

The default scaling metric is average database connections.

Correct

The default metric is CPUUtilization with a target of 75%. Average database connections (RDSReaderAverageDatabaseConnections) is the alternative predefined metric, not the default.

Mistake

Auto-scaling can scale down to 0 replicas without any impact.

Correct

Only partly true. If min capacity is 0, auto-scaling can remove all replicas, but the primary instance then handles all read traffic, which can degrade performance if the primary is busy with writes.


Frequently Asked Questions

What metrics can be used for Aurora Auto-Scaling Read Replicas?

The two predefined metrics are RDSReaderAverageCPUUtilization (default, target 75%) and RDSReaderAverageDatabaseConnections. You can also use custom CloudWatch metrics, but that is rare. The exam focuses on the two predefined metrics, usually described as the average CPU utilization and the average connections of the replicas.

How long does it take to add a new Aurora Replica via auto-scaling?

Provisioning a new replica typically takes 2-3 minutes. However, the scaling action is only triggered after the evaluation period (5 minutes), so the time from load spike to new replica online is roughly 7-8 minutes. If an earlier scale-out is still within its cooldown (up to 5 minutes), the total can reach 12-13 minutes.
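The timing can be added up explicitly. The sketch below uses the chapter's figures and assumes the phases happen strictly one after another (evaluation, any remaining cooldown, then provisioning), which is a worst-case simplification rather than a description of how the service schedules work.

```python
def seconds_until_replica_online(evaluation_s=300, provisioning_s=180, cooldown_remaining_s=0):
    """Sum the sequential delays between a load spike and a usable new replica."""
    for value in (evaluation_s, provisioning_s, cooldown_remaining_s):
        if value < 0:
            raise ValueError("durations must be non-negative")
    return evaluation_s + cooldown_remaining_s + provisioning_s


first_scale_out = seconds_until_replica_online()                          # no cooldown pending
right_after_another = seconds_until_replica_online(cooldown_remaining_s=300)
print(first_scale_out / 60, right_after_another / 60)                     # values in minutes
```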

Does auto-scaling work with Aurora Serverless?

Aurora Serverless v1 does not support auto-scaling read replicas. Aurora Serverless v2 supports auto-scaling for replicas (using Application Auto Scaling). For Serverless v1, scaling is handled by the serverless compute scaling, not by adding replicas.

What happens to custom endpoints when auto-scaling adds replicas?

Custom endpoints are not automatically updated. You must manually add the new replicas to the custom endpoint definition. Only the default reader endpoint is automatically updated.

Can I set different scaling policies for different times of day?

Not within the target tracking policy itself, which has no time-of-day settings. For time-based changes, you can use Application Auto Scaling scheduled actions to raise or lower the minimum and maximum capacity of the scalable target on a schedule, alongside the target tracking policy, or adjust the limits manually.

What is the difference between scale-out cooldown and scale-in cooldown?

Scale-out cooldown (300s) limits how soon another scale-out can follow a scale-out. Scale-in cooldown (300s) limits how soon another scale-in can follow a scale-in. Both are designed to avoid oscillation, and they are tracked independently. A scale-out can still start during a scale-in cooldown if load rises, because availability takes priority over cost savings.

Does auto-scaling affect the writer endpoint?

No. The writer endpoint always points to the primary instance. Auto-scaling only manages read replicas, so the writer endpoint is unaffected.
