
High Availability on AWS

This chapter covers high availability on AWS, a core concept tested in Domain 1: Cloud Concepts (Objective 1.4) of the CLF-C02 exam. High availability is about ensuring your applications remain accessible and performant even when failures occur. Domain 1 as a whole accounts for 24% of scored exam content; AWS does not publish a weight for individual objectives. Understanding how AWS services like Elastic Load Balancing, Auto Scaling, and Multi-AZ deployments work is critical for designing resilient architectures. We'll explore the mechanisms behind these services, key configurations, pricing models, and common exam traps.

Updated May 31, 2026

The Reliable Fleet Manager

Imagine you run a global delivery company. You have a fleet of trucks that must operate 24/7. A single truck breaking down could delay thousands of packages. To ensure high availability, you don't rely on one truck per route. Instead, you have a pool of trucks, and if one breaks down, another takes over instantly. You also have spare trucks in different garages (availability zones) so that if a whole garage is flooded, deliveries continue from another garage. Additionally, you have a central dispatcher (like an Elastic Load Balancer) that monitors each truck's health and routes packages only to healthy trucks. You also use automated health checks: if a truck fails to report its location, the dispatcher marks it as unhealthy and stops sending packages to it. The fleet manager also scales up: during holiday peak, you automatically add more trucks from a reserve pool (Auto Scaling). This is exactly how AWS achieves high availability—redundancy across multiple data centers (Availability Zones), automatic health monitoring, load balancing, and elastic scaling to handle demand spikes without manual intervention.

How It Actually Works

What is High Availability and Why Does It Matter?

High availability (HA) refers to systems that are continuously operational for a long period of time. In AWS, HA is achieved by designing architectures that tolerate failures at various levels—from a single EC2 instance to an entire Availability Zone (AZ). The goal is to minimize downtime and ensure that the user experience is unaffected by underlying infrastructure issues. On the CLF-C02 exam, you need to understand that HA is not just about redundancy; it's about automatic detection and recovery from failures without manual intervention.

The Problem: Single Points of Failure

In traditional on-premises data centers, a single server, network switch, or power supply can bring down an entire application. AWS helps you eliminate these single points of failure by providing multiple isolated locations called Availability Zones within each Region. Each AZ consists of one or more data centers with independent power, cooling, and networking. By deploying your application across two or more AZs, you can survive the loss of an entire AZ.
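A rough calculation shows why a second AZ helps so much. The sketch below assumes AZ failures are independent and uses an illustrative per-AZ availability of 99%; that figure and the function names are assumptions for this example, not AWS values.

```python
def combined_availability(per_az_availability: float, az_count: int) -> float:
    """Probability that at least one AZ is up, assuming independent failures."""
    if not 0.0 <= per_az_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    # The deployment is down only if every AZ is down at the same time.
    return 1.0 - (1.0 - per_az_availability) ** az_count


def downtime_minutes_per_year(availability: float) -> float:
    """Convert an availability fraction into expected minutes of downtime per year."""
    return (1.0 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    for az_count in (1, 2, 3):
        a = combined_availability(0.99, az_count)  # 0.99 is a made-up figure
        print(f"{az_count} AZ(s): {a:.6f} available, "
              f"{downtime_minutes_per_year(a):.1f} min/year down")
```

Real failures are not perfectly independent, so treat the result as an upper bound on what redundancy alone can deliver.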

How AWS Achieves High Availability

AWS offers several services that work together to provide HA:

Elastic Load Balancing (ELB): Distributes incoming traffic across multiple targets (EC2 instances, containers, IP addresses) in one or more AZs. ELB performs health checks on targets and only routes traffic to healthy ones. If a target fails, ELB stops sending traffic to it and redistributes load to remaining healthy targets.

Auto Scaling: Automatically adjusts the number of EC2 instances based on demand or a schedule. You define launch configurations or templates, scaling policies, and health checks. If an instance becomes unhealthy, Auto Scaling terminates it and launches a new one. Combined with ELB, this creates a self-healing architecture.

Amazon RDS Multi-AZ: For relational databases, RDS Multi-AZ provides a synchronous standby replica in a different AZ. If the primary instance fails, RDS automatically fails over to the standby, typically within 60-120 seconds. The database endpoint name does not change, so the application only needs to reconnect.

Amazon Route 53: A DNS service that can route traffic to multiple endpoints and perform health checks. You can configure failover routing: if the primary endpoint fails, Route 53 directs traffic to a secondary endpoint, possibly in another Region.
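The way load balancing and Auto Scaling combine into a self-healing fleet can be sketched as a toy model. This is not how AWS implements either service; the `Fleet` class, its method names, and the instance IDs are hypothetical and exist only to show the two responsibilities side by side: routing around unhealthy targets, and replacing them.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    instance_id: str
    az: str
    healthy: bool = True


class Fleet:
    """Toy model of a load balancer view plus an Auto Scaling style reconcile loop."""

    def __init__(self, azs, desired):
        self.azs = list(azs)
        self.desired = desired
        self.instances = []
        self._ids = count(1)
        self._next = 0
        self.reconcile()

    def _launch(self):
        # Place the new instance in the AZ that currently has the fewest instances.
        az = min(self.azs, key=lambda z: sum(i.az == z for i in self.instances))
        self.instances.append(Instance(f"i-{next(self._ids):04d}", az))

    def reconcile(self):
        """Auto Scaling role: drop unhealthy instances, then refill to desired capacity."""
        self.instances = [i for i in self.instances if i.healthy]
        while len(self.instances) < self.desired:
            self._launch()

    def route(self):
        """Load balancer role: round-robin across healthy instances only."""
        healthy = [i for i in self.instances if i.healthy]
        if not healthy:
            raise RuntimeError("no healthy targets")
        target = healthy[self._next % len(healthy)]
        self._next += 1
        return target


if __name__ == "__main__":
    fleet = Fleet(["us-east-1a", "us-east-1b"], desired=2)
    fleet.instances[0].healthy = False      # simulate a failure
    print("routed to:", fleet.route().instance_id)
    fleet.reconcile()                        # replacement is launched
    print([(i.instance_id, i.az) for i in fleet.instances])
```

Note that `route()` keeps the application reachable the moment a target fails, while `reconcile()` restores capacity afterwards; neither step alone gives you both.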

Key Tiers, Configurations, and Pricing

ELB Types: Application Load Balancer (ALB) for HTTP/HTTPS traffic, Network Load Balancer (NLB) for TCP/UDP/TLS traffic, and Classic Load Balancer (legacy). ALB and NLB are recommended for new workloads. Pricing combines a per-hour charge with a usage-based charge: capacity units (LCUs for ALB, NLCUs for NLB) or, for Classic Load Balancer, per GB of data processed.

Auto Scaling: No additional cost for Auto Scaling itself; you pay only for the resources (EC2 instances, EBS volumes) launched. You can use simple scaling, step scaling, or target tracking scaling policies.

RDS Multi-AZ: Costs about double the single-AZ price because you pay for both the primary and standby instances. In a Multi-AZ DB instance deployment the standby cannot serve reads; only the newer Multi-AZ DB cluster deployment, which has two readable standbys, can.

Route 53: Pricing based on hosted zones, queries, and health checks. Failover routing is a routing policy, not a separate service.
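The "about double" figure for Multi-AZ can be checked with simple arithmetic. The hourly rate below is a made-up number, not an AWS price, and the sketch covers instance hours only; storage, I/O, and data transfer are ignored.

```python
HOURS_PER_MONTH = 730  # common approximation for a billing month


def monthly_instance_cost(hourly_rate: float, multi_az: bool = False,
                          hours: int = HOURS_PER_MONTH) -> float:
    """Estimate monthly instance charges; Multi-AZ is modeled as two billed instances."""
    billed_instances = 2 if multi_az else 1
    return round(hourly_rate * billed_instances * hours, 2)


if __name__ == "__main__":
    rate = 0.10  # hypothetical USD per instance-hour
    print("Single-AZ:", monthly_instance_cost(rate))
    print("Multi-AZ: ", monthly_instance_cost(rate, multi_az=True))
```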

Comparison to On-Premises

In an on-premises environment, achieving HA requires purchasing duplicate hardware, setting up clustering software, and manually failing over. This involves significant capital expenditure and operational overhead. AWS abstracts this complexity: you simply configure services via the console or API. However, you must still design your architecture for HA—for example, placing EC2 instances in at least two AZs, using an ELB, and enabling Auto Scaling. AWS provides the building blocks, but you must assemble them correctly.

When to Use High Availability vs. Disaster Recovery

High availability focuses on minimizing downtime within a single Region by using multiple AZs. Disaster recovery (DR) focuses on recovering from a Region-wide outage by replicating data and applications to another Region. For the CLF-C02 exam, know that HA is about local redundancy (within a Region), while DR is about geographic redundancy (across Regions). Examples: Multi-AZ RDS for HA vs. cross-Region RDS read replicas for DR.

Common Misconfigurations

Single AZ deployment: Launching all instances in one AZ. If that AZ fails, the application goes down.

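No health checks: Without a working application-level health check, the load balancer keeps sending traffic to failing instances, and Auto Scaling replaces an instance only when its EC2 status checks fail, so application failures go unnoticed.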

Fixed number of instances: Not using Auto Scaling means if an instance fails, it is not automatically replaced.

Not using ELB: Directly connecting users to instance IPs; if an instance fails, users cannot access the application until DNS updates propagate.

Walk-Through

1. Launch EC2 instances in multiple AZs

Begin by launching at least two EC2 instances, each in a different Availability Zone within the same Region. This ensures that if one AZ experiences an outage, the other instance(s) remain available. You can use the AWS Management Console, CLI, or CloudFormation. For example, using the AWS CLI, launch one instance in the first AZ with `aws ec2 run-instances --image-id ami-0abcdef1234567890 --count 1 --placement AvailabilityZone=us-east-1a`, then run the same command with `AvailabilityZone=us-east-1b`. If you do not specify a subnet, instances launch in the default VPC. Ensure that security groups allow traffic from the load balancer.

2. Create an Application Load Balancer

In the EC2 console, create an ALB. Choose internet-facing or internal scheme. Select at least two subnets in different AZs to make the load balancer itself highly available. Configure a listener on port 80 or 443. Define a target group and register the EC2 instances. Set health check parameters: protocol (HTTP/HTTPS), path (e.g., /health), interval (default 30 seconds), and threshold (e.g., 2 consecutive failures mark unhealthy). The ALB will then distribute incoming traffic across healthy instances only.
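The threshold logic behind a health check can be illustrated with a small state machine. This is a simplified sketch, not the load balancer's actual implementation: the class and function names are hypothetical, the healthy threshold of 5 is an assumed value for illustration, and the detection estimate ignores request timeouts.

```python
class TargetHealth:
    """Tracks one target's state from a stream of pass/fail health check results."""

    def __init__(self, healthy_threshold: int = 5, unhealthy_threshold: int = 2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the target's state afterwards."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with current state, reset the streak
        else:
            self._streak += 1
            needed = (self.unhealthy_threshold if self.healthy
                      else self.healthy_threshold)
            if self._streak >= needed:
                self.healthy = not self.healthy
                self._streak = 0
        return self.healthy


def approx_detection_seconds(interval: int = 30, unhealthy_threshold: int = 2) -> int:
    """Rough time to mark a target unhealthy: interval times consecutive failures."""
    return interval * unhealthy_threshold


if __name__ == "__main__":
    target = TargetHealth()
    for result in (False, False, True, True):
        print(result, "->", "healthy" if target.record(result) else "unhealthy")
    print("approximate detection time:", approx_detection_seconds(), "seconds")
```

With a 30-second interval and a threshold of 2, a failed target can keep receiving traffic for roughly a minute, which is why shorter intervals are sometimes chosen for latency-sensitive applications.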

3. Configure Auto Scaling group

Create an Auto Scaling group using a launch template or configuration. Specify the same subnets (AZs) as the ALB. Set desired capacity (e.g., 2), minimum (2), and maximum (e.g., 10). Attach the ALB target group. Enable ELB health checks so that Auto Scaling replaces instances that fail the ALB health check. You can also add scaling policies: target tracking (e.g., keep average CPU at 50%), step scaling, or scheduled scaling. Auto Scaling will launch new instances in any of the specified AZs to maintain the desired count.
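One way to build intuition for target tracking is a simple proportional model: scale capacity by the ratio of the observed metric to the target, then clamp to the group's limits. AWS does not publish its exact algorithm, so the function below is an assumption-laden approximation with hypothetical names, not the service's real behavior.

```python
import math


def target_tracking_capacity(current_capacity: int, metric_value: float,
                             target_value: float, min_size: int,
                             max_size: int) -> int:
    """Proportional estimate of the capacity needed to bring the metric back to target."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    proposed = math.ceil(current_capacity * metric_value / target_value)
    # Never go below the group's minimum or above its maximum.
    return max(min_size, min(max_size, proposed))


if __name__ == "__main__":
    # Two instances averaging 90% CPU against a 50% target, limits 2..10.
    print(target_tracking_capacity(2, 90.0, 50.0, min_size=2, max_size=10))
```

The clamp is the part that matters for availability: a minimum of 2 keeps one instance per AZ even when load is low.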

4. Enable Multi-AZ for RDS

If your application uses a relational database, enable Multi-AZ when creating or modifying an RDS instance. In the console, under 'Availability & durability', select 'Create a standby instance in a different AZ'. This provisions a synchronous standby in another AZ. During a failure, RDS automatically flips the CNAME to point to the standby. The failover typically completes within 1-2 minutes. Note that the RDS free tier covers only Single-AZ usage, so a Multi-AZ deployment is billed even on free-tier-eligible instance classes such as db.t2.micro. Pricing is approximately 2x the single-AZ cost.
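The CNAME flip is easier to picture with a toy DNS table: the endpoint name the application uses stays the same, and only the host behind it changes. The class, endpoint, and host names below are hypothetical placeholders, and the model skips replication and timing entirely.

```python
class MultiAzDatabase:
    """Toy model of a stable database endpoint that points at the current primary."""

    def __init__(self, endpoint: str, primary_host: str, standby_host: str):
        self.endpoint = endpoint
        self.primary_host = primary_host
        self.standby_host = standby_host
        self.dns = {endpoint: primary_host}  # endpoint name -> current primary

    def resolve(self, name: str) -> str:
        return self.dns[name]

    def fail_over(self) -> None:
        """Promote the standby and repoint the endpoint; the name itself never changes."""
        self.primary_host, self.standby_host = self.standby_host, self.primary_host
        self.dns[self.endpoint] = self.primary_host


if __name__ == "__main__":
    db = MultiAzDatabase("mydb.example.internal", "db-host-az-a", "db-host-az-b")
    print("before:", db.resolve(db.endpoint))
    db.fail_over()
    print("after: ", db.resolve(db.endpoint))
```

In practice this means applications should connect by endpoint name and reconnect on error, rather than holding on to a resolved IP address.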

5. Test failover and monitor

After setting up the architecture, simulate a failure to verify HA. For example, stop an EC2 instance manually. The ALB health check should detect the failure and stop sending traffic to it. Auto Scaling will launch a replacement instance. For RDS, you can trigger a failover by rebooting the instance with the failover option in the AWS CLI: `aws rds reboot-db-instance --db-instance-identifier mydb --force-failover`. Create CloudWatch alarms on metrics like CPUUtilization, HealthyHostCount, and DatabaseConnections, and set up SNS notifications for alarm state changes so that you are alerted to any issues.

What This Looks Like on the Job

Scenario 1: E-commerce Website

An online retailer runs its website on EC2 instances behind an ALB with Auto Scaling. During Black Friday, traffic spikes 10x. Auto Scaling launches additional instances automatically, and the ALB distributes load evenly. If an instance fails due to a software bug, the ALB health check detects it and routes traffic away. Auto Scaling terminates the unhealthy instance and launches a new one. The website remains available. Cost: The retailer pays only for the extra instances during the spike. Without Auto Scaling, they would have to over-provision or risk downtime.

Scenario 2: Financial Services Application

A bank uses RDS Multi-AZ for its transaction database. When the primary database instance in us-east-1a experiences a hardware failure, RDS automatically fails over to the standby in us-east-1b within 60 seconds. The application reconnects and continues processing transactions. The bank also uses Route 53 failover routing to redirect users to a static error page in another Region if the entire us-east-1 Region becomes unavailable. Misconfiguration: Initially, the bank did not enable Multi-AZ, and a single AZ outage caused 4 hours of downtime. After enabling Multi-AZ, they achieved 99.99% availability.

Scenario 3: Media Streaming Platform

A video streaming service uses Amazon CloudFront for content delivery and EC2 for backend encoding. They deploy encoding instances in three AZs with an NLB for UDP-based streaming protocols. Auto Scaling is configured with a step scaling policy to add instances when CPU exceeds 70%. During a popular live event, traffic surges. Auto Scaling adds instances, but because the scaling policy was set to add only 1 instance per step, it could not keep up. The platform experienced buffering. After adjusting the step scaling to add 5 instances per step and using target tracking, the platform handled the load smoothly. This highlights the importance of correct scaling policy configuration.
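The gap between adding 1 and adding 5 instances per step in the streaming scenario can be put into numbers. The sketch below assumes a fixed wait between scaling actions and uses invented fleet sizes; neither the figures nor the function names come from the scenario, and real step scaling behavior depends on alarm evaluation and instance warm-up.

```python
import math


def scaling_actions_needed(current: int, required: int, step_size: int) -> int:
    """Number of scale-out actions needed to close the capacity shortfall."""
    if step_size < 1:
        raise ValueError("step_size must be at least 1")
    shortfall = max(0, required - current)
    return math.ceil(shortfall / step_size)


def catch_up_seconds(current: int, required: int, step_size: int,
                     seconds_between_actions: int) -> int:
    """Rough time to reach the required capacity with a fixed wait between actions."""
    actions = scaling_actions_needed(current, required, step_size)
    return actions * seconds_between_actions


if __name__ == "__main__":
    # Hypothetical surge: 6 instances running, 26 needed, 300 s between actions.
    for step in (1, 5):
        seconds = catch_up_seconds(6, 26, step, 300)
        print(f"step size {step}: about {seconds // 60} minutes to catch up")
```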

How CLF-C02 Actually Tests This

What CLF-C02 Tests

The exam tests your understanding of high availability concepts within Domain 1: Cloud Concepts (Objective 1.4). You will be asked to identify which AWS services provide HA, how they work, and how to design HA architectures. Specific topics include: multi-AZ deployments, ELB health checks, Auto Scaling, RDS Multi-AZ, and Route 53 failover routing. You will NOT be asked to configure these services in depth, but you must know their purpose and behavior.

Common Wrong Answers and Why Candidates Choose Them

1. 'High availability is achieved by using a single large instance.' Candidates think a more powerful instance is more reliable, but HA requires redundancy, not size.

2. 'Auto Scaling alone ensures high availability.' Auto Scaling replaces failed instances, but without a load balancer, traffic is not redirected. The exam often tests that ELB + Auto Scaling together provide HA.

3. 'Multi-AZ RDS provides read replicas for scaling reads.' Multi-AZ is for HA, not read scaling. Read replicas are for read offloading; they replicate asynchronously and can be placed in the same AZ, a different AZ, or another Region.

4. 'Route 53 simply routes traffic; it doesn't provide HA.' Route 53 with health checks and failover routing does provide HA by redirecting traffic away from unhealthy endpoints.

Specific Terms and Values

- ELB health check interval: default 30 seconds
- RDS Multi-AZ failover time: typically 60-120 seconds
- Auto Scaling cooldown period: default 300 seconds
- Number of AZs: minimum 2 for HA
- Route 53 failover routing: primary and secondary records

Tricky Distinctions

- ELB vs. Route 53: ELB distributes traffic within a Region; Route 53 distributes traffic across Regions or to multiple endpoints globally.
- Auto Scaling vs. ELB: Auto Scaling manages instance count; ELB manages traffic distribution. Both are needed for HA.
- Multi-AZ RDS vs. Read Replicas: Multi-AZ is for HA with synchronous replication; read replicas are for read scalability with asynchronous replication.

Decision Rule for Multiple Choice

When asked 'How do you achieve high availability for an EC2-based application?' look for the answer that includes both multiple AZs and a load balancer. If the question mentions database HA, look for Multi-AZ RDS. Eliminate answers that rely on a single instance or single AZ.

Key Takeaways

High availability on AWS requires redundancy across at least two Availability Zones (AZs).

Elastic Load Balancing (ELB) distributes traffic and performs health checks; it must be used with Auto Scaling for automatic recovery.

Auto Scaling automatically adjusts the number of EC2 instances based on demand and replaces unhealthy instances.

Amazon RDS Multi-AZ provides a synchronous standby in another AZ for automatic failover (typically 60-120 seconds).

Route 53 with health checks and failover routing provides cross-Region high availability.

The AWS shared responsibility model: AWS is responsible for the infrastructure; you are responsible for configuring HA.

Common exam trap: confusing Multi-AZ RDS (HA) with Read Replicas (read scaling).

ELB health check interval default is 30 seconds; unhealthy threshold is typically 2 consecutive failures.

Auto Scaling cooldown period default is 300 seconds to prevent rapid scaling loops.

For HA, always design for failure: assume an AZ will fail and test your architecture.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Elastic Load Balancer (ELB)

Distributes traffic across targets (EC2, containers) within a Region.

Performs health checks on targets and stops routing to unhealthy ones.

Works at the transport or application layer (NLB/ALB).

Requires targets to be in the same Region.

Provides a single DNS endpoint for the application.

Amazon Route 53

DNS service that routes traffic to endpoints globally (across Regions).

Can perform health checks on endpoints and failover to a secondary endpoint.

Works at the DNS level: it answers name lookups and does not sit in the traffic path.

Can route to any internet-accessible endpoint, including on-premises.

Provides domain registration and traffic flow management.

Auto Scaling

Automatically adjusts capacity based on policies (CPU, schedule).

Replaces unhealthy instances automatically.

Can integrate with ELB for health checks.

Reduces operational overhead.

Supports predictive scaling based on historical data.

Manual Scaling

Requires manual intervention to launch or terminate instances.

No automatic replacement of failed instances.

No integration with health checks.

Higher risk of human error.

May lead to over-provisioning or under-provisioning.

Multi-AZ RDS

Provides synchronous replication to a standby in another AZ.

Automatic failover if primary fails.

Standby is not accessible for reads.

Costs about 2x single-AZ instance.

Used for high availability.

RDS Read Replicas

Provides asynchronous replication to one or more read replicas.

No automatic failover; manual promotion required.

Read replicas can be used for read traffic.

Costs additional for each replica.

Used for read scalability and disaster recovery.

Watch Out for These

Mistake

High availability means 100% uptime.

Correct

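High availability reduces downtime; it does not eliminate it. HA designs typically target 99.99% or 99.999% uptime, and failures can still occur. For example, the Amazon EC2 SLA commits to 99.99% availability at the Region level when instances are deployed across two or more AZs.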

Mistake

Auto Scaling alone provides high availability.

Correct

Auto Scaling replaces failed instances but does not reroute traffic. A load balancer is required to distribute traffic only to healthy instances. Without ELB, traffic may still go to a failed instance until DNS updates.

Mistake

Multi-AZ RDS allows read scaling.

Correct

Multi-AZ RDS standby is not used for read traffic; it's only for failover. For read scaling, use RDS Read Replicas. Multi-AZ provides HA, not performance.

Mistake

All AWS services are highly available by default.

Correct

No, you must configure HA. For example, a single EC2 instance is not HA. You must deploy across multiple AZs, use ELB, and enable Auto Scaling.

Mistake

Placing instances in the same AZ but different subnets provides HA.

Correct

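An AZ is one or more data centers that form a single failure domain, and every subnet lives in exactly one AZ. If that AZ fails, instances in all of its subnets fail with it. HA requires multiple AZs.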

Frequently Asked Questions

What is the difference between high availability and fault tolerance?

High availability (HA) means the system is designed to minimize downtime, typically achieving 99.99% uptime. Fault tolerance means the system continues to operate without interruption even when components fail. On AWS, HA often involves automatic failover (e.g., Multi-AZ RDS), while fault tolerance might use active-active setups with multiple instances handling traffic simultaneously. The CLF-C02 exam focuses on HA concepts like multi-AZ deployments and load balancing.

Do I need to use both ELB and Auto Scaling for high availability?

Yes, for EC2-based applications, you need both. ELB distributes traffic only to healthy instances, while Auto Scaling replaces failed instances and adjusts capacity. Without ELB, traffic may still be sent to a failed instance until DNS updates. Without Auto Scaling, a failed instance is not replaced. The exam often tests this combination.

Can I achieve high availability with a single EC2 instance?

No. A single instance is a single point of failure. If the instance fails, the application goes down. High availability requires at least two instances in different Availability Zones behind a load balancer. The exam will test that redundancy is key.

What is the cost of Multi-AZ RDS compared to single-AZ?

Multi-AZ RDS costs approximately twice the price of a single-AZ deployment because you pay for both the primary and standby instances. However, the standby is not accessible for reads unless you use the Multi-AZ with two readable standbys feature. The exam may ask about cost implications of HA.

How does Route 53 provide high availability?

Route 53 offers routing policies like failover, latency-based, and geolocation. With failover routing, you create primary and secondary records with health checks. If the primary endpoint fails health check, Route 53 automatically returns the secondary record. This provides cross-Region HA. The exam tests that Route 53 can be part of an HA strategy.
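The failover decision itself is small enough to sketch. The function below is a simplified illustration, not Route 53's implementation: the endpoint names are hypothetical, the health check is a plain callable supplied by the caller, and the case where both endpoints are unhealthy is not modeled.

```python
from typing import Callable


def resolve_failover(primary: str, secondary: str,
                     is_healthy: Callable[[str], bool]) -> str:
    """Return the primary endpoint while it passes its health check, else the secondary."""
    if is_healthy(primary):
        return primary
    return secondary


if __name__ == "__main__":
    # Hypothetical endpoints in two Regions, with health tracked in a dict.
    health = {"app.us-east-1.example.com": True,
              "app.us-west-2.example.com": True}
    primary, secondary = list(health)

    print(resolve_failover(primary, secondary, health.get))
    health[primary] = False  # simulate a Region-level failure
    print(resolve_failover(primary, secondary, health.get))
```

Because the answer is delivered through DNS, clients only see the change once cached records expire, so record TTLs influence how quickly failover takes effect.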

What is the default health check interval for an Application Load Balancer?

The default health check interval is 30 seconds. You can configure it between 5 and 300 seconds. The unhealthy threshold is 2 consecutive failures by default. The exam may ask about these defaults.

Can Auto Scaling work without a load balancer?

Yes, Auto Scaling can work without a load balancer, but then it only manages instance count and health checks based on EC2 status checks. Without a load balancer, traffic must be distributed by another mechanism (e.g., Route 53). However, for HA, using an ELB is recommended. The exam expects you to know that ELB + Auto Scaling is a common pattern.
