AZ-900 · Chapter 7 of 127 · Objective 1.5

High Availability and Scalability

This chapter covers two foundational concepts for cloud architecture: high availability and scalability. Both are critical for designing resilient applications that remain accessible and performant under varying loads. On the AZ-900 exam, this topic falls under 'Describe cloud concepts' (25-30% of the exam), and you can expect 2-3 questions that test your understanding of availability zones, scale sets, SLAs, and the difference between scaling up and scaling out. Mastering these concepts will help you choose the right Azure services to meet business continuity and performance requirements.

Updated May 31, 2026

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security

The Concert Hall with Backup Band

Imagine you own a concert hall that hosts 1,000 fans every night. High availability is like having a backup generator and a secondary stage in case the main stage has a power outage. If the main stage goes dark, the backup generator kicks in within seconds, and the show goes on with minimal interruption. Scalability is like having the ability to add extra seating sections or even open a second hall when a popular band sells out tickets. You don't want to build a 10,000-seat hall that sits empty most nights; instead, you have movable walls and modular seating that can expand from 1,000 to 5,000 seats as needed. Azure does this with virtual machines and load balancers: high availability uses multiple VMs in an availability set or zone so that if one fails, traffic shifts to another. Scalability uses scale sets that automatically add or remove VMs based on CPU usage or request count. Just as you wouldn't want a concert to stop because a light bulb burned out, Azure ensures your application stays up and can handle sudden spikes in demand without manual intervention.

How It Actually Works

What Are High Availability and Scalability?

High availability (HA) refers to a system's ability to remain operational and accessible for a high percentage of time, typically measured by uptime SLAs. The business problem it solves is downtime: when an application is unavailable, companies lose revenue, productivity, and customer trust. Azure offers multiple layers of HA, from redundant hardware within a datacenter to replicating data across geographically separated regions.

Scalability is the ability of a system to handle growing amounts of work by adding resources. There are two types: vertical scaling (scale up) means increasing the capacity of a single resource (e.g., upgrading a VM from 4 cores to 8 cores), and horizontal scaling (scale out) means adding more instances of a resource (e.g., adding more VMs to a pool). The business problem scalability solves is performance degradation under load: without it, an application becomes slow or crashes during traffic spikes.

How High Availability Works in Azure

Azure achieves high availability through redundancy and automatic failover. The protection is layered, from a single rack up to an entire region:

Redundancy at the datacenter level: Each Azure datacenter has multiple racks of servers, each with redundant power, cooling, and networking. If one rack fails, others continue.

Availability Sets: Within a datacenter, you can group two or more VMs into an availability set. Azure ensures these VMs are placed on different fault domains (separate racks with independent power and network) and update domains (groups that are updated during maintenance one at a time). If hardware fails or Azure performs maintenance, at least one VM remains running.

Availability Zones: For higher resiliency, you can deploy VMs across multiple availability zones within a region. Each zone is a physically separate location made up of one or more datacenters with independent power, cooling, and networking. If one zone goes down, traffic is routed to another zone.

Region Pairs: Most Azure regions are paired with another region in the same geography (e.g., East US and West US), typically at least 300 miles apart where possible. Services such as geo-redundant storage replicate data to the paired region, so in a disaster you can fail over to it.

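Load Balancers and Traffic Manager: Azure Load Balancer distributes incoming requests across healthy VMs within a region, while Traffic Manager uses DNS to direct users to healthy endpoints, which can be in different regions. If a VM fails its health probe, the load balancer stops sending traffic to it and redirects requests to the healthy ones.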
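To make the fault domain and update domain idea concrete, here is a minimal Python sketch. It assumes a simple round-robin placement across 3 fault domains and 5 update domains; the function names and VM names are hypothetical, and Azure's real placement logic is internal to the platform.

```python
def place_vms(vm_names, fault_domains=3, update_domains=5):
    """Assign each VM a (fault domain, update domain) pair, round-robin.

    Simplified illustration only; not Azure's actual placement algorithm.
    """
    return {
        name: (index % fault_domains, index % update_domains)
        for index, name in enumerate(vm_names)
    }


def survivors(placement, failed_fault_domain):
    """Return the VMs still running if one whole fault domain (rack) fails."""
    return [
        vm for vm, (fault_domain, _) in placement.items()
        if fault_domain != failed_fault_domain
    ]


# Hypothetical VM names for illustration.
placement = place_vms(["web-0", "web-1", "web-2", "web-3"])
for vm, (fd, ud) in placement.items():
    print(f"{vm}: fault domain {fd}, update domain {ud}")

print("Still running if fault domain 0 fails:", survivors(placement, 0))
```

Because the VMs are spread across racks, losing any one fault domain still leaves at least one VM serving traffic, which is the property the availability set SLA relies on.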

The SLA (Service Level Agreement) for a single VM is 99.9% uptime, provided all of its disks are premium SSD or Ultra Disk. For two or more VMs in an availability set, it's 99.95%. For two or more VMs deployed across availability zones, it's 99.99%.
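The gap between these percentages is easier to judge as allowed downtime. The following Python sketch does the arithmetic; the function name is hypothetical, and the yearly figure is a rule of thumb (Azure's SLA documents measure uptime per billing month, so a 30-day month is shown as well).

```python
HOURS_PER_YEAR = 365 * 24   # 8,760 hours, ignoring leap years
HOURS_PER_MONTH = 30 * 24   # 720 hours in a 30-day month


def allowed_downtime_hours(sla_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted in a period at the given uptime percentage."""
    return period_hours * (1 - sla_percent / 100)


tiers = [
    ("single VM", 99.9),
    ("availability set", 99.95),
    ("availability zones", 99.99),
]

for label, sla in tiers:
    per_year = allowed_downtime_hours(sla)
    per_month = allowed_downtime_hours(sla, HOURS_PER_MONTH)
    print(
        f"{label}: {sla}% -> {per_year:.2f} h/year "
        f"({per_month * 60:.1f} min per 30-day month)"
    )
```

Each step up the ladder cuts the permitted downtime substantially: from about 8.76 hours a year to about 4.38 hours, then to under an hour.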

How Scalability Works in Azure

Scalability in Azure is primarily achieved through Virtual Machine Scale Sets and autoscaling.

Vertical Scaling (Scale Up): You change the VM size to a larger SKU (e.g., from Standard_D2s_v3 to Standard_D4s_v3). This requires a reboot and is not automated. It's useful for applications that are not designed for distributed architectures.

Horizontal Scaling (Scale Out): You add more VM instances. Azure VM Scale Sets allow you to define a set of identical VMs and automatically increase or decrease the number based on metrics like CPU usage, memory, or queue length. This is the preferred method for cloud-native apps because it provides elasticity and fault tolerance.

Autoscale Conditions: You define rules like "if average CPU > 75% for 5 minutes, add 2 instances" and "if CPU < 25% for 10 minutes, remove 1 instance." Azure monitors these metrics and adjusts the instance count.

Manual Scaling: You can also manually change the instance count via the portal, CLI, or PowerShell.
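The autoscale rules quoted above can be sketched as a small decision function. This is an illustrative Python model, not how Azure Monitor is implemented: it assumes one average-CPU sample per minute, and the function names and the min/max limits of 1 and 10 are assumptions chosen to match the examples in this chapter.

```python
def average(values):
    return sum(values) / len(values)


def desired_count(current, cpu_samples, min_count=1, max_count=10):
    """Return the instance count after applying the two example rules.

    cpu_samples holds one average-CPU reading per minute, newest last.
    Rule 1: average CPU > 75% over the last 5 minutes  -> add 2 instances.
    Rule 2: average CPU < 25% over the last 10 minutes -> remove 1 instance.
    """
    if len(cpu_samples) >= 5 and average(cpu_samples[-5:]) > 75:
        change = 2
    elif len(cpu_samples) >= 10 and average(cpu_samples[-10:]) < 25:
        change = -1
    else:
        change = 0
    # Never go below the minimum or above the maximum instance count.
    return max(min_count, min(max_count, current + change))


print(desired_count(2, [80, 82, 90, 85, 88]))  # sustained high CPU: scale out
print(desired_count(9, [95] * 5))              # scale out, capped at max_count
print(desired_count(3, [10] * 10))             # sustained low CPU: scale in
print(desired_count(1, [10] * 10))             # already at min_count
```

The clamp on the last line of the function is what the minimum and maximum instance limits do in practice: rules can ask for more or fewer instances, but the limits bound cost and guarantee a baseline.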

Key Components and Pricing

Availability Set: No additional cost beyond the VMs themselves.

Availability Zones: No extra cost for the zone placement, but inter-zone data transfer may incur charges.

VM Scale Sets: You pay only for the underlying VMs and any associated resources (load balancer, storage). Autoscale is a free feature of Scale Sets.

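Load Balancer: Standard Load Balancer is billed per hour based on the number of rules, plus data processed. The Basic SKU was free but limited, and Microsoft announced its retirement for September 30, 2025.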

On-Premises Equivalent

On-premises, high availability requires purchasing redundant servers, storage, and networking equipment, plus a secondary site for disaster recovery. This involves high capital expenditure and complex management. Scalability on-premises means buying larger servers (scale up) or adding more servers (scale out) which takes weeks due to procurement and setup. Azure eliminates these delays with pay-as-you-go resources that can be provisioned in minutes.

Azure Portal and CLI Touchpoints

To create an availability set in the portal: Navigate to Virtual Machines -> Create -> Availability Options -> Availability Set. To create a VM Scale Set: Search for 'Scale Set' in the marketplace, then configure the VM image, size, and autoscale rules.

CLI commands:

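# Create an availability set
az vm availability-set create --name myAvailabilitySet --resource-group myResourceGroup --location eastus

# Create a VM scale set (the UbuntuLTS image alias has been removed from recent Azure CLI versions)
az vmss create --name myScaleSet --resource-group myResourceGroup --image Ubuntu2204 --instance-count 2 --vm-sku Standard_DS1_v2 --generate-ssh-keys

# Create an autoscale setting with instance limits
az monitor autoscale create --name myAutoscale --resource myScaleSet --resource-group myResourceGroup --resource-type Microsoft.Compute/virtualMachineScaleSets --count 2 --min-count 1 --max-count 10

# Add a scale-out rule: add 2 instances when average CPU exceeds 75% over 5 minutes
az monitor autoscale rule create --resource-group myResourceGroup --autoscale-name myAutoscale --condition "Percentage CPU > 75 avg 5m" --scale out 2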

Walk-Through

Define Availability Requirements

First, determine the required uptime for your application. For example, a critical e-commerce site might need 99.99% uptime, while an internal reporting tool might tolerate 99.9%. This decision drives the architecture: single VM with 99.9% SLA, availability set with 99.95%, or availability zones with 99.99%. Also consider the recovery point objective (RPO), which is how much data loss is tolerable, and the recovery time objective (RTO), which is how quickly service must be restored. Document these requirements in the solution design before provisioning resources.
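The mapping from an uptime requirement to a deployment option can be written as a simple lookup. This Python sketch uses only the three SLA tiers named in this chapter; the function and constant names are hypothetical, and a real design would also weigh RPO, RTO, and cost.

```python
# (SLA percentage, deployment option), ordered from least to most resilient.
SLA_TIERS = [
    (99.9, "single VM"),
    (99.95, "two or more VMs in an availability set"),
    (99.99, "two or more VMs across availability zones"),
]


def choose_deployment(required_uptime_percent):
    """Return the simplest option whose SLA meets the requirement.

    Returns None when no single-region VM SLA is high enough, which is the
    signal to look at a multi-region design instead.
    """
    for sla, option in SLA_TIERS:
        if sla >= required_uptime_percent:
            return option
    return None


print(choose_deployment(99.9))    # internal reporting tool
print(choose_deployment(99.99))   # critical e-commerce site
print(choose_deployment(99.999))  # None: beyond a single region's VM SLA
```

Picking the first tier that satisfies the requirement, rather than the highest available, reflects the cost trade-off: more resilience usually means more instances and possibly extra data transfer charges.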

Create an Availability Set or Use Zones

For high availability within a region, create an availability set and deploy at least two VMs into it. Azure automatically assigns fault domains and update domains. For higher resilience, deploy VMs across availability zones. In the Azure portal, when creating a VM, under 'Availability options', select 'Availability set' or 'Availability zone'. Behind the scenes, Azure allocates resources to ensure physical separation.

Configure Load Balancer

Place a load balancer in front of your VMs to distribute traffic. Create a backend pool containing your VMs, define health probes (e.g., HTTP 200 on port 80), and set load balancing rules. If a VM fails the health probe, the load balancer stops sending traffic to it. You can use Azure Load Balancer (Layer 4) or Application Gateway (Layer 7). In the portal, navigate to Load Balancers, create one, and add your VMs to its backend pool.
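The interaction between health probes and traffic distribution can be modeled in a few lines of Python. This is a toy model under stated assumptions: it uses round-robin for readability, whereas Azure Load Balancer distributes flows with a hash-based algorithm, and the class, method, and VM names are all hypothetical.

```python
class LoadBalancer:
    """Toy load balancer: round-robin over backends that pass a health probe."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def probe(self, check):
        """Run check(backend) -> bool on every backend and record the result."""
        self.healthy = {b for b in self.backends if check(b)}

    def route(self):
        """Return the next healthy backend, skipping unhealthy ones."""
        if not self.healthy:
            raise RuntimeError("no healthy backends")
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next % len(self.backends)]
            self._next += 1
            if candidate in self.healthy:
                return candidate


lb = LoadBalancer(["vm-a", "vm-b", "vm-c"])
before = [lb.route() for _ in range(3)]

# Simulate vm-b failing its HTTP probe.
status = {"vm-a": True, "vm-b": False, "vm-c": True}
lb.probe(lambda backend: status[backend])
after = [lb.route() for _ in range(4)]

print("Before failure:", before)
print("After vm-b fails its probe:", after)
```

Once the probe marks a backend unhealthy, it simply stops appearing in the rotation; clients keep being served by the remaining VMs without any change on their side.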

Set Up Autoscaling for Scale Sets

For scalability, create a Virtual Machine Scale Set. Define the VM image, size, and initial instance count. Then configure autoscale rules based on metrics like CPU percentage, memory, or custom metrics. For example, scale out when average CPU > 75% for 5 minutes, and scale in when CPU < 25% for 10 minutes. Set minimum and maximum instance limits to control costs. Azure Monitor handles the metrics and triggers scaling actions.

Test Failover and Scale Events

Simulate failures by stopping a VM or disconnecting a network interface. Verify that the load balancer redirects traffic to remaining healthy VMs. Test autoscaling by generating load (e.g., using Apache Bench) and confirm new instances are created. Monitor the process in Azure Monitor and check the scale set's instance count. This validates that your HA and scalability configurations work as expected.

What This Looks Like on the Job

Scenario 1: E-commerce Website During Black Friday

An online retailer expects a 10x traffic spike on Black Friday. They deploy their web app across three availability zones in Azure, with a load balancer and a VM scale set configured to autoscale from 5 to 100 VMs based on CPU usage. The team sets aggressive scale-out rules (add 5 VMs if CPU > 70% for 2 minutes) and conservative scale-in rules (remove 2 VMs if CPU < 30% for 15 minutes) to handle rapid spikes without thrashing. Cost is a concern, so they use reserved instances for the baseline 5 VMs and pay-as-you-go for burst capacity. Without autoscaling, the site would crash or become painfully slow, losing millions in revenue.

Scenario 2: Mission-Critical Financial Trading Platform

A financial firm requires 99.999% uptime (about 5 minutes of downtime per year). They deploy their application across two Azure regions paired for disaster recovery (e.g., East US and West US). Within each region, they use availability zones. Traffic Manager routes users to the nearest healthy region. If a region fails, Traffic Manager automatically fails over to the other region. The team also uses Azure SQL Database with active geo-replication for data redundancy. Cost is high because they maintain full capacity in both regions, but the cost of downtime (millions per minute) justifies it. A common mistake is not testing failover regularly; in one drill, a misconfigured DNS TTL caused a 10-minute outage.
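A rough way to see why a second region helps reach "five nines" is the standard formula for redundant components. The Python sketch below assumes the two regional deployments fail independently and that failover is instantaneous, which real systems only approximate; it also ignores components in series such as DNS routing and the database. The function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def combined_availability(*availabilities):
    """Availability of redundant deployments where any one can serve traffic.

    Assumes independent failures: the system is down only if all are down.
    """
    all_down = 1.0
    for availability in availabilities:
        all_down *= 1 - availability
    return 1 - all_down


def downtime_minutes_per_year(availability):
    return (1 - availability) * MINUTES_PER_YEAR


target = 0.99999                                  # five nines
two_regions = combined_availability(0.9999, 0.9999)

print(f"Target allows {downtime_minutes_per_year(target):.2f} min/year")
print(f"Two zone-redundant regions (idealized): {two_regions:.8f}")
print("Meets target:", two_regions >= target)
```

The idealized result comfortably exceeds the target, but the scenario's DNS TTL incident is a reminder that failover time and shared dependencies, not the arithmetic, usually decide whether the target is met in practice.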

Scenario 3: Video Streaming Service

A video streaming platform needs to handle variable viewer counts during live events. They use VM scale sets for transcoding workers. When a live stream starts, the number of viewers spikes, and autoscale adds transcoding VMs. The team uses custom metrics like queue depth of video chunks to trigger scaling. They also use Azure CDN to cache content at edge locations, reducing load on origin servers. A pitfall is setting scale-in rules too aggressively, causing VMs to be removed while still processing requests, leading to dropped frames. They learned to use a cooldown period of 10 minutes after scale-in.
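The cooldown lesson from this scenario can be illustrated with a small Python model. It is a sketch only: the queue-depth thresholds (100 and 10) and the class name are hypothetical, and the only value taken from the scenario is the 10-minute cooldown after a scale-in.

```python
class QueueAutoscaler:
    """Scale on queue depth, with a cooldown that limits repeated scale-ins."""

    def __init__(self, scale_out_above=100, scale_in_below=10, cooldown_minutes=10):
        self.scale_out_above = scale_out_above  # hypothetical threshold
        self.scale_in_below = scale_in_below    # hypothetical threshold
        self.cooldown_minutes = cooldown_minutes
        self.last_scale_in = None

    def decide(self, now_minute, queue_depth, instances):
        """Return the new instance count for the current queue depth."""
        if queue_depth > self.scale_out_above:
            return instances + 1  # scale out immediately under load
        if queue_depth < self.scale_in_below and instances > 1:
            cooled_down = (
                self.last_scale_in is None
                or now_minute - self.last_scale_in >= self.cooldown_minutes
            )
            if cooled_down:
                self.last_scale_in = now_minute
                return instances - 1
        return instances


scaler = QueueAutoscaler()
print(scaler.decide(now_minute=0, queue_depth=5, instances=4))    # scale in
print(scaler.decide(now_minute=3, queue_depth=5, instances=3))    # held by cooldown
print(scaler.decide(now_minute=10, queue_depth=5, instances=3))   # scale in again
print(scaler.decide(now_minute=11, queue_depth=500, instances=2)) # scale out
```

Scale-out is deliberately not gated by the cooldown here, so a sudden viewer spike still gets capacity straight away, while scale-in happens at most once per cooldown window.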

How AZ-900 Actually Tests This

Objective Code: AZ-900: Describe cloud concepts -> Describe the benefits of high availability and scalability

The exam tests your ability to define high availability and scalability, identify scenarios where each is needed, and understand how Azure implements them. You will NOT be asked to configure anything — only to recognize concepts and SLAs.

Common Wrong Answers and Why Candidates Choose Them

1. "High availability means the application can handle more users." This is actually scalability. Candidates confuse the two because both involve redundancy. HA is about uptime; scalability is about capacity.
2. "Availability zones provide the same redundancy as an availability set." Availability zones are physically separate datacenters; availability sets are within one datacenter. The exam may ask which provides higher SLA.
3. "Scaling up is always better than scaling out." Scaling up can reach a limit and causes downtime during resizing. Scaling out is more elastic and fault-tolerant. The exam expects you to know the trade-offs.
4. "You need to manually add VMs to scale." With autoscale, it's automatic. The exam might describe a manual process as a wrong answer.

Specific Terms and Values to Memorize

- SLA for a single VM: 99.9%
- SLA for two+ VMs in an availability set: 99.95%
- SLA for VMs across availability zones: 99.99%
- Maximum number of fault domains per availability set: 3 (2 in some regions)
- Maximum number of update domains per availability set: 20 (default 5)
- Maximum instances per scale set: 1,000

Edge Cases

- If a VM is in an availability set but only one VM is deployed, the SLA is still 99.9% because the set doesn't guarantee redundancy with a single VM.
- Availability zones are not available in all regions. Regions that support them have at least three zones; other regions have none. The exam may ask which regions support zones (e.g., East US 2, West Europe).
- Autoscale can also scale based on a schedule (e.g., add instances during business hours).

Memory Trick

Use the acronym HAS for High Availability and Scalability: High = Uptime (like a high percentage), Availability = Redundancy (multiple copies), Scalability = Capacity (adding resources). For SLA values, remember: 99.9 (single), 99.95 (set), 99.99 (zones). Each step adds a half 9 and then a full 9.

Decision Tree for Exam Questions

- If the question mentions "remain operational despite failures" -> High Availability
- If it mentions "handle increased demand" -> Scalability
- If it mentions "adding more VMs" -> Scale Out
- If it mentions "upgrading to a larger VM" -> Scale Up

Key Takeaways

High availability ensures application uptime through redundancy; Azure offers SLAs of 99.9% (single VM), 99.95% (availability set), and 99.99% (availability zones).

Scalability is the ability to handle load changes; vertical scaling (scale up) changes VM size, horizontal scaling (scale out) adds more VM instances.

Virtual Machine Scale Sets enable automatic scaling based on metrics like CPU usage, with configurable min/max instance counts.

Availability sets group VMs across fault domains (up to 3) and update domains (up to 20) within a datacenter.

Availability zones are physically separate locations within a region, each made up of one or more datacenters, offering higher resiliency than availability sets.

Load balancers distribute traffic across healthy VMs and are essential for both HA and scalability.

Autoscale rules have cool-down periods to avoid oscillation; scale-out is typically faster than scale-in.

The exam focuses on conceptual understanding and SLA values, not configuration steps.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Availability Set

Protects against rack-level failures within a single datacenter

Provides 99.95% SLA with two or more VMs

No additional data transfer cost

Available in all Azure regions

Uses fault domains and update domains

Availability Zone

Protects against entire datacenter failures

Provides 99.99% SLA with two or more VMs spread across two or more zones

May incur inter-zone data transfer charges

Not available in all regions (check regional support)

Each zone is made up of one or more physically separate datacenters

Watch Out for These

Mistake

High availability and scalability are the same thing.

Correct

High availability focuses on uptime and fault tolerance, while scalability focuses on handling increased load by adding resources. They are related but distinct concepts.

Mistake

An availability set guarantees 99.99% uptime.

Correct

An availability set with two or more VMs provides an SLA of 99.95%, not 99.99%. 99.99% requires availability zones.

Mistake

Scaling up is more reliable than scaling out.

Correct

Scaling out is generally more reliable because it distributes risk across multiple instances. Scaling up still leaves one instance, however large, as a single point of failure.

Mistake

Autoscaling works instantly.

Correct

Autoscaling has a delay: Azure Monitor collects metrics every 1-5 minutes, and provisioning new VMs takes a few minutes. For rapid spikes, you need to over-provision or use predictive autoscale.

Mistake

You need to pay extra for high availability features.

Correct

Availability sets and availability zones themselves are free. You only pay for the underlying VMs and associated resources like load balancers.

Frequently Asked Questions

What is the difference between an availability set and an availability zone?

An availability set protects against failures within a single Azure datacenter by placing VMs on different racks (fault domains) and ensuring maintenance updates don't affect all VMs at once (update domains). An availability zone protects against entire datacenter failures by placing VMs in physically separate datacenters within the same region. Availability zones offer higher SLA (99.99% vs 99.95%) but may incur additional data transfer costs. For the exam, remember that availability sets are within one datacenter, zones span multiple datacenters.

How does autoscaling work in Azure?

Autoscaling in Azure is primarily used with Virtual Machine Scale Sets. You define rules based on metrics like CPU percentage, memory, or custom metrics. Azure Monitor collects these metrics and triggers scaling actions when thresholds are crossed. For example, if average CPU > 75% for 5 minutes, Azure adds a specified number of VMs. You set a minimum and maximum instance count to control cost and performance. Autoscaling can also be scheduled for predictable load patterns. It's a free feature of Scale Sets.

What is the SLA for a single VM in Azure?

The SLA for a single VM in Azure is 99.9% uptime, which allows for about 8.76 hours of downtime per year. If you deploy two or more VMs in an availability set, the SLA increases to 99.95% (about 4.38 hours of downtime per year). For VMs across availability zones, the SLA is 99.99% (about 52.56 minutes of downtime per year). These SLAs apply only if you meet the redundancy requirements (e.g., two or more VMs for availability set).

Can I scale up and scale out at the same time?

Yes, you can combine both approaches. For example, you can scale out by adding more VM instances using a scale set, and also scale up by choosing a larger VM size for the instances. However, scaling up requires redeploying or resizing VMs, which involves downtime. Typically, cloud-native applications prefer scaling out because it's more elastic and fault-tolerant. The exam expects you to know the trade-offs: scale up is simpler but limited; scale out is more complex but provides better resilience.

Is there any cost for using availability sets or availability zones?

Availability sets and availability zones themselves are free features of Azure. You only pay for the underlying resources you deploy, such as VMs, storage, and load balancers. However, using availability zones may incur data transfer charges if traffic moves between zones. For example, if a load balancer sends traffic from one zone to a VM in another zone, you pay for the inter-zone bandwidth. Availability sets have no such charges because all VMs are within the same datacenter.

What is the maximum number of VMs in a scale set?

A Virtual Machine Scale Set supports up to 1,000 instances when it uses a platform (marketplace) image, and fewer (600) when it uses a custom image. The count you can actually reach also depends on your subscription's vCPU quota and on regional capacity. For the exam, remember 1,000 as the maximum. Also note that scale sets support both uniform and flexible orchestration modes, with different scaling characteristics.

How do I choose between scale up and scale out?

Scale up (vertical scaling) is best for applications that are not designed for distributed architectures, such as legacy databases or single-server applications. It's simple but has limits and requires downtime. Scale out (horizontal scaling) is preferred for cloud-native apps that are stateless and can run on multiple instances. It provides better fault tolerance and elasticity. On the exam, if a question mentions 'adding more servers' or 'increasing capacity without downtime', think scale out. If it mentions 'upgrading to a larger server', think scale up.
