AZ-900 | Chapter 66 of 127 | Objective 2.2

Azure VM Scale Sets

This chapter covers Azure Virtual Machine Scale Sets (VMSS), a core Azure compute service for deploying and managing a group of identical VMs that scales automatically. VMSS matters for the AZ-900 exam because it touches scalability, high availability, and cost optimization, which map to the Performance Efficiency, Reliability, and Cost Optimization pillars of the Azure Well-Architected Framework. The objective area it belongs to, Describe Azure architecture and services, is weighted at roughly 35-40% of the exam in Microsoft's published skills outline, and VMSS is a frequent topic within it. By the end of this chapter, you'll know what VMSS is, how it works, and how to answer exam questions about it confidently.

Updated May 31, 2026

The Concert Ticket Queue System

Imagine you run a popular concert venue and tickets go on sale at 10:00 AM. You have a single ticket booth that can serve one customer every 30 seconds. That works fine for a small show, but for a major artist, thousands of fans arrive at once. Your single booth creates a massive line, customers get frustrated, and many leave without buying. To solve this, you set up a dynamic queue system: as soon as the line grows beyond 10 people, you automatically open additional booths (up to 20 total) and assign staff to them. When the line shrinks below 5, you close excess booths to save labor costs.

Azure VM Scale Sets work the same way. You define a 'scale-out' rule (e.g., CPU > 75% for 5 minutes) and a 'scale-in' rule (e.g., CPU < 30% for 5 minutes). Azure automatically adds or removes VM instances (the booths) to match demand (the queue length), so you don't have to hire staff or open booths by hand. Just as you pay staff only while they are working, you pay for VM instances only while they are running. The key mechanism: Azure monitors a metric (like CPU or queue depth) and triggers autoscale operations that add or remove identical VM instances behind a load balancer, which spreads traffic across them.

How It Actually Works

What is Azure VM Scale Sets and the Business Problem It Solves

Azure Virtual Machine Scale Sets (VMSS) is a service that lets you create and manage a group of load-balanced, identical VMs. The number of VM instances can automatically increase or decrease in response to demand or a defined schedule. The core business problem it solves is the need for elastic scalability without manual intervention. For example, an e-commerce website experiences 10x traffic during Black Friday but runs at low utilization the rest of the year. Without VMSS, you would either over-provision (paying for idle VMs) or under-provision (losing sales due to poor performance). VMSS automates the scaling so you only pay for what you use.
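The over-provisioning trade-off above can be made concrete with some arithmetic. The sketch below is illustrative only: the hourly rate, the instance counts, and the number of peak days are all assumed values, not Azure prices.

```python
# Hypothetical numbers for illustration only; not real Azure prices.
HOURLY_RATE = 0.10          # assumed cost of one VM instance per hour
HOURS_PER_YEAR = 365 * 24   # 8760

PEAK_INSTANCES = 20         # assumed capacity needed on the busiest days
BASELINE_INSTANCES = 2      # assumed capacity needed the rest of the year
PEAK_DAYS = 5               # assumed number of peak days per year


def fixed_cost(instances: int) -> float:
    """Cost of running a fixed number of instances all year."""
    return instances * HOURS_PER_YEAR * HOURLY_RATE


def autoscaled_cost() -> float:
    """Cost when capacity follows demand: peak size only on peak days."""
    peak_hours = PEAK_DAYS * 24
    normal_hours = HOURS_PER_YEAR - peak_hours
    instance_hours = PEAK_INSTANCES * peak_hours + BASELINE_INSTANCES * normal_hours
    return instance_hours * HOURLY_RATE


over_provisioned = fixed_cost(PEAK_INSTANCES)
elastic = autoscaled_cost()
print(f"Always sized for peak: {over_provisioned:,.2f}")
print(f"Autoscaled:            {elastic:,.2f}")
```

Under these assumptions the autoscaled deployment uses about a ninth of the instance-hours of the always-at-peak one. The exact ratio depends entirely on how spiky your own traffic is.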

How It Works – Step-by-Step Mechanism

1. Definition: You define a scale set configuration, including the VM image (e.g., Windows Server 2022 or Ubuntu 22.04), VM size (e.g., Standard_D2s_v3), networking settings, and a load balancer or application gateway.

2. Instance Creation: Azure creates the specified number of initial VM instances (e.g., 2). Each instance is identical and placed behind the load balancer.

3. Autoscale Rules: You define rules based on metrics like CPU percentage, memory usage, or custom metrics (e.g., queue depth). Each rule has a condition (e.g., CPU > 75% for 5 minutes) and an action (e.g., increase count by 1 or by 20%).

4. Scaling Operation: When a scale-out condition is met, Azure provisions new VM instances using the same configuration. The new instances automatically register with the load balancer and start receiving traffic. When a scale-in condition is met, Azure deletes instances. Under the default scale-in policy it first balances across availability zones and fault domains, then removes the instance with the highest instance ID, which is usually the newest; you can select a different scale-in policy (NewestVM or OldestVM).

5. Load Balancing: The load balancer distributes incoming traffic across all healthy VM instances. Health probes check each instance; an instance that fails its probe is removed from rotation, and if automatic instance repairs are enabled the scale set replaces it with a new one.
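Steps 3 and 4 above boil down to a small decision: average a metric over a window, compare it with each rule's threshold, and clamp the result between the minimum and maximum. The following is a minimal sketch of that idea, not Azure's implementation; the `Rule` class and `decide` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rule:
    direction: str      # "out" or "in"
    threshold: float    # CPU percentage
    change: int         # instances to add or remove


def decide(cpu_samples, current, minimum, maximum, rules):
    """Return the new instance count for one evaluation window."""
    avg = mean(cpu_samples)
    for rule in rules:
        if rule.direction == "out" and avg > rule.threshold:
            return min(current + rule.change, maximum)
        if rule.direction == "in" and avg < rule.threshold:
            return max(current - rule.change, minimum)
    return current


rules = [Rule("out", 75, 1), Rule("in", 30, 1)]
print(decide([80, 85, 90, 78, 82], current=2, minimum=2, maximum=10, rules=rules))  # 3
print(decide([20, 25, 15, 22, 18], current=3, minimum=2, maximum=10, rules=rules))  # 2
print(decide([50, 55, 60, 45, 52], current=3, minimum=2, maximum=10, rules=rules))  # 3
```

The third call shows the gap between the two thresholds: an average of about 52% triggers neither rule, so the count stays put.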

Key Components, Tiers, and Pricing Models

Components:
- Virtual Machines: Identical instances based on a VM image.
- Load Balancer: Azure Load Balancer (Layer 4) or Application Gateway (Layer 7) distributes traffic.
- Autoscale Settings: Rules, schedules, and profiles defined in the Azure portal, CLI, or ARM templates.
- Health Probes: TCP or HTTP/HTTPS probes that determine instance health.
- Instance Metadata Service: Provides information about the instance to applications inside the VM.

Orchestration Modes:
- Uniform: VMs are identical and use a VMSS profile. Best for large-scale stateless workloads.
- Flexible: Allows mixing of VMs with different sizes and configurations, and supports availability zones. Newer and more versatile.

Pricing: You pay only for the underlying VM instances (compute, storage, networking). No additional charge for the scale set itself. Reserved Instances and Azure Hybrid Benefit apply to individual instances.

Comparison to On-Premises Equivalent

On-premises, scaling requires purchasing, racking, and configuring physical servers, a process that takes weeks. You must over-provision to handle peak demand, leading to low utilization. With VMSS, scaling is automated and happens in minutes. You can scale out during peak hours and scale in during off-peak, dramatically reducing costs. On-premises environments also typically lack health probes and automated replacement of failed instances, so recovery needs manual intervention.

Azure Portal and CLI Touchpoints

Portal: Navigate to 'Virtual Machine Scale Sets' and click 'Create'. You configure basics (subscription, resource group, name, region), orchestration mode, image, size, autoscale settings, networking, and load balancing.

CLI: Use az vmss create to create a scale set. Example:

az vmss create \
  --resource-group myResourceGroup \
  --name myScaleSet \
  --image Ubuntu2204 \
  --orchestration-mode Uniform \
  --instance-count 2 \
  --upgrade-policy-mode automatic \
  --admin-username azureuser \
  --generate-ssh-keys

Autoscale Configuration: Use az monitor autoscale commands to create rules. Example:

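az monitor autoscale create \
  --resource-group myResourceGroup \
  --resource myScaleSet \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name autoscale \
  --min-count 2 \
  --max-count 10 \
  --count 2

# Scale out by 1 instance when average CPU exceeds 75% over 5 minutes
az monitor autoscale rule create \
  --resource-group myResourceGroup \
  --autoscale-name autoscale \
  --condition "Percentage CPU > 75 avg 5m" \
  --scale out 1

# Scale in by 1 instance when average CPU drops below 30% over 5 minutes
az monitor autoscale rule create \
  --resource-group myResourceGroup \
  --autoscale-name autoscale \
  --condition "Percentage CPU < 30 avg 5m" \
  --scale in 1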

ARM/Bicep: You can deploy VMSS declaratively. A Bicep snippet:

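// Assumption: `subnet` and `lb` are declared elsewhere in this file, for example as `existing` resources.
resource vmss 'Microsoft.Compute/virtualMachineScaleSets@2023-03-01' = {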
  name: 'myScaleSet'
  location: resourceGroup().location
  sku: {
    name: 'Standard_D2s_v3'
    tier: 'Standard'
    capacity: 2
  }
  properties: {
    upgradePolicy: {
      mode: 'Automatic'
    }
    virtualMachineProfile: {
      storageProfile: {
        imageReference: {
          publisher: 'Canonical'
          offer: '0001-com-ubuntu-server-jammy'
          sku: '22_04-lts-gen2'
          version: 'latest'
        }
      }
      osProfile: {
        computerNamePrefix: 'vmss'
        adminUsername: 'azureuser'
        linuxConfiguration: {
          disablePasswordAuthentication: true // SSH key login only; no adminPassword is set
          ssh: {
            publicKeys: [
              {
                path: '/home/azureuser/.ssh/authorized_keys'
                keyData: 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC...'
              }
            ]
          }
        }
      }
      networkProfile: {
        networkInterfaceConfigurations: [
          {
            name: 'nic'
            properties: {
              primary: true
              ipConfigurations: [
                {
                  name: 'ipconfig'
                  properties: {
                    subnet: {
                      id: subnet.id
                    }
                    loadBalancerBackendAddressPools: [
                      {
                        id: lb.properties.backendAddressPools[0].id
                      }
                    ]
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}

Concrete Business Scenarios

E-commerce Flash Sale: A retailer expects a 500% traffic spike during a 2-hour flash sale. They configure VMSS with a scale-out rule at CPU > 60% and a scale-in rule at CPU < 20%. During the sale, instances automatically increase from 5 to 30, handling the load. After the sale, they scale back to 2 to save costs.

Batch Processing: A financial services firm runs nightly risk calculations. They use VMSS with a scheduled profile that scales out to 50 instances at 9 PM and scales in to 0 at 6 AM. They use Spot VMs (which replaced low-priority VMs in scale sets) to reduce costs by up to 90%.

Web Application with Variable Traffic: A news website has traffic spikes during breaking news. They use VMSS with Application Gateway for SSL offloading and path-based routing. Autoscale rules use custom metrics like requests per second.

Key Limitations and Edge Cases

Stateless Assumption: VMSS is designed for stateless workloads. If you need to persist state, use external storage like Azure Files or Azure SQL Database. Instance state is lost on scale-in.

Scaling Cooldown: After a scaling operation, there is a cooldown period (default 5 minutes) before another scaling action can occur. This prevents flapping.

Instance Names: In Uniform mode, VM names are generated with a prefix and a number (e.g., myScaleSet_0, myScaleSet_1). They are not customizable.

Upgrade Policy: Three modes: Automatic (instances updated immediately), Rolling (batches updated with health check), and Manual (you trigger updates). Automatic can cause downtime if not designed properly.

Maximum Instances: Up to 1,000 instances per scale set (with certain limits per region). For larger needs, use multiple scale sets.
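The cooldown item above is easier to picture with a timeline. The sketch below is a simplified model, not Azure's autoscale engine: it takes a list of scaling requests and drops any that arrive before the cooldown since the last accepted action has elapsed. The function name and the request format are assumptions made for illustration.

```python
def apply_cooldown(requests, cooldown_minutes=5):
    """Filter scaling requests so none runs within the cooldown of the last one.

    `requests` is a list of (minute, action) tuples sorted by minute.
    """
    accepted = []
    last_action_minute = None
    for minute, action in requests:
        if last_action_minute is None or minute - last_action_minute >= cooldown_minutes:
            accepted.append((minute, action))
            last_action_minute = minute
    return accepted


requests = [(0, "out"), (2, "in"), (4, "out"), (6, "in"), (12, "out")]
print(apply_cooldown(requests))
# [(0, 'out'), (6, 'in'), (12, 'out')]
```

Without the cooldown, the set would have scaled out and in four times in six minutes; with it, the requests at minutes 2 and 4 are ignored.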

How VMSS Integrates with Other Azure Services

Azure Load Balancer: Distributes traffic at Layer 4. Supports health probes and NAT rules.

Application Gateway: Layer 7 load balancer with features like SSL termination, URL-based routing, and Web Application Firewall (WAF).

Azure Monitor: Collects metrics for autoscale rules. You can use built-in metrics (CPU, memory) or custom metrics from Application Insights.

Azure Key Vault: Store secrets like database connection strings that VMs can access via managed identity.

Azure Virtual Network: Scale sets are deployed into a subnet. Each instance gets a private IP. Public IPs can be assigned via the load balancer.

Azure Storage: Use managed disks for OS and data disks. You can attach data disks to instances, but by default a disk attached to an instance is deleted along with that instance on scale-in, so keep any data that must survive in external storage.

Security Considerations

Network Security Groups (NSGs): Apply at the subnet level to control inbound/outbound traffic to scale set instances.

Managed Identities: Assign a managed identity to the scale set so VMs can authenticate to Azure services without credentials.

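Patch Management: Use automatic OS image upgrades or Azure Update Manager (the successor to Azure Automation Update Management) to patch instances in batches rather than all at once, which limits downtime.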

Just-in-Time (JIT) Access: Enable JIT VM access to reduce attack surface.

Performance and Cost Optimization

Right-Sizing: Choose the smallest VM size that meets your workload requirements. Use Azure Advisor recommendations.

Spot Instances: For interruptible workloads (batch, dev/test), use Spot instances to save up to 90%.

Reserved Instances: Pre-purchase one or three-year terms for baseline capacity to save up to 72%.

Scaling Profiles: Define multiple profiles for different times of day or days of week. For example, scale out to 10 during business hours, scale in to 2 at night.

Instance Protection: Protect specific instances from scale-in to avoid terminating critical VMs (e.g., those running long jobs).
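To see how instance protection interacts with scale-in, here is a rough sketch that removes the highest-numbered (usually newest) instances first and skips protected ones. It deliberately ignores the zone and fault-domain balancing that the real default policy also performs, and the dictionary keys are hypothetical.

```python
def pick_for_scale_in(instances, count):
    """Choose `count` instances to remove, newest first, skipping protected ones.

    `instances` is a list of dicts with hypothetical keys:
    "id" (int, higher means newer) and "protected" (bool).
    """
    candidates = [i for i in instances if not i["protected"]]
    candidates.sort(key=lambda i: i["id"], reverse=True)
    return [i["id"] for i in candidates[:count]]


fleet = [
    {"id": 0, "protected": False},
    {"id": 1, "protected": False},
    {"id": 2, "protected": True},   # running a long job
    {"id": 3, "protected": False},
]
print(pick_for_scale_in(fleet, 2))  # [3, 1]
```

Instance 2 would normally be the second one removed, but because it is protected the selection falls through to instance 1.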

Common Mistakes and How to Avoid Them

Not Setting Scale-In Rules: Some users only set scale-out rules, leading to runaway costs. Always define scale-in rules.

Incorrect Metric Thresholds: Setting thresholds too low causes frequent scaling (flapping). Use appropriate aggregation and cooldown periods.

Ignoring Health Probes: Without health probes, the load balancer sends traffic to unhealthy instances, causing errors. Always configure probes.

Using Uniform Mode for Stateful Workloads: Uniform mode assumes all instances are identical and stateless. For stateful, use Flexible mode or external storage.
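The health probe point above can be sketched in a few lines: only instances that pass their probe stay in the pool, and traffic is spread over what remains. This is a simplified model with made-up probe results; it uses round-robin for readability, whereas Azure Load Balancer distributes flows with a hash of connection properties.

```python
from itertools import cycle


def healthy_backends(instances):
    """Keep only instances whose last probe returned HTTP 200."""
    return [name for name, status in instances.items() if status == 200]


def distribute(requests, backends):
    """Assign requests to backends in round-robin order."""
    assignment = {b: [] for b in backends}
    for request, backend in zip(requests, cycle(backends)):
        assignment[backend].append(request)
    return assignment


probe_results = {"vm0": 200, "vm1": 503, "vm2": 200}  # hypothetical probe results
pool = healthy_backends(probe_results)
print(pool)                                         # ['vm0', 'vm2']
print(distribute(["r1", "r2", "r3", "r4"], pool))   # {'vm0': ['r1', 'r3'], 'vm2': ['r2', 'r4']}
```

If the probe were missing, vm1 would stay in the pool and roughly a third of requests would land on an instance that is returning errors.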

Azure VMSS vs. Other Compute Options

VMSS vs. Availability Sets: Availability sets protect against hardware failures within a datacenter but do not provide autoscaling. VMSS provides both high availability and elasticity.

VMSS vs. Azure App Service: App Service is a PaaS offering that abstracts the VM layer. It offers built-in autoscaling but gives you less control over the OS and configuration. VMSS gives full control (IaaS).

VMSS vs. Azure Kubernetes Service (AKS): AKS manages containers, not VMs. If you containerize your app, AKS provides autoscaling at the pod level. VMSS is for VM-based workloads.

VMSS vs. Azure Functions: Functions are serverless and scale automatically per invocation. VMSS is for long-running workloads that need full control over the VM.

Summary

Azure VM Scale Sets provide elastic, automated scaling of identical VMs to handle variable workloads. They are cost-effective, highly available, and integrate with load balancers, monitoring, and other Azure services. On the AZ-900 exam, focus on the purpose of VMSS, autoscale rules, orchestration modes, and the difference between scale sets and other compute options.

Walk-Through

1. Plan Your Scale Set Configuration

Before creating a scale set, decide on the region, resource group, and name. Choose the orchestration mode: Uniform (identical VMs, simpler management) or Flexible (supports mixed sizes, availability zones, and single-instance placement). Select a VM image (Windows Server, Ubuntu, etc.) and VM size (e.g., Standard_D2s_v3). Determine the number of initial instances (e.g., 2) and the minimum/maximum instance count (e.g., min 2, max 10). Also decide on the upgrade policy: Automatic (instances updated immediately), Rolling (batched with health checks), or Manual (you trigger updates). This planning phase is critical because some settings (like orchestration mode) cannot be changed after creation.

2. Create the Scale Set via Portal or CLI

In the Azure portal, navigate to 'Virtual Machine Scale Sets' and click 'Create'. Fill in the basics: subscription, resource group, name, region, and orchestration mode. Under 'Instance details', select the image, VM size, and admin credentials. Configure networking: choose a virtual network and subnet, and create or select a load balancer. You can also add a public IP and configure health probes. Review and create. Using the CLI, run 'az vmss create' with parameters like --image, --admin-username, --generate-ssh-keys. Azure then provisions the initial VMs, load balancer, and networking. Behind the scenes, it creates the VMs, assigns private IPs, and registers them with the load balancer's backend pool.

3. Configure Autoscale Rules

After creation, go to the 'Scaling' blade of your scale set. You can define rules based on metrics like CPU percentage, memory, or custom metrics. Each rule includes: metric source, metric name, time aggregation (avg, min, max), operator (greater than, less than), threshold (e.g., 75), duration (e.g., 5 minutes), and action (increase count by 1 or by percentage). You also set instance limits (minimum, maximum, default). For example, add a rule to scale out if CPU > 75% for 5 minutes, increase by 1 instance. Add another rule to scale in if CPU < 30% for 5 minutes, decrease by 1. You can also create scheduled profiles for predictable load patterns, like scaling out to 20 instances every weekday at 9 AM.
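Scheduled profiles like the weekday 9 AM example can be thought of as a lookup: given the current time, which profile's limits apply? The sketch below is one way to model that, assuming a single hypothetical 'business-hours' profile that ends at 6 PM and a default profile for all other times. The tuple layout and the instance limits are assumptions, not an Azure data format.

```python
from datetime import datetime

# Hypothetical profiles: (name, weekdays, start_hour, end_hour, min, max)
PROFILES = [
    ("business-hours", {0, 1, 2, 3, 4}, 9, 18, 20, 30),
]
DEFAULT_PROFILE = ("default", None, None, None, 2, 10)


def active_profile(now: datetime):
    """Return the first scheduled profile that covers `now`, else the default."""
    for profile in PROFILES:
        _, weekdays, start, end, _, _ = profile
        if now.weekday() in weekdays and start <= now.hour < end:
            return profile
    return DEFAULT_PROFILE


monday_10am = datetime(2026, 6, 1, 10, 0)
sunday_10am = datetime(2026, 5, 31, 10, 0)
print(active_profile(monday_10am)[0])   # business-hours
print(active_profile(sunday_10am)[0])   # default
```

Metric-based rules still run inside whichever profile is active; the profile only changes the minimum, maximum, and default counts they operate within.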

4. Test Autoscaling and Monitor Performance

To test, generate load on the application (e.g., using Apache Bench or a simple script that hits the load balancer's public IP). Monitor the 'Metrics' tab under the scale set to see CPU usage, instance count, and scaling operations. Ensure that scale-out triggers when the threshold is exceeded and that new instances appear in the load balancer's backend pool. Check that scale-in works when load decreases. Use Azure Monitor to set up alerts for scaling events. If autoscale is not working as expected, verify that the metric is being collected, the rule conditions are correct, and there are no cooldown periods preventing further actions.

5. Manage Instances and Perform Updates

You can manually scale the number of instances by updating the 'capacity' property. For rolling updates, change the VM configuration (e.g., new image version) and set the upgrade policy to Rolling. Specify the batch size (e.g., 20%) and pause time between batches. Azure will update instances in batches, ensuring that a minimum number remain healthy. Use instance protection to prevent specific instances from being scaled in (e.g., when they run long jobs). To deallocate or delete instances, you can manually stop or delete them, but autoscale will replace them if it detects a deficit. For permanent removal, reduce the capacity setting.
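The batch size in a rolling update determines how many instances are out of service at any one time. As a rough illustration (the rounding rule here is an assumption, not documented Azure behavior), a 20% batch over ten instances works out as follows.

```python
import math


def rolling_batches(instance_ids, batch_percent):
    """Split instances into upgrade batches of at most `batch_percent` of the set."""
    batch_size = max(1, math.floor(len(instance_ids) * batch_percent / 100))
    return [
        instance_ids[i:i + batch_size]
        for i in range(0, len(instance_ids), batch_size)
    ]


ids = list(range(10))
for number, batch in enumerate(rolling_batches(ids, 20), start=1):
    print(f"batch {number}: {batch}")
```

Ten instances at 20% gives five batches of two, so at least eight instances keep serving traffic while each batch is updated.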

What This Looks Like on the Job

Scenario 1: E-commerce Platform with Variable Traffic

An online retailer uses Azure VM Scale Sets to handle seasonal traffic spikes. Their application is a stateless web app running on IIS on Windows Server. They configure a scale set with a minimum of 2 and maximum of 20 instances. Autoscale rules: scale out by 2 instances when CPU exceeds 70% for 10 minutes, scale in by 1 when CPU drops below 30% for 10 minutes. They use Azure Load Balancer with a health probe on port 80. During Black Friday, traffic increases 8x, and the scale set automatically grows to 20 instances. The load balancer distributes requests evenly, and the site remains responsive. After the sale, instances scale back to 2, saving costs. A common mistake is setting the cooldown period too short (default 5 minutes), causing flapping. They set it to 10 minutes to avoid this. Also, they use Azure SQL Database for session state, ensuring statelessness.

Scenario 2: Batch Processing with Spot Instances

A media company transcodes thousands of video files daily. They use VMSS with Spot instances to reduce costs by 80%. The scale set is configured with a minimum of 0 and maximum of 100 instances. They use a custom metric (number of files in the queue) from Azure Queue Storage. When the queue depth exceeds 100, they scale out by 10 instances. When it drops below 10, they scale in by 5. Since Spot instances can be evicted, they design the application to checkpoint progress to Azure Blob Storage. If an instance is evicted, the next instance picks up from the last checkpoint. They also use a scheduled profile to ensure at least 5 instances run during business hours for immediate processing. A pitfall is not handling evictions gracefully, leading to data loss. They implement a shutdown script that saves state before deallocation.
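The checkpoint-and-resume pattern in this scenario can be sketched without any Azure services. In the example below a plain dictionary stands in for the checkpoint record that the scenario keeps in Azure Blob Storage, and `evict_after` simulates a Spot eviction; both are illustrative stand-ins.

```python
def process(files, checkpoint, evict_after=None):
    """Process files starting at checkpoint["next"]; stop early if evicted.

    `checkpoint` stands in for a record saved in durable storage.
    `evict_after` simulates an eviction after that many files in this run.
    """
    done = []
    for index in range(checkpoint["next"], len(files)):
        if evict_after is not None and len(done) == evict_after:
            break                       # instance evicted mid-run
        done.append(files[index])
        checkpoint["next"] = index + 1  # save progress after every file
    return done


files = ["a.mp4", "b.mp4", "c.mp4", "d.mp4", "e.mp4"]
checkpoint = {"next": 0}
first_run = process(files, checkpoint, evict_after=2)
second_run = process(files, checkpoint)
print(first_run, second_run)
# ['a.mp4', 'b.mp4'] ['c.mp4', 'd.mp4', 'e.mp4']
```

Because progress is saved after every file, the replacement instance repeats nothing and skips nothing. Saving less often would trade some rework after an eviction for fewer writes.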

Scenario 3: Microservices with Flexible Orchestration

A SaaS company adopts a microservices architecture using containers on VMs. They use VMSS with Flexible orchestration to deploy different VM sizes for different services (e.g., compute-intensive services on F-series, memory-intensive on E-series). They use Application Gateway for path-based routing (e.g., /api/* to one backend pool, /web/* to another). Each backend pool is a separate scale set with its own autoscale rules. They also spread instances across three availability zones for resilience. During an outage that takes one zone offline, the other zones continue serving traffic, and autoscale can add capacity in the healthy zones to compensate. A common issue is misconfiguring the health probe, causing the load balancer to mark healthy instances as unhealthy. They use a custom probe that checks a health endpoint returning 200 OK. They also set the upgrade policy to Rolling to update instances without downtime.

How AZ-900 Actually Tests This

Exactly What AZ-900 Tests on This Objective

Objective 2.2: 'Describe Azure compute services' includes VM Scale Sets. The exam focuses on:

The purpose of VMSS: providing autoscaling and load balancing for a group of identical VMs.

The difference between Uniform and Flexible orchestration modes.

Autoscale rules: conditions, thresholds, actions.

Integration with load balancer and health probes.

Use cases: stateless web apps, batch processing, large-scale compute.

Pricing: pay only for underlying VMs; no extra cost for scale set.

Common Wrong Answers and Why Candidates Choose Them

1. 'VMSS provides high availability by distributing VMs across availability zones.' This is partially true but not the primary purpose. The primary purpose is autoscaling. High availability is a benefit but not the defining feature. Candidates confuse VMSS with Availability Sets.

2. 'VMSS can only be used with Windows VMs.' False. VMSS supports both Windows and Linux images. Candidates may assume because many examples use Windows.

3. 'You must manually add new VMs to the load balancer.' False. New instances automatically register with the load balancer. Candidates think of manual configuration.

4. 'VMSS is a PaaS service.' False. VMSS is IaaS because you manage the VMs. Candidates confuse it with App Service (PaaS).

Specific Terms and Values That Appear Verbatim

'Autoscale', 'scale out', 'scale in', 'instance count', 'minimum', 'maximum', 'default'.

'Upgrade policy': Automatic, Rolling, Manual.

'Orchestration mode': Uniform, Flexible.

'Health probe': TCP or HTTP.

'Cooldown period': 5 minutes default.

'Maximum instances per scale set': 1,000 (with some limits).

Edge Cases and Tricky Distinctions

Stateless vs. Stateful: VMSS assumes stateless workloads. If an application stores session data in memory, scaling in will lose that data. The exam may test that you must use external storage (like Redis Cache or SQL Database) for state.

Spot Instances: You can use Spot instances to save costs, but they can be evicted. The exam may ask about cost optimization vs. reliability trade-offs.

Flexible Orchestration: A newer addition to the exam. It supports mixing VM sizes and availability zones, but requires more configuration. Uniform is simpler.

Scaling Based on Schedule: You can scale based on time (e.g., scale out at 9 AM). The exam may ask about scheduled vs. metric-based scaling.

Memory Trick for Eliminating Wrong Answers

Use the 'AHA' decision tree:
- Autoscale: Does the question mention scaling based on demand? If yes, think VMSS.
- Health: Does it mention automatic replacement of failed instances? VMSS does this via health probes.
- Application: Is the workload stateless and web-based? VMSS is ideal.

If the question mentions manual scaling, single VM, or stateful applications, eliminate VMSS.

Key Takeaways

VM Scale Sets provide automatic scaling of identical VMs based on demand or schedule.

Two orchestration modes: Uniform (identical VMs) and Flexible (mixed sizes, zones).

Autoscale rules use metrics like CPU, memory, or custom metrics with thresholds and cooldown periods.

New instances automatically register with the load balancer; health probes detect failures.

Pricing: pay only for the underlying VMs; no additional charge for the scale set itself.

Maximum 1,000 instances per scale set (with some regional limits).

VMSS is IaaS, not PaaS; you manage the OS and applications.

Use Spot instances for cost savings on interruptible workloads.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Azure VM Scale Sets (Uniform)

All VMs identical in size and configuration

Simpler to manage with a single profile

Supports up to 1,000 instances

Does not support mixing VM sizes

Best for large-scale stateless workloads

Azure VM Scale Sets (Flexible)

Supports different VM sizes and configurations

Requires more manual configuration

Supports up to 1,000 instances per scale set

Supports availability zones and single-instance placement

Best for stateful workloads or microservices

Watch Out for These

Mistake

VM Scale Sets are only for large-scale applications.

Correct

VMSS can be used for any scale, from 2 to 1,000 instances. Small applications benefit from high availability and cost savings through autoscaling.

Mistake

You cannot use VMSS with Linux VMs.

Correct

VMSS supports both Windows and Linux images. The Azure Marketplace offers many Linux distributions, including Ubuntu, Red Hat Enterprise Linux, and SUSE.

Mistake

VMSS automatically provides high availability across regions.

Correct

VMSS provides high availability within a region via availability zones or fault domains. Cross-region disaster recovery requires additional services like Azure Traffic Manager.

Mistake

You must configure scaling rules for every metric.

Correct

You only need rules for the metrics you care about. A single rule on a built-in metric such as CPU is enough to get started; custom metrics are optional. The exam tests that you can use built-in metrics.

Mistake

VMSS instances are always created in the same availability zone.

Correct

With Flexible orchestration, you can spread instances across multiple availability zones. Uniform mode can also be zone-redundant if configured.

Frequently Asked Questions

What is the difference between Azure VM Scale Sets and Availability Sets?

Availability Sets protect against hardware failures within a datacenter by distributing VMs across fault domains and update domains, but they do not provide autoscaling. VMSS provides both high availability (via fault domains) and autoscaling. Availability Sets are for a fixed number of VMs; VMSS can automatically increase or decrease the number. On the exam, if the question mentions autoscaling, choose VMSS.

Can I use VMSS for stateful applications?

Yes, but with caution. VMSS assumes instances are stateless; scaling in may terminate instances holding state. To use stateful applications, store state externally (e.g., Azure SQL Database, Redis Cache) or use Flexible orchestration with instance protection to prevent scale-in of specific instances. The exam expects you to know that VMSS is designed for stateless workloads.

How do I update all VMs in a scale set without downtime?

Use Rolling upgrade policy. Set the upgrade policy to Rolling, specify batch size (e.g., 20%), and pause time. Azure updates instances in batches, ensuring a minimum number remain healthy. Alternatively, use Manual upgrade and update instances one by one. Automatic upgrade updates all at once, which can cause downtime.

What is the maximum number of VMs in a scale set?

Up to 1,000 instances per scale set. However, there are regional limits on total vCPUs. For larger deployments, you can create multiple scale sets behind a load balancer. The exam may test this limit.
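As a quick worked example of the limit, the number of scale sets a deployment needs is just a ceiling division by the per-set maximum stated above. The function name is made up for illustration.

```python
import math

MAX_PER_SCALE_SET = 1000  # limit stated in the text


def scale_sets_needed(total_instances: int) -> int:
    """Smallest number of scale sets that can hold `total_instances`."""
    return math.ceil(total_instances / MAX_PER_SCALE_SET)


print(scale_sets_needed(2500))  # 3
print(scale_sets_needed(1000))  # 1
```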

Do I need a load balancer with VMSS?

Not strictly, but it is highly recommended. Without a load balancer, clients would have to reach each instance on its own IP address, which is not practical when instances come and go. A load balancer provides a single endpoint and distributes traffic. You can also use Application Gateway for Layer 7 routing.

Can I mix Windows and Linux VMs in the same scale set?

In Uniform mode, no. All VMs must use the same image. In Flexible mode, you can mix, but it requires separate configurations. Typically, you use separate scale sets for different OS families.

How does VMSS pricing work?

You pay only for the compute, storage, and networking resources consumed by the VM instances. There is no additional charge for the scale set itself. You can reduce costs with Reserved Instances, Spot instances, and Azure Hybrid Benefit.
