This chapter covers Azure Virtual Machine Scale Sets (VMSS), a critical compute resource for deploying and managing identical, auto-scaling VMs. VMSS is a core topic in the Azure Administrator AZ-104 exam under Domain 3.1 (Compute), appearing in approximately 10-15% of questions. Understanding VMSS is essential for designing resilient, scalable applications in Azure, and the exam tests both conceptual knowledge and hands-on configuration details.
Imagine a rubber band that can stretch or shrink as needed. You have a template for a paper airplane (the VM image). Instead of folding each airplane by hand when demand spikes, you pre-tie knots along the rubber band at equal intervals. Each knot represents a potential VM instance. When more airplanes are needed, you pull the band to stretch it, and the knots automatically become fully folded airplanes (VMs) using the template. When demand drops, you release the band, and the airplanes collapse back into simple knots (the VMs are deleted). The rubber band itself is the scale set: it manages the spacing (scaling rules) and ensures all airplanes are identical. A load balancer is like a fan that blows the airplanes evenly to different delivery slots. The key point is that the band can stretch or shrink without changing the template, so the scale set maintains a consistent configuration across all instances.
What is Azure Virtual Machine Scale Sets?
Azure Virtual Machine Scale Sets (VMSS) is a compute service that allows you to deploy and manage a set of identical, load-balanced VMs. The number of VM instances can automatically increase or decrease in response to demand or a defined schedule. VMSS is designed for large-scale compute scenarios such as big data, container workloads, and high-performance computing, but it is also used for web applications and microservices.
Why VMSS Exists
Traditional VM deployment requires manual provisioning, configuration, and scaling. VMSS automates the lifecycle of multiple VMs, ensuring they are identical (using a VM image or custom image) and can scale out/in based on metrics (CPU, memory, queue depth) or schedules. It integrates with Azure Load Balancer (ALB) or Application Gateway for traffic distribution, and with Azure Monitor autoscale for dynamic scaling.
How VMSS Works Internally
A VMSS is defined by a scale set configuration that includes:
Virtual machine image: Azure Marketplace image, custom image, or Azure Compute Gallery (formerly Shared Image Gallery) version.
Instance size: VM SKU like Standard_D2s_v3.
Network configuration: Virtual network, subnet, network security group (NSG), and load balancer settings.
Scaling policies: Autoscale rules or manual scaling.
Upgrade policy: Automatic, manual, or rolling upgrades.
When you create a VMSS, you specify an initial instance count (e.g., 2). Azure provisions that many VMs using the specified image and configuration. Each VM gets a unique instance ID and a private IP, and optionally its own public IP if the scale set is configured to assign one per instance. Azure automatically spreads the VMs across fault domains and update domains, much as an availability set does.
Scaling Mechanism
Autoscale rules use metrics from Azure Monitor (e.g., average CPU > 75% for 5 minutes) to trigger scale-out events. When scaling out, Azure creates new VM instances based on the scale set configuration. When scaling in, the Default scale-in policy balances instances across availability zones and fault domains and then deletes the VM with the highest instance ID (unless a different scale-in policy is set). A scale set can also scale to zero instances if the autoscale minimum is set to 0.
Key Components, Values, and Defaults
Overprovisioning: Default is enabled. When enabled, Azure provisions more VMs than requested (e.g., for 100 instances, it may provision 110) to improve reliability. The extra instances are deleted after the desired count is reached. It applies to custom images as well as Marketplace images.
Upgrade Policy:
Automatic: Instances are updated immediately when the scale set model changes.
Manual: You must manually update each instance.
Rolling: Instances are updated in batches with a pause between batches.
Scale-in Policy: Default is 'Default', which balances across availability zones and fault domains and then removes the VM with the highest instance ID. Can be set to 'NewestVM' or 'OldestVM'.
Instance Protection: Can protect instances from scale-in (scale-in protection) or from both scale-in and scale-set updates.
Orchestration Mode: Uniform (identical VMs created from one scale set model) or Flexible (allows a mix of VM sizes and images in the same scale set). The exam focuses on Uniform mode.
Single Placement Group: Default is true for smaller scale sets (up to 100 instances). For larger scale sets (up to 1000), set it to false, which lets the scale set span multiple placement groups. Large scale sets require managed disks and, for layer-4 load balancing, a Standard SKU load balancer.
Health Monitoring: Can use Azure Load Balancer health probes or Application Health Extension (for guest OS-level health).
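The capacity limits and the overprovisioning example above can be sketched as a small Python model. This is only an illustration, not Azure's implementation: the function names are hypothetical, and the 10% figure is simply the chapter's "100 becomes 110" example, since Azure does not promise a fixed ratio.

```python
import math

# Ceilings quoted in this chapter.
SINGLE_PLACEMENT_GROUP_LIMIT = 100   # singlePlacementGroup = true
MULTI_PLACEMENT_GROUP_LIMIT = 1000   # singlePlacementGroup = false


def max_capacity(single_placement_group: bool) -> int:
    """Return the instance ceiling for the given placement-group setting."""
    if single_placement_group:
        return SINGLE_PLACEMENT_GROUP_LIMIT
    return MULTI_PLACEMENT_GROUP_LIMIT


def validate_capacity(requested: int, single_placement_group: bool) -> None:
    """Raise ValueError if the requested count is outside the allowed range."""
    limit = max_capacity(single_placement_group)
    if requested < 0 or requested > limit:
        raise ValueError(f"capacity {requested} is outside 0..{limit}")


def overprovisioned_count(requested: int, extra_percent: int = 10) -> int:
    """Instances created up front when overprovisioning is on.

    extra_percent is an assumed value for illustration only.
    """
    return requested + math.ceil(requested * extra_percent / 100)
```

For example, `validate_capacity(500, True)` raises an error, while the same request with `single_placement_group=False` passes.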
Configuration and Verification Commands
Using Azure CLI:
# Create a scale set
az vmss create \
--resource-group myResourceGroup \
--name myScaleSet \
--image Ubuntu2204 \
--orchestration-mode Uniform \
--upgrade-policy-mode automatic \
--admin-username azureuser \
--generate-ssh-keys \
--instance-count 2 \
--vm-sku Standard_DS1_v2
# List instances
az vmss list-instances --resource-group myResourceGroup --name myScaleSet --output table
# Manual scale
az vmss scale --resource-group myResourceGroup --name myScaleSet --new-capacity 5
# Get instance view
az vmss get-instance-view --resource-group myResourceGroup --name myScaleSet
Using PowerShell:
# Create a scale set
# Note: this config is a skeleton; a real deployment also needs OS, storage, and network profiles.
$vmssConfig = New-AzVmssConfig -Location 'EastUS' -SkuName 'Standard_DS1_v2' -SkuCapacity 2 -UpgradePolicyMode 'Automatic'
$vmss = New-AzVmss -ResourceGroupName 'myResourceGroup' -Name 'myScaleSet' -VirtualMachineScaleSet $vmssConfig
Interaction with Related Technologies
Azure Load Balancer: VMSS integrates with ALB for distributing traffic. The load balancer's backend pool is automatically updated as instances are added/removed.
Application Gateway: For layer 7 load balancing, VMSS can be the backend pool.
Azure Monitor Autoscale: The autoscale engine reads metrics from Azure Monitor and triggers scale actions based on rules.
Azure Compute Gallery: For managing custom image versions across regions.
Azure Policy: Can enforce compliance on VMSS configurations.
Important Exam Details
VMSS supports both Linux and Windows VMs.
To use custom images, you must have the image in the same region or use Azure Compute Gallery.
The scale set name is used as the prefix for VM names (e.g., myScaleSet_0, myScaleSet_1).
When scaling in, the Default policy removes the VM with the highest instance ID after balancing across zones and fault domains. You can change this with the 'scaleInPolicy' property.
VMSS can be deployed in a single availability zone or across multiple zones for higher availability.
The 'eviction policy' for spot instances can be 'Deallocate' or 'Delete'.
To use accelerated networking, enable it in the scale set configuration.
The maximum number of instances per scale set is 1000 (with singlePlacementGroup=false).
Common Commands for Exam
az vmss create: Create a new scale set.
az vmss scale: Manually change instance count.
az vmss update: Update properties (e.g., instance protection).
az vmss delete-instances: Remove specific instances.
az vmss list: List all scale sets.
az vmss get-instance-view: Get detailed status.
Autoscale Rules Example
{
"autoscale": {
"enabled": true,
"profiles": [
{
"name": "Scale out based on CPU",
"capacity": {
"minimum": "1",
"maximum": "10",
"default": "1"
},
"rules": [
{
"metricTrigger": {
"metricName": "Percentage CPU",
"metricResourceUri": "/subscriptions/...",
"timeGrain": "PT1M",
"statistic": "Average",
"timeWindow": "PT5M",
"timeAggregation": "Average",
"operator": "GreaterThan",
"threshold": 75
},
"scaleAction": {
"direction": "Increase",
"type": "ChangeCount",
"value": "1",
"cooldown": "PT5M"
}
}
]
}
]
}
}
Upgrade Policies
Automatic: Instances are updated in an unpredictable order. Not recommended for production.
Manual: You must call az vmss update-instances to apply changes.
Rolling: Updates are applied in batches. You can configure max batch size, pause time, and unhealthy instance threshold. This is the recommended policy for production.
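A rough way to picture the Rolling policy is to split the instance list into batches by a maximum batch percentage. The sketch below is a simplified model under that assumption; `plan_batches` is a hypothetical helper, and it does not model the pause between batches or the unhealthy-instance threshold.

```python
from typing import List


def plan_batches(instance_ids: List[str], max_batch_percent: int) -> List[List[str]]:
    """Split instances into upgrade batches of at most max_batch_percent each.

    At least one instance is upgraded per batch so the rollout always advances.
    """
    if not 1 <= max_batch_percent <= 100:
        raise ValueError("max_batch_percent must be between 1 and 100")
    batch_size = max(1, len(instance_ids) * max_batch_percent // 100)
    return [
        instance_ids[start:start + batch_size]
        for start in range(0, len(instance_ids), batch_size)
    ]
```

With ten instances and a 20% batch size, this yields five batches of two, so at most two instances are out of service at any moment.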
Instance Protection
Protect from scale-in: Prevents the instance from being removed during scale-in.
Protect from scale set actions: Also prevents updates from the scale set model.
Health Monitoring
Load Balancer health probe: The load balancer pings a port (e.g., 80) and marks instances as healthy/unhealthy.
Application Health Extension: Deploys an agent on the VM that reports health based on application logic.
Networking
Each VM in a scale set gets a private IP from the subnet.
Public IP can be assigned per instance (not typical) or via load balancer.
NSG can be applied at the subnet or NIC level.
Accelerated networking is supported for certain VM sizes.
Create Scale Set Configuration
Define the VMSS configuration including image, size, network, and scaling parameters. This is done via Azure portal, CLI, PowerShell, or ARM template. The configuration includes the VM image (Marketplace or custom), instance size (e.g., Standard_DS2_v2), admin credentials, VNet/subnet, load balancer settings, and scaling rules. The configuration is stored as a scale set model.
Provision Initial Instances
Azure creates the specified number of VM instances based on the scale set model. For each instance, Azure provisions the VM, attaches the OS disk, configures networking, and applies any extensions. If overprovisioning is enabled, extra instances are created and then deleted to ensure the desired count is met. This step completes when all instances are in 'Running' state.
Configure Autoscale Rules
Define autoscale rules using Azure Monitor metrics. Rules include a metric source (e.g., CPU, queue depth), condition (e.g., average CPU > 75% for 5 minutes), action (increase count by 1), and cooldown (e.g., 5 minutes). Rules can be scheduled for specific times. The autoscale engine evaluates rules every minute.
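To make the rule anatomy concrete, here is a minimal sketch of how a single rule could be evaluated against the samples collected in its time window. It is a toy model, not the autoscale engine: the `Rule` class and `rule_fires` function are hypothetical names, and only the "Average" statistic is handled.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Rule:
    threshold: float   # e.g. 75 for "CPU > 75%"
    operator: str      # "GreaterThan" or "LessThan"
    change: int        # +1 to add an instance, -1 to remove one


def rule_fires(rule: Rule, samples: Sequence[float]) -> bool:
    """Return True if the average of the window's samples meets the condition."""
    if not samples:
        return False  # no data in the window, so take no action
    average = mean(samples)
    if rule.operator == "GreaterThan":
        return average > rule.threshold
    if rule.operator == "LessThan":
        return average < rule.threshold
    raise ValueError(f"unsupported operator: {rule.operator}")
```

Note that a single spike does not fire the rule on its own: the samples `[60, 70, 95, 50, 60]` average 67, which stays below a threshold of 75.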
Scale Out Event
When a metric condition is met (e.g., average CPU > 75% for 5 minutes), Azure triggers a scale-out action. It adds the specified number of instances (e.g., 1) to the scale set. New VMs are provisioned using the same image and configuration. The load balancer automatically adds them to the backend pool. The cooldown period prevents rapid successive scaling.
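The effect of the cooldown and the capacity limits can be sketched as follows. This is an illustrative model only; `next_capacity` and its parameters are hypothetical, and the real engine considers more than one rule at a time.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_capacity(
    current: int,
    change: int,
    minimum: int,
    maximum: int,
    now: datetime,
    last_action: Optional[datetime],
    cooldown: timedelta,
) -> int:
    """Apply a scale action unless the cooldown is still running.

    The result is clamped to the profile's minimum and maximum capacity.
    """
    if last_action is not None and now - last_action < cooldown:
        return current  # still cooling down, so ignore the trigger
    return max(minimum, min(maximum, current + change))
```

A trigger that arrives two minutes after the previous action is ignored under a five-minute cooldown, which is what keeps the scale set from reacting to its own previous change.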
Scale In Event
When a metric condition is met for scale-in (e.g., average CPU < 25% for 10 minutes), Azure removes instances. By default, Azure balances across availability zones and fault domains and then deletes the instance with the highest instance ID. You can change the scale-in policy to NewestVM or OldestVM. Instances with scale-in protection are skipped. The load balancer removes them from the backend pool. The cooldown period applies again.
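Instance selection during scale-in can be sketched as a filter plus a sort. This simplified model follows the documented behavior in which the Default policy picks the highest instance ID, while NewestVM and OldestVM order by creation time. It deliberately ignores the zone and fault domain balancing that Azure performs first, and the `Instance` class and `pick_for_removal` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Instance:
    instance_id: int
    created: datetime
    protected: bool = False  # models scale-in protection


def pick_for_removal(instances: List[Instance], policy: str, count: int) -> List[Instance]:
    """Choose up to `count` unprotected instances according to the policy."""
    candidates = [i for i in instances if not i.protected]
    if policy == "Default":
        ordered = sorted(candidates, key=lambda i: i.instance_id, reverse=True)
    elif policy == "NewestVM":
        ordered = sorted(candidates, key=lambda i: i.created, reverse=True)
    elif policy == "OldestVM":
        ordered = sorted(candidates, key=lambda i: i.created)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return ordered[:count]
```

Protected instances never appear in the result, so a request to remove more instances than are unprotected simply returns fewer.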
Enterprise Scenario 1: E-commerce Web Application
A large e-commerce company runs its web application on Azure VMSS. During Black Friday, traffic spikes to 10x normal. The scale set is configured with autoscale rules based on CPU and queue depth. At peak, the scale set scales out to 500 instances across multiple availability zones. The load balancer distributes traffic evenly. The application is stateless, with session data stored in Azure Redis Cache. The scale set uses rolling upgrades to deploy new code with zero downtime. Misconfiguration (e.g., cooldown too short) could cause thrashing, where the scale set rapidly scales out and in, incurring unnecessary costs and instability.
Enterprise Scenario 2: Batch Processing Pipeline
A media company processes video files using a VMSS with spot instances to reduce costs. The scale set scales out based on the number of jobs in an Azure Storage queue. Each VM instance pulls a job from the queue, processes it, and writes the result to blob storage. The scale set uses a custom image with pre-installed software. Spot instances can be evicted, so the application is designed to handle interruptions gracefully. The scale set uses the 'OldestVM' scale-in policy, so the longest-running instances are removed first; scale-in policies select by VM, not by job. The company saves 60-80% compared to on-demand VMs.
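One way to reason about queue-driven scaling in this scenario is to compute a target instance count from the backlog. The sketch below is a back-of-the-envelope helper, not how autoscale rules are expressed in Azure (those use thresholds and change counts); `desired_instances` and `jobs_per_instance` are assumptions introduced for illustration.

```python
import math


def desired_instances(queue_length: int, jobs_per_instance: int,
                      minimum: int, maximum: int) -> int:
    """Estimate how many instances are needed to work through the backlog.

    jobs_per_instance is an assumed throughput figure for one VM.
    """
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    needed = math.ceil(queue_length / jobs_per_instance)
    return max(minimum, min(maximum, needed))
```

With 45 queued jobs and 10 jobs per instance, the estimate is 5 instances; an empty queue with a minimum of 0 lets the scale set drop to zero.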
Enterprise Scenario 3: Microservices on AKS
A software company runs microservices on Azure Kubernetes Service (AKS), but uses VMSS as the node pool. The VMSS automatically scales based on cluster resource demands. When pods require more resources, the cluster autoscaler increases the VMSS instance count. The scale set uses a custom image optimized for containers. The company uses multiple node pools (VMSS) for different workloads: one for general compute, one for GPU-intensive tasks. Misconfiguring the VM size or scaling policies can lead to underutilization or performance bottlenecks.
Common Pitfalls
Not setting cooldown periods properly, causing thrashing.
Using manual upgrade policy in production, leading to configuration drift.
Forgetting to enable health monitoring, so unhealthy instances continue to receive traffic.
Leaving overprovisioning enabled for applications that cannot tolerate extra instances briefly appearing and then disappearing.
Scaling in without protecting critical instances, causing service disruption.
AZ-104 Objective Coverage
This topic falls under Domain 3.1: 'Configure virtual machines and scale sets.' Specific skills measured include:
Create and configure a virtual machine scale set.
Configure autoscale rules.
Configure scaling policies (scale-in, overprovisioning).
Implement instance protection.
Deploy VMSS using Azure Resource Manager templates.
Common Wrong Answers and Why Candidates Choose Them
'VMSS supports only Windows images' – Wrong. VMSS supports both Linux and Windows. Candidates confuse VMSS with other services that have OS restrictions.
'You cannot use custom images with VMSS' – Wrong. You can use custom images via Azure Compute Gallery or by specifying the image ID. Candidates think only Marketplace images are allowed.
'Autoscale rules can only use CPU metrics' – Wrong. Autoscale can use many metrics: CPU, memory, disk queue, HTTP queue length, custom metrics. Candidates remember CPU as the default example.
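'Scaling in always removes the newest instance' – Wrong. The Default policy balances across zones and fault domains and then removes the highest instance ID, and the policy can be changed to 'OldestVM' or 'NewestVM'. Candidates may assume a strict 'last in, first out'.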
'VMSS can only have up to 100 instances' – Wrong. The default limit is 100 per placement group, but with singlePlacementGroup=false, you can have up to 1000 instances. Candidates confuse the default limit.
Specific Numbers and Values to Memorize
Maximum instances per scale set: 1000 (with singlePlacementGroup=false).
Default overprovisioning: enabled.
Default scale-in policy: 'Default' (highest instance ID, after zone and fault domain balancing).
Cooldown period: default 5 minutes; configurable from 1 minute to 1 week.
Upgrade policy modes: Automatic, Manual, Rolling.
Orchestration modes: Uniform (default for exam) and Flexible.
Availability zones: up to 3 zones.
Instance protection: 'protectFromScaleIn' and 'protectFromScaleSetActions'.
Edge Cases the Exam Loves
Scale to zero: Allowed, but not default. You must set minimum capacity to 0.
Spot instances: Can be used with VMSS. Eviction policy can be Deallocate or Delete.
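Accelerated networking: Supported on certain VM sizes. It is simplest to enable at creation, but it can also be enabled later by updating the scale set model and upgrading the instances.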
Proximity placement groups: Can be used with VMSS for low latency.
Ephemeral OS disks: Supported for some VM sizes, improves performance.
How to Eliminate Wrong Answers
If a question mentions 'identical VMs', think VMSS.
If it mentions 'autoscaling based on metrics', think VMSS with autoscale.
If it mentions 'rolling upgrades', think VMSS.
If it mentions 'instance protection', think VMSS scale-in protection.
If it mentions 'overprovisioning', think VMSS.
Eliminate answers that say 'manual scaling only' or 'no load balancing integration'.
VMSS is for deploying and managing identical, auto-scaling VMs.
Maximum 1000 instances per scale set with singlePlacementGroup=false.
Overprovisioning is enabled by default.
Default scale-in policy removes the highest instance ID after balancing across zones and fault domains.
Autoscale rules use metrics from Azure Monitor; the cooldown defaults to 5 minutes and can be as short as 1 minute.
Upgrade policies: Automatic, Manual, Rolling. Rolling is recommended for production.
Instance protection can prevent scale-in or scale set updates.
VMSS integrates with Azure Load Balancer and Application Gateway.
Spot instances can be used to reduce costs.
Custom images require Azure Compute Gallery or direct image ID reference.
These come up on the exam all the time. Here's how to tell them apart.
VMSS with Uniform Orchestration
All instances are identical (same image, size, configuration).
Supports up to 1000 instances.
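Historically the default orchestration mode; newer Azure CLI and PowerShell versions default to Flexible, so specify Uniform explicitly.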
Cannot mix different VM sizes or images.
Simpler management for homogeneous workloads.
VMSS with Flexible Orchestration
Allows different VM sizes and images within the same scale set.
Also supports up to 1000 instances.
Provides more flexibility for heterogeneous workloads.
Can be used with Azure Virtual Desktop and other services.
Requires more complex management.
Mistake
VMSS can only be created using the Azure portal.
Correct
VMSS can be created using Azure CLI, PowerShell, ARM templates, Terraform, and Azure SDKs. The portal is just one method.
Mistake
All VMSS instances must be in the same availability zone.
Correct
VMSS can be deployed across multiple availability zones for high availability. You specify zones at creation time.
Mistake
Autoscale rules only work with CPU metrics.
Correct
Autoscale supports many metrics: CPU, memory, disk, queue depth, HTTP latency, and custom metrics from Application Insights.
Mistake
You cannot change the VM size after creating a scale set.
Correct
You can update the VM SKU using `az vmss update` or the portal. Existing instances pick up the new size only after they are upgraded to the latest scale set model, which restarts them.
Mistake
VMSS always uses public IPs for each instance.
Correct
By default, instances do not have public IPs. Traffic goes through a load balancer. You can optionally assign public IPs to instances.
You can create a VMSS with a custom image using the Azure CLI: `az vmss create --image <image-id>`. The image must be in the same region or available via Azure Compute Gallery. Use a managed image or gallery image version. Example: `az vmss create --image /subscriptions/.../resourceGroups/.../providers/Microsoft.Compute/galleries/.../images/.../versions/1.0.0`.
Manual upgrades require you to explicitly call `az vmss update-instances` to apply changes. Rolling upgrades automatically update instances in batches with configurable pause time and batch size. Rolling is better for production because it minimizes downtime. Manual is simpler for testing.
Yes. When creating a VMSS, you can set the 'priority' to 'Spot'. You must also specify an eviction policy: 'Deallocate' (default) or 'Delete'. Spot instances can be evicted at any time, so your application must handle interruptions. They are cost-effective for batch jobs and stateless workloads.
When scaling in, VMSS removes instances based on the scale-in policy. The default is 'Default', which balances across availability zones and fault domains and then removes the VM with the highest instance ID. You can set it to 'NewestVM' or 'OldestVM'. Instances with scale-in protection are skipped. The load balancer automatically removes them from the backend pool.
Overprovisioning is a feature that provisions extra VM instances (e.g., 110% of desired count) to improve reliability. The extra instances are deleted after the desired count is reached. It is enabled by default and is not limited to Marketplace images; turn it off if your application cannot handle extra VMs appearing and then disappearing.
Yes. You can associate a load balancer with a VMSS using the `az vmss update` command or portal. The load balancer's backend pool is automatically updated as instances are added/removed. You can also use Application Gateway.
The maximum is 1000 instances when singlePlacementGroup is set to false. With singlePlacementGroup=true (default), the limit is 100 instances. To scale beyond 100, singlePlacementGroup must be false; an existing scale set can be switched from true to false, but not back.