AZ-104 | Chapter 99 of 168 | Objective 3.2

App Service Autoscaling Rules

This chapter covers Azure App Service autoscaling rules, a critical feature for managing cost and performance in production workloads. Autoscaling is a core topic in the AZ-104 exam, appearing in roughly 10-15% of questions related to Compute (Objective 3.2). You will learn the mechanics of scale rules, profiles, conditions, and best practices to avoid common pitfalls. Mastering autoscaling is essential for passing the exam and for real-world Azure administration.

25 min read
Intermediate
Updated May 31, 2026

The Elastic Elevator Bank

Imagine a busy office building with a bank of 10 elevators. During normal hours, only 3 are active, saving energy. As more people arrive, a sensor counts the waiting crowd. When the queue exceeds 20 people, the system activates an additional elevator within 30 seconds. If the crowd keeps growing, it adds more elevators one by one, up to the maximum of 10. Conversely, when the queue drops below 5 people for 5 minutes, it deactivates one elevator. This mimics Azure App Service autoscaling: a scale rule monitors a metric (like CPU or queue length), has a threshold (e.g., >70% CPU), a duration (e.g., 5 minutes), and a cool-down period (like 10 minutes) to avoid flapping. The elevator bank's minimum (3) and maximum (10) correspond to instance count limits. The sensor's polling interval (e.g., 1 minute) is analogous to the metric collection period. If the building had no cool-down, elevators would turn on and off rapidly—just like autoscaling would oscillate without proper configuration.

How It Actually Works

What is Autoscaling?

Autoscaling is the ability of an Azure App Service Plan (ASP) to automatically increase or decrease the number of virtual machine (VM) instances running your web apps, mobile backends, or API apps based on predefined rules. Rule-based autoscaling is available in the Standard, Premium, and Isolated pricing tiers. The Basic tier supports manual scale-out only (up to 3 instances), and the Free and Shared tiers cannot scale out at all.

The primary purpose is to handle fluctuating loads efficiently. Instead of manually scaling up or down, you define conditions that trigger scale actions. This optimizes both performance (by adding instances during high demand) and cost (by removing unnecessary instances during low demand).

How Autoscaling Works Internally

Azure Autoscale uses the Azure Monitor service to collect metrics from your App Service Plan. Each App Service Plan has a set of metrics collected at the instance level (e.g., CPU Percentage, Memory Percentage, HTTP Queue Length) and aggregated across all instances.

The autoscale engine runs every minute (the default evaluation interval). It evaluates all rules defined in the autoscale settings. Each rule has a metric source, a statistic (e.g., Average, Minimum, Maximum), a threshold (e.g., greater than 70), a duration (e.g., 5 minutes), and a scale direction (increase or decrease) with a count (e.g., 1 instance).

For a scale-out rule to trigger, the metric, aggregated over the duration (e.g., average CPU over the last 5 minutes), must exceed the threshold. A brief spike that does not move the aggregated value past the threshold does not trigger the rule. Once triggered, the engine adds the specified number of instances. After the scale action, a cool-down period (5 minutes by default in the portal and Azure CLI; 10 minutes is a common choice) prevents further scale actions. This cool-down is critical to avoid "flapping": rapid oscillation between scaling out and in.
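As a rough illustration of how the duration window and the cool-down interact, the Python sketch below replays one CPU sample per minute through a single scale-out rule. It is a simplified model, not the Azure Monitor implementation; the names `ScaleOutRule` and `simulate` and the sample data are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScaleOutRule:
    threshold: float        # e.g. 70 (% CPU)
    window_minutes: int     # duration the metric is averaged over
    cooldown_minutes: int   # wait after a scale action
    step: int               # instances to add per action


def simulate(cpu_per_minute, rule, start_count, max_count):
    """Replay one CPU sample per minute; return [(minute, new_count), ...]."""
    count = start_count
    last_action = None
    actions = []
    for minute in range(len(cpu_per_minute)):
        if minute + 1 < rule.window_minutes:
            continue  # not enough samples to fill the window yet
        window = cpu_per_minute[minute + 1 - rule.window_minutes: minute + 1]
        in_cooldown = (last_action is not None
                       and minute - last_action < rule.cooldown_minutes)
        if mean(window) > rule.threshold and not in_cooldown and count < max_count:
            count = min(count + rule.step, max_count)
            last_action = minute
            actions.append((minute, count))
    return actions


rule = ScaleOutRule(threshold=70, window_minutes=5, cooldown_minutes=10, step=1)
samples = [40] * 3 + [90] * 20   # three quiet minutes, then sustained load
print(simulate(samples, rule, start_count=2, max_count=10))
```

In this model the first scale-out happens only once the 5-minute average rises above 70, and the second waits for the 10-minute cool-down to expire even though CPU stays high throughout.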

Key Components

Autoscale Setting: A collection of profiles, rules, and notifications. Each App Service Plan can have one autoscale setting.

Profiles: A profile defines a set of rules for a specific time window. You can have multiple profiles (e.g., a default profile for weekdays and a specific profile for weekends). Only one profile is active at any given time.

Rules: Each rule has a metric trigger and an action. Rules are evaluated independently. Scale-out happens if any scale-out rule is met; if several trigger simultaneously, the engine chooses the most aggressive action (e.g., scale out by 2 if one rule says 1 and another says 2). Scale-in happens only when all scale-in rules in the profile are met.

Metric: The data source for the rule. Common metrics: CPU Percentage, Memory Percentage, HTTP Queue Length, Data In/Out.

Threshold: The value that triggers the rule. For scale-out, it's "greater than" or "greater than or equal to"; for scale-in, it's "less than" or "less than or equal to".

Duration: The time window over which the metric is aggregated. It must be at least 5 minutes for every rule; scale-in rules often use a longer window to prevent premature scale-in.

Cool-down: The time to wait after a scale action before another scale action can occur. The portal and Azure CLI default to 5 minutes; the allowed range is 1 minute to 1 week.

Instance Count: Minimum and maximum instance limits. The autoscale engine will never scale below the minimum or above the maximum.
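The way these components combine can be sketched in a few lines of Python. This is a minimal model under stated assumptions: scale-out wins if any scale-out rule is met, scale-in requires every scale-in rule to be met, and the result is clamped to the instance limits. The function name `next_instance_count` and the tuple layout are hypothetical, not part of any Azure SDK.

```python
def next_instance_count(current, rules, minimum, maximum):
    """rules: list of (direction, triggered, change) tuples.

    direction is "out" or "in", triggered is a bool, change is an instance count.
    """
    outs = [change for direction, triggered, change in rules
            if direction == "out" and triggered]
    ins = [(triggered, change) for direction, triggered, change in rules
           if direction == "in"]
    if outs:
        # Any triggered scale-out rule wins; take the largest resulting capacity.
        target = current + max(outs)
    elif ins and all(triggered for triggered, _ in ins):
        # Scale in only when every scale-in rule agrees; remove the fewest instances.
        target = current - min(change for _, change in ins)
    else:
        target = current
    # Never go below the minimum or above the maximum.
    return max(minimum, min(maximum, target))


# Two scale-out rules fire together with a scale-in rule: scale-out by 2 wins.
print(next_instance_count(3, [("out", True, 1), ("out", True, 2), ("in", True, 1)], 1, 10))
```

Keeping the higher capacity in every ambiguous case reflects the general design goal of favoring availability over cost savings.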

Configuration via Azure Portal and CLI

To configure autoscaling in the portal:

1. Navigate to your App Service Plan.

2. Under "Scale out (App Service plan)", select "Custom autoscale".

3. Add a profile (e.g., default).

4. Add a scale rule: select metric, operator, threshold, duration, cool-down, and action (increase/decrease by count or percentage).

5. Set instance limits.

Using Azure CLI:

az monitor autoscale create --resource-group myRG --resource myPlan --resource-type Microsoft.Web/serverfarms --name myAutoscaleSetting --min-count 1 --max-count 10 --count 2

To add a rule:

az monitor autoscale rule create --autoscale-name myAutoscaleSetting --resource-group myRG --resource myPlan --resource-type Microsoft.Web/serverfarms --scale out 1 --condition "CpuPercentage > 70 avg 5m" --cooldown 10
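For readers who deploy with templates instead of the CLI, the same setting can be pictured as a profile object. The Python sketch below builds such a profile and prints it as JSON. The field names follow the `Microsoft.Insights/autoscalesettings` schema as best understood here and should be checked against the current template reference; the subscription placeholder, the helper `cpu_rule`, and the scale-in rule at 30% are illustrative assumptions not present in the commands above.

```python
import json

# Placeholder resource ID; substitute a real subscription ID before use.
PLAN_ID = ("/subscriptions/<subscription-id>/resourceGroups/myRG"
           "/providers/Microsoft.Web/serverfarms/myPlan")


def cpu_rule(operator, threshold, direction, window="PT5M", cooldown="PT10M"):
    """Build one rule on the plan's CpuPercentage metric (ISO 8601 durations)."""
    return {
        "metricTrigger": {
            "metricName": "CpuPercentage",
            "metricResourceUri": PLAN_ID,
            "timeGrain": "PT1M",          # sampling granularity
            "statistic": "Average",       # aggregation across instances
            "timeWindow": window,         # the rule's duration
            "timeAggregation": "Average", # aggregation over the duration
            "operator": operator,
            "threshold": threshold,
        },
        "scaleAction": {
            "direction": direction,
            "type": "ChangeCount",
            "value": "1",
            "cooldown": cooldown,
        },
    }


profile = {
    "name": "default",
    "capacity": {"minimum": "1", "maximum": "10", "default": "2"},
    "rules": [
        cpu_rule("GreaterThan", 70, "Increase"),
        cpu_rule("LessThan", 30, "Decrease", window="PT10M"),
    ],
}

print(json.dumps(profile, indent=2))
```

Pairing every scale-out rule with a scale-in rule, as sketched here, keeps the plan from staying at its peak size after load subsides.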

Interaction with Related Technologies

Autoscaling works closely with Azure Monitor (metrics), the App Service Plan (compute resources), and optionally Azure Application Insights (for custom metrics). An autoscale setting can also send email or webhook notifications when a scale action occurs. If you use Azure Front Door or Traffic Manager across several regional plans, each plan still scales on its own metrics, so autoscaling reacts to whatever share of the global load reaches that region.

Default Values and Timers

Metric evaluation period: 1 minute (cannot be changed)

Minimum rule duration: 5 minutes (scale-out and scale-in)

Default cool-down: 5 minutes in the portal and Azure CLI (10 minutes is a common choice)

Maximum instance count: depends on tier (e.g., Standard up to 10, Premium up to 20, or up to 30 on Premium v2/v3 plans in many regions, Isolated up to 100)

Scale-out by default: 1 instance

Scale-in by default: 1 instance

Best Practices

Always set a minimum and maximum instance count to avoid runaway costs or performance degradation.

Use multiple rules for different metrics (e.g., CPU and queue length).

Ensure cool-down periods are long enough to avoid flapping (at least 10 minutes).

Test autoscaling with load testing tools before production.

Monitor autoscale activity in Azure Monitor logs.

Common Mistakes on the Exam

Confusing autoscaling with scale up (changing tier). Autoscaling is scale out/in (number of instances), not scale up/down (VM size).

Thinking autoscaling can scale to zero. Minimum instance count must be at least 1.

Ignoring the duration: candidates often think a rule triggers immediately after the threshold is crossed, but the metric is evaluated over the rule's duration first.

Forgetting that every rule needs a duration of at least 5 minutes, and that scale-in rules typically use a longer duration than scale-out rules to prevent premature scale-in.

Walk-Through

1. Define Autoscale Profile

Create a profile that specifies the time range and instance limits. For example, a recurring profile for weekdays 9 AM to 5 PM with min=2, max=10, alongside a default profile that covers all other times. Profiles allow different scaling behaviors for different times. Each profile has its own set of rules. Only one profile is active at a time. The autoscale engine evaluates the schedules of the profiles to determine which is active. If no scheduled profile matches, the default profile is used.
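Profile selection can be pictured with a short Python sketch. It assumes a simplified schedule model with explicit start and end times, which is not exactly how Azure stores recurrences; the profile names, the default profile's limits, and the function `active_profile` are hypothetical.

```python
from datetime import datetime, time

# One scheduled profile: weekdays (Monday=0 .. Friday=4), 9 AM to 5 PM.
PROFILES = [
    {"name": "business-hours", "days": {0, 1, 2, 3, 4},
     "start": time(9, 0), "end": time(17, 0), "min": 2, "max": 10},
]

# Fallback used whenever no scheduled profile matches (limits are illustrative).
DEFAULT_PROFILE = {"name": "default", "min": 1, "max": 4}


def active_profile(now):
    """Return the single profile that applies at the given moment."""
    for profile in PROFILES:
        in_days = now.weekday() in profile["days"]
        in_hours = profile["start"] <= now.time() < profile["end"]
        if in_days and in_hours:
            return profile
    return DEFAULT_PROFILE


print(active_profile(datetime(2026, 6, 1, 10, 0))["name"])   # a Monday morning
print(active_profile(datetime(2026, 5, 31, 10, 0))["name"])  # a Sunday morning
```

Because exactly one profile is returned, the instance limits and rules in force at any moment are unambiguous.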

2. Configure Scale-Out Rule

Add a rule that triggers when a metric exceeds a threshold over a specified duration. For example, if average CPU > 70% over 5 minutes, increase instance count by 1. The metric is averaged over the duration. The rule uses the operator 'Greater than' or 'Greater than or equal to'. The action specifies the direction (increase), type (ChangeCount or PercentChangeCount), and value (e.g., 1). A cool-down is also set (5 minutes by default in the portal and Azure CLI).

3. Configure Scale-In Rule

Add a rule that triggers when a metric stays below a threshold over a duration that is usually longer than the scale-out duration. For example, if average CPU < 30% over 10 minutes, decrease instance count by 1. Like every autoscale rule, a scale-in rule needs a duration of at least 5 minutes. The cool-down also applies. It's common to use a lower threshold and longer duration for scale-in to avoid oscillations.
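One way to sanity-check a pair of thresholds is to estimate the load per instance after a scale-in. Azure's autoscale documentation describes a similar projection made before scaling in; the Python sketch below is a simplified version that assumes load redistributes evenly across the remaining instances. The function name `safe_to_scale_in` is hypothetical.

```python
def safe_to_scale_in(avg_cpu, instances, remove, scale_out_threshold):
    """Return True if removing instances should not immediately re-trigger scale-out."""
    remaining = instances - remove
    if remaining < 1:
        return False  # cannot remove the last instance
    # Total work stays the same, so per-instance CPU rises proportionally.
    projected_cpu = avg_cpu * instances / remaining
    return projected_cpu < scale_out_threshold


# 3 instances at 28% CPU: 2 instances would run at about 42%, below a 70% scale-out rule.
print(safe_to_scale_in(28, 3, 1, 70))
# 2 instances at 45% CPU: 1 instance would run at about 90%, so scale-in would flap.
print(safe_to_scale_in(45, 2, 1, 70))
```

The second case shows why a scale-in threshold close to the scale-out threshold tends to cause oscillation at low instance counts.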

4. Set Instance Limits

Define the minimum and maximum number of instances. The autoscale engine will never scale below the minimum or above the maximum. These limits are per profile. For example, min=2, max=10. If you set min=2, even if scale-in rules trigger, the instance count will not drop below 2. These limits are crucial for cost control and capacity planning.
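The effect of the limits amounts to clamping whatever count the rules request. A minimal Python sketch, with the hypothetical helper name `clamp_instances`:

```python
def clamp_instances(requested, minimum, maximum):
    """Keep the requested instance count inside the profile's limits."""
    if minimum < 1 or minimum > maximum:
        raise ValueError("limits must satisfy 1 <= minimum <= maximum")
    return max(minimum, min(maximum, requested))


print(clamp_instances(1, 2, 10))   # scale-in request below the minimum
print(clamp_instances(12, 2, 10))  # scale-out request above the maximum
```

With min=2 and max=10, a request for 1 instance is held at 2 and a request for 12 is held at 10.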

5. Enable and Monitor Autoscaling

After configuration, enable autoscaling. The engine starts evaluating metrics every minute. You can monitor autoscale actions in the Azure portal under 'Run history' or via Azure Monitor logs. Common issues to watch: flapping (frequent scaling), rule not triggering (check metric availability), and hitting instance limits. Use alerts to notify on autoscale events.

What This Looks Like on the Job

Enterprise Scenario 1: E-commerce Website During Holiday Sales

A large e-commerce company runs its website on an Azure App Service Plan (Premium tier, since the Standard tier tops out at 10 instances). During normal days, traffic is moderate with CPU around 40%. However, during Black Friday, traffic spikes 10x. They configure autoscaling with two profiles: a default profile with min=2, max=10, and a holiday profile for November with min=5, max=20. The scale-out rule triggers when CPU > 70% for 5 minutes, adding 2 instances. Scale-in triggers when CPU < 30% for 10 minutes, removing 1 instance. Cool-down is set to 15 minutes to avoid flapping. During the sale, the app scales from 5 to 20 instances over roughly two hours, because each 2-instance step must wait out the 15-minute cool-down. Without autoscaling, they would either overprovision (cost) or crash (performance). A common misconfiguration is setting the cool-down too low (e.g., 5 minutes), which causes constant scaling during load spikes and leads to thrashing and increased costs.
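A quick back-of-the-envelope check helps when pairing a step size with a cool-down. The Python sketch below estimates a lower bound on ramp time under simplifying assumptions: the rule stays triggered throughout, each action waits one full cool-down after the previous one, and instance provisioning time is ignored. The function name `min_ramp_minutes` is hypothetical.

```python
import math


def min_ramp_minutes(start, target, step, window_minutes, cooldown_minutes):
    """Lower bound on minutes needed to grow from start to target instances."""
    actions = math.ceil((target - start) / step)
    if actions <= 0:
        return 0
    # First action needs one full duration window; each later one waits a cool-down.
    return window_minutes + (actions - 1) * cooldown_minutes


# Scenario numbers: 5 -> 20 instances, 2 per action, 5-minute window, 15-minute cool-down.
print(min_ramp_minutes(5, 20, 2, 5, 15))
```

With these numbers the model needs 8 scale actions and about 110 minutes, which is why a schedule-based profile that raises the minimum ahead of a known event is often preferable to relying on rules alone.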

Enterprise Scenario 2: SaaS Application with Variable Workloads

A SaaS provider offers a document processing service. The workload is CPU-intensive but also depends on queue depth (HTTP Queue Length). They configure two scale-out rules: one for CPU > 80% for 5 minutes, and another for Queue Length > 100 for 5 minutes. Both rules add 1 instance. The scale-in rule uses Queue Length < 20 for 10 minutes. They set min=1, max=10. In production, they noticed that CPU spikes were brief, so they tried to shorten the scale-out duration to 3 minutes. Autoscale requires a duration of at least 5 minutes for any rule, so they kept 5 minutes, which also avoids scaling out on short-lived spikes. This scenario highlights the importance of tuning durations.

Enterprise Scenario 3: Global Media Streaming Platform

A media company uses Azure App Service for a streaming portal. They have users worldwide, so they use multiple regional App Service Plans behind Azure Traffic Manager. Each region has its own autoscale rules based on local CPU and network bandwidth. They also use custom metrics from Application Insights (e.g., request rate). During a major event, one region's traffic surged. Autoscaling scaled out to the maximum (20 instances) but still couldn't handle the load because the backend database was throttled. This shows that autoscaling only scales the compute layer; you must also ensure downstream services can scale. A second misconfiguration: their scale-in rules were too aggressive, so instances were removed too quickly after the event, which led to a secondary spike. They increased the scale-in duration to 30 minutes.

How AZ-104 Actually Tests This

What AZ-104 Tests

The exam objective 3.2 (Configure and manage Azure App Service) includes autoscaling. Specifically, you may be asked to:

Identify the correct configuration for a given scenario (e.g., CPU > 80% for 5 minutes, scale out by 1).

Understand the difference between scale out and scale up.

Know the default cool-down period (5 minutes in the portal and Azure CLI).

Recognize that autoscaling is configured at the App Service Plan level, not the app level.

Understand that rule durations must be at least 5 minutes and that scale-in rules usually use a longer duration than scale-out rules.

Know the instance limits per tier (e.g., Standard up to 10, Premium up to 20).

Common Wrong Answers

1. "Autoscaling can scale down to zero instances." This is false. The minimum instance count must be at least 1 (except in some preview features). Candidates choose this because they think of serverless, but App Service autoscaling always requires at least one instance.

2. "Autoscaling rules trigger immediately when the threshold is crossed." Wrong. The metric is aggregated over the rule's duration (e.g., average CPU over 5 minutes), and that aggregated value must cross the threshold. Candidates often forget the duration requirement.

3. "Scale-out rules can use a 1-minute duration to react instantly." False. Every autoscale rule, scale-out or scale-in, needs a duration of at least 5 minutes. Scale-in and scale-out rules may share the same duration, although a longer scale-in duration is a common way to reduce flapping.

4. "Autoscaling is configured per web app." Incorrect. Autoscaling is configured at the App Service Plan level, which affects all apps in that plan. Candidates often assume each app scales on its own, but the apps in a plan share its instances.

Specific Numbers to Memorize

Default cool-down: 5 minutes in the portal and Azure CLI

Minimum rule duration: 5 minutes

Instance limits: Basic (3), Standard (10), Premium (20), Isolated (100)

Default scale increment: 1 instance

Edge Cases

If multiple rules trigger, the engine applies the most aggressive action (e.g., scale out by 2 if one rule says 1 and another says 2).

If a scale-out and scale-in rule trigger simultaneously, scale-out takes precedence (to avoid performance degradation).

Autoscaling does not guarantee immediate scaling; there is a delay of a few minutes for new instances to become ready.

You can use a schedule-based profile to pre-scale for known events (e.g., scale out before a sale).

Key Takeaways

Autoscaling is configured at the App Service Plan level, not per app.

Rule durations must be at least 5 minutes; scale-in rules usually use a longer duration than scale-out rules.

Default cool-down period is 5 minutes in the portal and Azure CLI.

Instance limits: Standard (10), Premium (20), Isolated (100).

Autoscaling cannot scale to zero; minimum instance count is 1.

Multiple rules: the most aggressive scale action is applied.

Autoscaling uses Azure Monitor metrics (CPU, queue length, etc.).

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Scale Out (Horizontal Scaling)

Increases number of VM instances

Configurable via autoscale rules

Cost increases linearly with instances

Works within the same App Service Plan tier

Can be automated based on metrics

Scale Up (Vertical Scaling)

Increases VM size (e.g., from S1 to S3)

Manual or via script (not autoscaling)

Cost increases non-linearly (higher tier more expensive)

Requires changing the pricing tier

Downtime may occur during scaling

Watch Out for These

Mistake

Autoscaling can scale down to zero instances to save costs.

Correct

Autoscaling requires a minimum of 1 instance. You cannot scale to zero because the app must always be available. The minimum instance count is configurable but must be at least 1.

Mistake

Autoscaling rules trigger immediately when the metric crosses the threshold.

Correct

The metric is aggregated over the rule's duration (e.g., average CPU over 5 minutes), and that aggregated value must cross the threshold. Immediate triggering on a single sample would cause flapping.

Mistake

Scale-out rules can use a 1-minute duration to react instantly.

Correct

Every autoscale rule needs a duration of at least 5 minutes, whether it scales out or in. The two directions may share the same duration, but best practice is to give scale-in rules a longer duration to avoid premature scaling in.

Mistake

Autoscaling is configured per web app.

Correct

Autoscaling is configured at the App Service Plan level. All apps in the plan share the same instances. Scaling up/down (changing tier) also applies to the plan, so neither operation is configured per app.

Mistake

The cool-down period resets after each metric evaluation.

Correct

The cool-down period starts after a scale action completes. During cool-down, no further scale actions are taken, even if rules trigger. This prevents flapping.

Frequently Asked Questions

What is the difference between autoscaling and scaling up in Azure App Service?

Autoscaling (scale out) increases the number of VM instances running your app, while scaling up changes the VM size (e.g., from Standard S1 to S3). Autoscaling is automated via rules; scaling up is manual. Autoscaling is configured at the App Service Plan level; scaling up changes the plan's tier. Both affect cost and performance.

Can I use autoscaling with the Free or Shared tier?

No. Free and Shared tiers do not support scaling out. Autoscale rules require the Standard, Premium, or Isolated tier; the Basic tier supports manual scale-out only.

What metrics can I use for autoscaling rules?

Common metrics include CPU Percentage, Memory Percentage, HTTP Queue Length, Data In/Out, and custom metrics from Application Insights. You can also use other Azure Monitor metrics.

What happens if multiple scale-out rules trigger at the same time?

The autoscale engine applies the most aggressive action. For example, if one rule adds 1 instance and another adds 2, it will add 2 instances. This prevents under-provisioning.

How do I prevent autoscaling from flapping?

Set an adequate cool-down period (10 minutes is a common choice) and ensure scale-in rules have a longer duration (e.g., 10 minutes) than scale-out rules. Also, leave a clear gap between the scale-out and scale-in thresholds so the metric does not cross them frequently.

Can I schedule autoscaling for specific times?

Yes. You can create multiple profiles with different schedules. For example, a profile for weekdays 9-5 with certain rules, and a default profile for other times. This allows pre-scaling for known load patterns.

Is there any delay when autoscaling adds instances?

Yes. After a scale-out action is triggered, it takes a few minutes for new instances to be provisioned and become ready. The exact time depends on the tier and current load.
