This chapter covers cost-aware architecture design patterns in Microsoft Azure, a critical area for the AZ-305 exam. Understanding how to design for cost optimization is essential for the 'Design identity, governance, and monitoring solutions' domain, specifically objective 1.3: 'Design a cost management strategy for Azure resources.' Approximately 10-15% of exam questions touch on cost optimization, including budgeting, tagging, right-sizing, and reserved instances. This chapter provides the technical depth needed to design cost-efficient solutions that align with business requirements.
Imagine you are managing a family budget across multiple income streams and spending categories. Your family has a total monthly income (the Azure subscription budget). Each family member (department or team) has their own allowance (resource group or subscription) and can spend within limits, but overspending by one member means less for others or dipping into savings (overall budget). The family sets up envelopes (budgets) for groceries, utilities, and entertainment (different resource types). You use a shared credit card (the Azure subscription) and track spending with a budgeting app (Azure Cost Management + Billing). The app sends alerts when any envelope reaches 80% of its limit (budget alerts), and you can set rules to block further spending on non-essential categories when the total approaches the income cap (policies and spending limits). If someone tries to buy an expensive gadget without approval (provisioning a costly VM without tagging), the app flags it and requires a justification (cost anomaly alerts and tagging enforcement). By regularly reviewing the app’s reports (cost analysis), you can reallocate funds from underused categories to where they are needed most (right-sizing and reservations). This systematic approach keeps the family financially healthy and avoids surprise bills at the end of the month.
What Is Cost-Aware Architecture Design?
Cost-aware architecture design is the practice of building cloud solutions that optimize spending without sacrificing performance, security, or reliability. In Azure, this means selecting the right services, sizes, and configurations to match workload demands while using tools like Azure Cost Management + Billing, Azure Advisor, and Azure Policy to monitor and control costs. The AZ-305 exam expects you to recommend cost-optimized designs using specific patterns such as right-sizing, reserved instances, spot VMs, and auto-scaling.
Why Cost Awareness Matters for the Exam
Microsoft defines five pillars of the Azure Well-Architected Framework: Reliability, Security, Cost Optimization, Operational Excellence, and Performance Efficiency. Cost Optimization is the pillar most directly tested in objective 1.3. The exam will present scenarios where you must choose between multiple cost-saving options, such as reserved vs. pay-as-you-go, or standard vs. low-priority VMs. You must also understand how to implement governance policies that prevent cost overruns, such as Azure Policy for allowed SKUs and tagging requirements.
Key Components of Cost Management
Azure Cost Management + Billing: This is the primary tool for monitoring, analyzing, and optimizing Azure costs. It provides:
Cost Analysis: View and filter costs by subscription, resource group, resource type, or tags. You can create custom views and export data to Excel or Power BI.
Budgets: Define a spending target and receive alerts when costs reach thresholds (e.g., 50%, 80%, 100% of budget). A budget does not cap spending by itself, but its alerts can trigger automation, such as disabling a resource or sending an email.
Recommendations: Azure Advisor provides cost recommendations, such as right-sizing underutilized VMs or purchasing reserved instances. Reservation recommendations are based on 30 days of usage data; VM right-sizing uses a configurable lookback period (7 days by default).
Cost Alerts: Three types: budget alerts (when spending reaches a threshold), credit alerts (when Azure credits are consumed), and department spending quota alerts (for EA customers).
Azure Policy for Cost Governance: Azure Policy can enforce cost-related rules, such as:
Allowed VM SKUs (e.g., only B-series or Dv3-series)
Required tags (e.g., CostCenter, Environment, Owner)
Deny creation of expensive resources (e.g., GPU VMs) without approval
Append default tags to resources that lack them
Azure Reservations and Savings Plans: These offer significant discounts (up to 72%) compared to pay-as-you-go pricing for one-year or three-year commitments. Reservations apply to specific VM sizes, SQL Database, Cosmos DB, and other services. Savings Plans are more flexible, covering any compute service across regions. The exam tests when to recommend reservations vs. pay-as-you-go based on steady-state usage.
Azure Spot VMs: These are unused compute capacity offered at up to 90% discount, but they can be evicted at any time (with 30-second notice) when Azure needs the capacity back. Ideal for batch processing, dev/test, or stateless workloads that can tolerate interruptions. The exam expects you to know the eviction policy and use cases.
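One way a batch job can tolerate Spot eviction is to record its progress after each unit of work and resume from that point on restart. The sketch below is a minimal illustration of that checkpointing idea; the function name, the JSON checkpoint format, and the use of a local file are assumptions for the example. A real job would keep the checkpoint on durable storage outside the evicted VM.

```python
import json
import os


def process_with_checkpoint(items, checkpoint_path, handle):
    """Process items in order, persisting progress so a restart can resume.

    checkpoint_path is a hypothetical location; on a Spot VM it should point
    at storage that survives eviction (not the VM's temporary disk).
    Returns the number of items processed in this run.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for i in range(start, len(items)):
        handle(items[i])
        # Record the index of the next unprocessed item after each success.
        with open(checkpoint_path, "w") as f:
            json.dump({"next_index": i + 1}, f)

    return len(items) - start
```

If the VM is evicted partway through, the next run skips the items already recorded as done and continues with the rest.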
Auto-Scaling: Scale out (add instances) and scale in (remove instances) based on metrics like CPU, memory, or queue length. This ensures you only pay for what you use. The exam tests how to configure autoscale settings and when to use predictive autoscale (based on historical patterns).
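The scale-out and scale-in logic can be pictured as a small decision function. This is a simplified sketch, not how the Azure autoscale engine is implemented; the CPU thresholds and instance limits are illustrative assumptions, and the 300-second cooldown mirrors the 5-minute default mentioned in this chapter.

```python
def desired_instances(current, avg_cpu, seconds_since_last_scale,
                      min_count=1, max_count=10,
                      scale_out_at=70, scale_in_at=30, cooldown=300):
    """Return the instance count an autoscale rule set would aim for.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    # During the cooldown window no further scale action is taken.
    if seconds_since_last_scale < cooldown:
        return current
    if avg_cpu > scale_out_at:
        return min(current + 1, max_count)   # scale out, capped at the maximum
    if avg_cpu < scale_in_at:
        return max(current - 1, min_count)   # scale in, never below the minimum
    return current
```

Keeping the minimum low limits idle cost, while the maximum bounds the worst-case bill during a spike.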
How Cost Optimization Patterns Work
Right-Sizing: This involves adjusting VM sizes, storage tiers, and other resources to match actual usage. Azure Advisor analyzes utilization metrics such as CPU and memory over a lookback period (7 days by default, configurable up to 90) and recommends downsizing or shutting down idle VMs. For example, a VM that averages 5% CPU for a month is a candidate for downsizing from Standard_D4s_v3 to Standard_D2s_v3, which halves the vCPUs and memory and cuts the compute cost by roughly 50%. The exam will ask you to interpret Advisor recommendations.
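The right-sizing arithmetic can be sketched as follows. The hourly prices and the size-down mapping are hypothetical values chosen for illustration (real prices vary by region and change over time), and the 5% CPU threshold is taken from the example above rather than from any Advisor setting.

```python
from dataclasses import dataclass


@dataclass
class VmUsage:
    name: str
    sku: str
    avg_cpu_percent: float  # average CPU over the lookback window


# Hypothetical hourly prices, for illustration only.
HOURLY_PRICE = {"Standard_D4s_v3": 0.192, "Standard_D2s_v3": 0.096}
# Hypothetical mapping from a SKU to the next size down.
NEXT_SIZE_DOWN = {"Standard_D4s_v3": "Standard_D2s_v3"}


def rightsize(vm, cpu_threshold=5.0, hours_per_month=730):
    """Return (target_sku, monthly_saving), or None if no change is suggested."""
    target = NEXT_SIZE_DOWN.get(vm.sku)
    if target is None or vm.avg_cpu_percent >= cpu_threshold:
        return None
    saving = (HOURLY_PRICE[vm.sku] - HOURLY_PRICE[target]) * hours_per_month
    return target, round(saving, 2)
```

With these assumed prices the smaller size costs half as much per hour, which is where the roughly 50% saving comes from.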
Reserved Instances (RIs): You commit to a specific VM size in a region for one or three years. The discount applies automatically to matching VMs. Azure offers two payment options, upfront and monthly, and the total cost of the reservation is the same for both, so the payment choice affects cash flow rather than the discount. The exam may present a scenario where a customer has predictable, steady-state workloads and ask which term or scope is most cost-effective.
Hybrid Benefit: If you have Windows Server or SQL Server licenses with Software Assurance, you can use them in Azure to reduce costs. This applies to VMs and Azure SQL Database. The exam tests when to enable Hybrid Benefit (e.g., for Windows VMs to save on licensing) and the prerequisites (Software Assurance or subscription licenses).
Azure Budgets and Action Groups: You can create budgets at management group, subscription, or resource group scope. When a budget threshold is exceeded, an action group can send email, SMS, or trigger an automation runbook or webhook. For example, a budget at 80% could send an email to the finance team; at 100%, it could trigger a runbook that shuts down non-critical VMs.
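The threshold logic in the example above can be sketched in a few lines. This only models the evaluation; in Azure the budget and action group do this for you. The action descriptions are placeholders, not real action group definitions.

```python
def fired_thresholds(budget_amount, actual_spend, thresholds=(50, 80, 100)):
    """Return the percentage thresholds that current spending has reached."""
    percent_used = actual_spend / budget_amount * 100
    return [t for t in sorted(thresholds) if percent_used >= t]


# Hypothetical mapping of thresholds to responses, mirroring the example above.
ACTIONS = {
    80: "email the finance team",
    100: "trigger runbook to shut down non-critical VMs",
}


def actions_for(budget_amount, actual_spend):
    """List the responses that should have been triggered so far."""
    return [ACTIONS[t] for t in fired_thresholds(budget_amount, actual_spend, ACTIONS)]
```

For a budget of 1,000 and spending of 850, only the 80% response applies; once spending reaches 1,000, both do.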
Tagging Strategy: Tags are metadata key-value pairs applied to resources. A consistent tagging strategy enables cost allocation and chargeback. Common tags: CostCenter, Department, Environment (Prod/Dev/Test), Owner, Project. Azure Policy can enforce tagging and inherit tags from resource groups. The exam will ask you to design a tagging strategy for cost reporting.
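To see why consistent tags matter for chargeback, consider a minimal sketch that totals cost per tag value. The resource records and their `cost` field are assumed sample data shaped for the example, not the schema of a Cost Management export.

```python
from collections import defaultdict


def cost_by_tag(resources, tag_name):
    """Sum cost per value of tag_name; resources without the tag go to 'untagged'."""
    totals = defaultdict(float)
    for resource in resources:
        tags = resource.get("tags") or {}
        totals[tags.get(tag_name, "untagged")] += resource["cost"]
    return dict(totals)
```

Anything that lands in the `untagged` bucket cannot be charged back to a team, which is the gap that tag enforcement is meant to close.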
Configuration and Verification Commands
Using Azure CLI to view costs:
az consumption budget list --subscription <subscription-id>
az consumption usage list --subscription <subscription-id> --start-date 2024-01-01 --end-date 2024-01-31
Using Azure Policy to enforce allowed VM SKUs:
{
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "type",
          "equals": "Microsoft.Compute/virtualMachines"
        },
        {
          "not": {
            "field": "Microsoft.Compute/virtualMachines/sku.name",
            "in": ["Standard_B2s", "Standard_D2s_v3"]
          }
        }
      ]
    },
    "then": {
      "effect": "deny"
    }
  }
}
Interaction with Related Technologies
Cost-aware design interacts with:
Azure Advisor: Provides cost recommendations that you can implement via policies or manual actions.
Azure Monitor: Metrics and logs feed into autoscale decisions and cost analysis.
Azure Automation: Runbooks can automate cost-saving actions like shutting down VMs during off-hours.
Azure Blueprints: Can include policy assignments and RBAC roles to enforce cost governance at scale.
Default Values and Timers
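Advisor cost recommendations: Reservation recommendations are based on 30 days of usage data; VM right-sizing uses a configurable lookback period (7 days by default); updated every 24 hours.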
Budget alerts: Can be set at any percentage (common: 50%, 80%, 90%, 100%).
Spot VM eviction: 30-second notice; eviction policy can be deallocate or delete.
Reserved instance term: 1 or 3 years; payment options: upfront or monthly (same total cost).
Autoscale cooldown period: Default 5 minutes between scale operations.
Common Exam Traps
Trap 1: Confusing budgets with cost alerts. Budgets track spending against a limit; cost alerts are more general (e.g., credit consumption).
Trap 2: Assuming reserved instances are always cheaper than pay-as-you-go. For short-term or variable workloads, pay-as-you-go or spot VMs may be better.
Trap 3: Overlooking Hybrid Benefit. Many candidates forget that Windows Server and SQL Server licenses can reduce costs.
Trap 4: Ignoring tagging enforcement. The exam expects you to know that tags are not automatically inherited from resource groups; you need Azure Policy to append them.
Assess Current Spending
Begin by analyzing current Azure spending using Azure Cost Management + Billing. Navigate to Cost Analysis and filter by subscription, resource group, or resource type. Identify top spending resources and trends over the last 30-90 days. Use the 'Group by' feature to break down costs by service, location, or tag. Export data to Excel for deeper analysis. Look for anomalies like sudden spikes or consistently underutilized resources. This step establishes a baseline for cost optimization.
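Once cost data has been exported, the same baseline analysis can be reproduced offline. The sketch below assumes rows that have already been loaded into dictionaries with a `cost` field plus a grouping key such as `service`; the field names and the 1.5x spike factor are assumptions for illustration, not Cost Management conventions.

```python
from collections import defaultdict
from statistics import mean


def top_spenders(rows, key, n=3):
    """Total cost per value of `key` and return the n largest as (value, cost) pairs."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["cost"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def spikes(daily_costs, factor=1.5):
    """Return indexes of days whose cost exceeds `factor` times the period average."""
    baseline = mean(daily_costs)
    return [i for i, cost in enumerate(daily_costs) if cost > factor * baseline]
```

This mirrors the 'Group by' view and gives a crude spike check; it is not a substitute for the anomaly detection built into Cost Management.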
Identify Optimization Opportunities
Review Azure Advisor cost recommendations. These include right-sizing underutilized VMs (e.g., a VM with average CPU < 5%), shutting down idle VMs, purchasing reserved instances for steady-state workloads, and converting to spot VMs for interruptible workloads. Also check for unused resources like unattached disks, idle load balancers, and orphaned public IPs. Prioritize recommendations based on potential savings and business impact.
Implement Cost Governance Policies
Use Azure Policy to enforce cost-saving rules. Create policies to restrict allowed VM SKUs to cost-efficient series (e.g., B-series for dev/test, Dv3-series for production). Require tags like CostCenter and Environment on all resources. Use the 'Deny' effect to block creation of expensive resources without proper tags. Assign policies at management group scope for organization-wide enforcement. Also set up Azure budgets with alerts at 50%, 80%, and 100% of budget.
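The combined effect of an allowed-SKU rule and a required-tag rule with the 'Deny' effect can be modeled as a simple check. This is a conceptual sketch of the evaluation, not the Azure Policy engine; the flat resource dictionary (with `type`, `sku`, and `tags` keys) is a simplification invented for the example, and the SKU list reuses the two sizes from the policy shown earlier in this chapter.

```python
VM_TYPE = "Microsoft.Compute/virtualMachines"
ALLOWED_VM_SKUS = {"Standard_B2s", "Standard_D2s_v3"}
REQUIRED_TAGS = ("CostCenter", "Environment")


def evaluate(resource):
    """Return the reasons a create request would be denied (empty list = allowed)."""
    reasons = []
    # The SKU rule only applies to virtual machines.
    if resource.get("type") == VM_TYPE and resource.get("sku") not in ALLOWED_VM_SKUS:
        reasons.append(f"SKU {resource.get('sku')} is not in the allowed list")
    tags = resource.get("tags") or {}
    for tag in REQUIRED_TAGS:
        if tag not in tags:
            reasons.append(f"missing required tag {tag}")
    return reasons


def is_denied(resource):
    return bool(evaluate(resource))
```

Returning the reasons rather than a bare boolean makes it easier to tell users why a deployment was blocked.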
Adopt Reserved Instances and Savings Plans
For workloads with consistent usage, purchase reserved instances or savings plans. Analyze usage patterns to determine the optimal term (1 or 3 years); the payment option (upfront or monthly) does not change the total cost, so choose it based on cash flow. Apply reservations to the subscription or shared scope. For flexible compute needs, use savings plans that cover any Azure compute service. Monitor utilization of reservations to avoid waste (e.g., unused reserved capacity).
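Reservation waste can be quantified with two small calculations. The function names and the hourly price argument are assumptions for the example; actual utilization figures would come from the reservation reports in Cost Management.

```python
def reservation_utilization(reserved_hours, used_hours):
    """Fraction of reserved hours that were actually consumed (capped at 1.0)."""
    return min(used_hours, reserved_hours) / reserved_hours


def wasted_spend(reserved_hourly_price, reserved_hours, used_hours):
    """Cost of reserved hours that went unused in the period."""
    unused_hours = max(reserved_hours - used_hours, 0)
    return unused_hours * reserved_hourly_price
```

A reservation that is consistently well below full utilization is a signal to change its scope or exchange it for a size that matches current usage.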
Automate Cost Optimization
Automate cost-saving actions using Azure Automation runbooks, Logic Apps, or Functions. For example, schedule start/stop of non-production VMs during off-hours. Use autoscale to dynamically adjust instance counts based on demand. Set up budget-driven automation: when a budget threshold is exceeded, trigger a runbook to scale down or shut down resources. Also enable Azure Cost Management exports to send data to a storage account for custom reporting.
Enterprise Scenario 1: Global E-Commerce Platform
A large e-commerce company runs hundreds of VMs across multiple regions for its web tier, application tier, and databases. They were spending $500,000/month on Azure. The finance team needed to allocate costs to business units. They implemented a comprehensive tagging strategy with tags like 'CostCenter', 'Environment', and 'Application'. Azure Policy enforced tagging on all new resources. They used Azure Cost Management to create budgets per CostCenter and set alerts at 80% and 100%. They also purchased reserved instances for their steady-state database VMs, saving 40%. For the web tier, they implemented autoscale based on CPU and used spot VMs for batch processing jobs. Within six months, they reduced costs by 30% while maintaining performance.
Enterprise Scenario 2: SaaS Provider with Variable Workloads
A SaaS provider had unpredictable traffic patterns, making reservations risky. They used Azure Savings Plans (1-year term) to get a 20% discount on compute without committing to specific VM sizes. They also used Azure Policy to restrict VM SKUs to the B-series for dev/test and Dv3-series for production. They set up budgets at the subscription level with alerts that triggered an Azure Automation runbook to scale down non-critical resources when spending exceeded 90% of budget. They also enabled Azure Advisor cost recommendations and scheduled a monthly review to right-size VMs. This approach helped them stay within budget while handling traffic spikes.
Common Pitfalls
Misconfigured Tags: Without Azure Policy to enforce tags, resources are created without required tags, making cost allocation impossible. The solution is to use Azure Policy with 'deny' effect for missing tags.
Over-Provisioning Reservations: Buying too many reserved instances leads to waste. Analyze usage data for at least 30 days and use Azure Advisor recommendations to determine the right number.
Ignoring Spot Eviction: Using spot VMs for stateful workloads can cause data loss. Always design for statelessness or use checkpointing.
Not Automating Shutdown: For dev/test environments, forgetting to shut down VMs overnight can double costs. Use Azure Automation schedules or DevTest Labs auto-shutdown.
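The cost of leaving dev/test VMs running outside working hours is easy to estimate. The sketch below assumes a fixed weekly schedule and a pay-as-you-go VM whose compute charge stops when it is deallocated; disk and other charges that continue while the VM is stopped are ignored.

```python
HOURS_PER_WEEK = 24 * 7


def running_fraction(hours_per_day, days_per_week):
    """Share of the week a VM runs under a fixed schedule."""
    return hours_per_day * days_per_week / HOURS_PER_WEEK


def scheduled_saving(always_on_monthly_cost, hours_per_day, days_per_week):
    """Compute cost avoided per month by running only on the schedule."""
    return always_on_monthly_cost * (1 - running_fraction(hours_per_day, days_per_week))
```

A VM that runs 10 hours a day on 5 days a week is on for 50 of 168 hours, so the schedule avoids about 70% of the always-on compute cost.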
What AZ-305 Tests on Cost-Aware Architecture
The AZ-305 exam objective 1.3 is 'Design a cost management strategy for Azure resources.' Specifically, you must be able to:
Recommend appropriate cost management tools (Cost Management, Azure Policy, Azure Advisor)
Design a tagging strategy for cost allocation
Choose between pricing models (pay-as-you-go, reserved, spot, savings plans)
Implement budgets and alerts
Automate cost optimization
Common Wrong Answers and Why Candidates Choose Them
'Always use reserved instances to save money.' Wrong because reserved instances are only beneficial for steady-state workloads. For variable workloads, pay-as-you-go or savings plans are better. Candidates overlook the commitment risk.
'Budgets automatically stop spending when exceeded.' Wrong. Budgets only send alerts; they do not block spending unless you configure automation (e.g., runbook to disable resources).
'Tags are automatically inherited from resource groups.' Wrong. Tags are not inherited; you must use Azure Policy to append tags from the resource group.
'Spot VMs are suitable for all workloads.' Wrong. Spot VMs can be evicted, so they are only suitable for interruptible workloads like batch processing or dev/test.
Specific Numbers and Terms That Appear on the Exam
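Advisor reservation recommendations are based on 30 days of usage data; the VM right-sizing lookback period is configurable (7 days by default).
Reserved instance discounts: up to 72% compared to pay-as-you-go, with the largest discounts on 3-year terms.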
Spot VM discount: up to 90%.
Budget alert thresholds: common at 50%, 80%, 90%, 100%.
Autoscale cooldown: 5 minutes default.
Hybrid Benefit requires Software Assurance or subscription licenses.
Edge Cases and Exceptions
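If a workload runs less than about 70% of the time, reserved instances may not be cost-effective. The 70% figure is a rule of thumb: the break-even utilization is roughly one minus the reservation discount, so compare the break-even point for the discount actually offered.
For Azure SQL Database, reserved capacity is purchased per vCore (it is not offered for the DTU purchasing model), not per database. You can share it across multiple databases.
An Azure Policy 'deny' assignment made at management group scope cannot be removed or changed by users who only hold roles on a child subscription; it must be changed at the management group scope where it was assigned.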
Budget alerts can be sent to action groups that include email, SMS, webhook, and ITSM connectors.
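The break-even comparison between a reservation and pay-as-you-go can be worked through numerically. This sketch assumes the reservation is billed for every hour of the term whether or not the VM runs, while pay-as-you-go is billed only for running hours; the discount values passed in are inputs you would take from the pricing calculator, not figures from this chapter.

```python
def break_even_utilization(discount):
    """Fraction of the term a VM must run for a reservation to match pay-as-you-go."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def cheaper_option(payg_hourly, discount, utilization):
    """Compare the effective cost per hour of the term under each model."""
    reserved_cost = payg_hourly * (1 - discount)  # paid for every hour of the term
    payg_cost = payg_hourly * utilization         # paid only while the VM runs
    return "reserved" if reserved_cost < payg_cost else "pay-as-you-go"
```

With a 30% discount the break-even point is 70% utilization; a deeper discount lowers the break-even point, which is why the threshold should be computed rather than memorized.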
How to Eliminate Wrong Answers
If the scenario mentions 'predictable, steady-state workload,' lean towards reserved instances.
If the scenario mentions 'unpredictable traffic' or 'interruptible,' consider spot VMs or autoscale.
If the question asks about cost allocation, look for tags and Azure Policy.
If the question involves automating cost response, look for Azure Automation runbooks or Logic Apps triggered by budget alerts.
Always check if Hybrid Benefit applies (Windows Server or SQL Server with Software Assurance).
Use Azure Cost Management + Billing for monitoring, budgets, and alerts.
Azure Advisor provides cost recommendations: reservation advice uses 30 days of usage, and right-sizing uses a configurable lookback period.
Reserved instances require a 1- or 3-year commitment and are best for steady-state workloads.
Spot VMs offer up to 90% discount but can be evicted; use only for interruptible workloads.
Azure Policy can enforce allowed SKUs and required tags to control costs.
Tags are not inherited; use Azure Policy with the 'modify' or 'append' effect to enforce tagging.
Automate cost responses using budget alerts triggering Azure Automation runbooks.
Hybrid Benefit reduces Windows Server and SQL Server licensing costs on Azure.
Autoscale with a 5-minute cooldown default helps match capacity to demand.
Savings plans provide flexibility across compute services without per-size commitment.
These come up on the exam all the time. Here's how to tell them apart.
Reserved Instances
Commit to a specific VM size and region for 1 or 3 years.
Discount up to 72%, highest on a 3-year term.
Requires careful planning to match workload.
Can be shared across subscriptions in the same billing scope.
Best for predictable, steady-state workloads.
Azure Savings Plans
Commit to an hourly spend amount for 1 or 3 years.
Discount up to 65%, highest on a 3-year term.
Flexible – applies to any compute service across regions.
Easier to manage for variable workloads.
Best for workloads with varying compute needs.
Pay-as-you-go
No upfront commitment; pay per hour.
Full pricing; no discount.
No eviction risk.
Suitable for all workloads.
Simple to start and stop.
Spot VMs
Up to 90% discount.
Can be evicted with 30-second notice.
Only for interruptible workloads (batch, dev/test).
Must be stateless or use checkpointing.
Best for cost savings on flexible jobs.
Mistake
Azure budgets automatically stop spending when the budget is exceeded.
Correct
Budgets only trigger alerts at configured thresholds. They do not block spending. To stop spending, you must configure automation (e.g., an Azure Automation runbook triggered by the alert) to disable resources.
Mistake
Reserved instances are always the most cost-effective option.
Correct
Reserved instances are cost-effective only for steady-state workloads that run 24/7. For variable or short-lived workloads, pay-as-you-go or spot VMs may be cheaper. The break-even point depends on usage patterns.
Mistake
Tags are automatically inherited from the resource group to the resources within it.
Correct
Tags are not inherited. You must use Azure Policy with the 'modify' or 'append' effect to automatically add tags from the resource group to new resources. Existing resources can be fixed with a remediation task on a 'modify' policy, or updated manually or via a script.
Mistake
Spot VMs are the same as low-priority VMs in Azure Batch.
Correct
Spot VMs are the general Azure offering for unused capacity with eviction risk. Low-priority VMs are specific to Azure Batch and have similar eviction behavior but are used within Batch pools. Both are interruptible.
Mistake
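Azure Advisor cost recommendations always use a fixed 7-day window of usage data.
Correct
Advisor reservation recommendations are based on 30 days of usage data. VM right-sizing recommendations use a lookback period that defaults to 7 days and can be extended up to 90 days, which gives a more reliable baseline for workloads with weekly or monthly cycles.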
Create a budget in Azure Cost Management with an alert threshold (e.g., 80%). Configure the alert to trigger an action group that includes a webhook or Azure Automation runbook. The runbook should contain PowerShell or CLI commands to stop or deallocate VMs. For example, use 'Stop-AzVM' in a PowerShell runbook. Ensure the runbook has the necessary permissions (managed identity or service principal). Test the automation in a non-production environment first.
A reserved instance (RI) is a commitment to a specific VM size and region for 1 or 3 years, offering up to 72% discount. It applies only to that exact SKU. A savings plan is a commitment to an hourly spend amount (e.g., $100/hour) for 1 or 3 years, offering up to 65% discount. It applies flexibly to any compute service (VMs, containers, serverless) across regions. RIs are best for predictable, fixed workloads; savings plans for variable or multi-service workloads.
Use Azure Policy with the 'deny' effect to block creation of resources without the required tag. Create a policy definition that checks for the existence of the 'CostCenter' tag on all resource types. Assign the policy at the management group or subscription level. For existing resources, use a policy with the 'modify' effect and run a remediation task to apply the tag. The 'modify' or 'append' effect can also add the tag from the resource group automatically when a resource is created.
Use Spot VMs for workloads that are interruptible, stateless, and fault-tolerant. Examples: batch processing jobs, dev/test environments, big data analytics, and containerized workloads that can handle eviction. Do not use Spot VMs for production, stateful applications, or workloads that require high availability. Spot VMs offer up to 90% discount but can be evicted with 30-second notice when Azure needs capacity.
In the Azure portal, open 'Azure Advisor' and select the 'Cost' tab. It shows recommendations for right-sizing, shutting down idle VMs, purchasing reserved instances, and converting to spot VMs. Each recommendation includes estimated savings, current usage metrics, and a link to implement the change. You can also access Advisor recommendations via Azure CLI: 'az advisor recommendation list --category Cost'.
Hybrid Benefit allows you to use your existing Windows Server or SQL Server licenses with Software Assurance (or subscription licenses) on Azure VMs and Azure SQL Database, removing the license charge from the price you pay in Azure. For Windows VMs, you pay only the Linux base rate (no Windows license cost). For SQL Server, you save on SQL licensing costs. To enable, select 'Yes' for 'Already have a Windows Server license?' when creating a VM, or configure at the database level. This can save up to 40% on Windows VMs.
In the Azure portal, navigate to your virtual machine scale set, select 'Scaling', and configure autoscale rules. Choose a metric like CPU percentage or queue depth. Set minimum and maximum instance counts. For cost optimization, set a low minimum (e.g., 1) and a moderate maximum. Use scale-in rules to remove instances when demand drops. The default cooldown is 5 minutes. Enable predictive autoscale for proactive scaling based on historical patterns. Test the rules with a load test.