This chapter covers designing monitoring solutions in Microsoft Azure, a critical area for the AZ-305 exam. Monitoring is essential for maintaining performance, availability, and security of cloud workloads. Approximately 10-15% of AZ-305 exam questions touch on monitoring, including Azure Monitor, Log Analytics, alerts, and health monitoring. You will learn how to architect a comprehensive monitoring strategy that meets operational and business requirements.
Imagine a large corporate building with thousands of employees, visitors, and systems. The building has a central security control room with cameras, motion sensors, door alarms, and environmental monitors. Each camera captures raw video (logs) and sends it to a recording server (Log Analytics workspace). Motion sensors detect specific events (alerts) like doors opening after hours. The security team can query the recording server to investigate past incidents (Log Analytics queries). They also set up dashboards showing real-time occupancy, temperature, and security status (Azure dashboards and workbooks). When a sensor detects smoke, it triggers an automatic call to the fire department (Action Group). The building has multiple floors (Azure resources) and the security system must scale to handle thousands of sensors. If a sensor fails, the system logs it and alerts maintenance (health monitoring). This analogy mirrors Azure Monitor: resources emit logs and metrics, which are collected and stored in Log Analytics workspaces, with alerts and actions configured to respond to conditions. Just as the security system provides a single pane of glass for building operations, Azure Monitor provides unified monitoring for Azure resources, enabling detection, diagnosis, and remediation.
What is Azure Monitor and Why It Exists
Azure Monitor is the central platform for collecting, analyzing, and acting on telemetry from Azure resources and on-premises environments. It replaces the need for disparate monitoring tools by providing a unified solution. The primary goals are:
- Detect and diagnose issues across applications and infrastructure.
- Understand performance and resource utilization.
- Proactively respond to critical conditions via alerts and automated actions.
- Optimize resource usage and costs.
- Meet compliance and auditing requirements.
Core Components and Architecture
Azure Monitor consists of several key components:
- Data Sources: Metrics (numerical values at regular intervals) and Logs (events with timestamps). Metrics are stored in a time-series database and support near real-time alerting. Logs include activity logs, resource logs, and custom logs.
- Data Collection: Agents (Azure Monitor agent, Log Analytics agent, Diagnostic extension) and APIs collect data from Azure resources, VMs, containers, and on-premises machines. The Azure Monitor agent is the preferred agent, replacing the older Log Analytics and Diagnostics agents.
- Data Storage: Logs are stored in Log Analytics workspaces (each workspace has a unique workspace ID and key). Metrics are stored in the Azure Monitor metrics database.
- Analysis and Visualization: Log Analytics queries (using Kusto Query Language, KQL) allow deep analysis. Dashboards, workbooks, and Power BI provide visualization.
- Alerts and Actions: Alert rules evaluate metrics or logs at a defined frequency. When triggered, they fire an action group (email, SMS, webhook, ITSM, automation runbook, etc.).
- Insights: Specialized monitoring experiences for specific services (Application Insights, Container Insights, VM Insights, Network Insights).
How Azure Monitor Works Internally
Data Ingestion: Resources emit metrics and logs. For example, an Azure VM generates metrics like CPU percentage and logs like security events. The Azure Monitor agent collects these and sends them to the Log Analytics workspace via HTTPS on port 443. Data is compressed and batched for efficiency.
Storage and Indexing: Log Analytics workspaces store data in tables, each with a defined schema (for example, the Event, Syslog, and Perf tables). Data is indexed for fast querying. Retention is configurable from 30 to 730 days (2 years) for most data, with longer retention available via data export to Azure Storage.
Query Processing: When a user runs a KQL query, the query is parsed, optimized, and executed across the workspace's data. Results are returned in tabular format. Queries can span multiple workspaces using cross-workspace queries.
Alert Evaluation: Alert rules are evaluated periodically (e.g., every 1 minute for metric alerts, every 5 minutes for log alerts). For metric alerts, the rule checks if the metric value crosses a threshold for a specified number of consecutive periods. For log alerts, the rule runs a log query and checks if the number of results meets a condition (e.g., > 0). When triggered, the rule fires the associated action group.
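The metric-alert evaluation described above can be sketched in a few lines of Python. This is a simplified model for study purposes, not Azure Monitor's implementation: the function `should_fire`, its parameters, and the sample CPU readings are all hypothetical names and values chosen for illustration.

```python
from typing import Sequence


def should_fire(samples: Sequence[float], threshold: float,
                min_violations: int, lookback: int) -> bool:
    """Return True when at least `min_violations` of the most recent
    `lookback` aggregated samples exceed `threshold`."""
    if lookback <= 0 or min_violations <= 0:
        raise ValueError("lookback and min_violations must be positive")
    recent = samples[-lookback:]
    violations = sum(1 for value in recent if value > threshold)
    return violations >= min_violations


# Hypothetical 1-minute average CPU readings (percent), oldest first.
cpu = [72.0, 85.5, 91.0, 78.0, 88.2]

# Three of the last five samples are above 80, so a "3 out of 5" rule fires.
fires = should_fire(cpu, threshold=80.0, min_violations=3, lookback=5)
```

Requiring several violations inside a lookback window, rather than reacting to a single sample, is what keeps a brief spike from paging anyone.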
Key Configuration Parameters and Defaults
Log Analytics Workspace: Pricing tier (Pay-as-you-go or Capacity Reservations), retention (default 30 days, max 730 days), daily cap (default no cap, can be set to limit costs).
Data Collection Rules (DCRs): Define what data to collect and how to transform it. DCRs are associated with resources or agents.
Metric Alerts: Frequency (1 min, 5 min, 15 min, 30 min), aggregation granularity (1 min to 24 hours), threshold (static or dynamic), number of violations to trigger (e.g., 3 out of 5).
Log Alerts: Frequency (5 min to 24 hours), query (KQL), measurement (number of results or a metric measurement), threshold operator and value (e.g., greater than 0).
Action Groups: Actions include email (limit of 100 emails per hour per email address), SMS (limit of 10 SMS per hour per phone number), webhook (must respond within 10 seconds), ITSM connector, automation runbook, Azure Function, etc.
Configuration and Verification Commands
Using Azure CLI:
# Create Log Analytics workspace
az monitor log-analytics workspace create --resource-group rg-monitoring --workspace-name laworkspace1 --location eastus --sku PerGB2018
# Create metric alert for high CPU on VM
az monitor metrics alert create --name cpuAlert --resource-group rg-monitoring --scopes /subscriptions/.../virtualMachines/vm1 --condition "avg Percentage CPU > 80" --window-size 5m --evaluation-frequency 1m --action /subscriptions/.../actionGroups/ag1
# Query logs
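# --workspace expects the workspace ID (GUID), not the workspace name; --analytics-query carries the KQL
az monitor log-analytics query --workspace <workspace-id-guid> --analytics-query "Perf | where CounterName == '% Processor Time' | where Computer == 'vm1' | summarize avg(CounterValue) by bin(TimeGenerated, 5m)"
Using PowerShell: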
New-AzOperationalInsightsWorkspace -ResourceGroupName rg-monitoring -Name laworkspace1 -Location eastus -Sku PerGB2018
# -Frequency sets the evaluation interval; -Severity is required (2 is an illustrative value)
Add-AzMetricAlertRuleV2 -Name cpuAlert -ResourceGroupName rg-monitoring -WindowSize (New-TimeSpan -Minutes 5) -Frequency (New-TimeSpan -Minutes 1) -TargetResourceId /subscriptions/.../virtualMachines/vm1 -Condition (New-AzMetricAlertRuleV2Criteria -MetricName "Percentage CPU" -TimeAggregation Average -Operator GreaterThan -Threshold 80) -Severity 2
Interaction with Related Technologies
Azure Policy: Can enforce diagnostic settings on resources to ensure logs are sent to Log Analytics.
Azure Backup: Backup reports and alerts can be integrated with Log Analytics.
Azure Security Center / Microsoft Defender for Cloud: Uses Log Analytics to store security alerts and recommendations.
Azure Automation: Runbooks can be triggered by alerts to perform remediation (e.g., restart a VM).
Azure Event Hubs: Can stream logs to Event Hubs for integration with SIEM tools.
Azure Dashboards and Grafana: Visualize metrics and logs.
Best Practices for Designing Monitoring Solutions
Centralize logs: Use a single Log Analytics workspace per region (or per compliance boundary) to reduce complexity.
Use diagnostic settings: Enable diagnostic settings on all resources to send platform logs and metrics to Log Analytics.
Implement health monitoring: Use Azure Monitor's health monitoring for Azure services (Service Health) and resource health.
Define alert severity: Use severity levels (0-4) and ensure critical alerts have immediate actions.
Use dynamic thresholds: For metric alerts, use dynamic thresholds to adapt to normal patterns.
Cost management: Set daily caps on Log Analytics workspaces and use data retention policies to control costs.
Secure monitoring data: Use private links for Log Analytics workspaces to prevent data exfiltration.
Exam-Relevant Details
Log Analytics workspace: The workspace ID and key are used to configure agents. Where possible, place the workspace in the same region as the resources it monitors; this is a recommendation for latency and data-transfer cost, not a hard requirement.
Diagnostic settings: Can send logs to Log Analytics, Event Hubs, or Storage. For long-term retention, use Azure Storage.
Azure Monitor agent: Supports Windows and Linux, uses Data Collection Rules (DCRs) to define what to collect. It replaces the Log Analytics agent (MMA) and Diagnostics extension.
Metric alerts: A single rule can evaluate multiple conditions, and thresholds can be static or dynamic. Log alerts can split and filter results by dimensions.
Action Groups: Can be reused across multiple alert rules. Limits: 10 action groups per subscription for email/SMS/push/voice actions.
Service Health: Provides alerts when Azure services experience issues. Includes planned maintenance and health advisories.
Common Trap Patterns on the Exam
Confusing Log Analytics workspace with Application Insights: Application Insights is for application monitoring, while Log Analytics is for infrastructure and platform logs. They can be integrated (Application Insights can send data to Log Analytics).
Choosing Storage for real-time monitoring: Storage is for long-term retention and auditing, not for real-time analysis. Log Analytics is used for real-time querying.
Assuming all metrics are stored for 93 days: By default, most platform metrics are stored for 93 days, but custom metrics can have different retention. Log Analytics retention is configurable.
Overlooking diagnostic settings: Many resources do not send logs to Log Analytics by default; you must configure diagnostic settings.
Misunderstanding metric vs log alerts: Metric alerts are for near real-time (1 min frequency) and are based on numerical values. Log alerts are slower (5 min frequency) and are based on log queries.
Step-by-Step Configuration of an End-to-End Monitoring Solution
Create a Log Analytics workspace in the target region.
Enable diagnostic settings on all Azure resources to send logs and metrics to the workspace.
Install the Azure Monitor agent on VMs (via Azure Policy or manually) and create Data Collection Rules to collect performance counters, syslog, and custom logs.
Create metric alerts for critical metrics (CPU > 80%, memory > 90%, disk space < 10%).
Create log alerts for security events (failed logins, malware detected) using KQL queries.
Configure action groups with email, SMS, and webhook actions.
Create dashboards and workbooks to visualize key performance indicators.
Set up Service Health alerts for Azure service issues.
Test alerts by triggering conditions (e.g., stress test a VM).
Review and optimize costs by adjusting retention and daily caps.
Create Log Analytics Workspace
The first step is to create a Log Analytics workspace in the Azure portal, CLI, or PowerShell. The workspace is the container for log data. Choose a region close to your resources to minimize latency. The pricing tier defaults to Pay-as-you-go (PerGB2018). Set a retention period (default 30 days) and optionally a daily cap to control costs. The workspace ID and keys are used to configure agents. You can also enable data export to Azure Storage for long-term retention beyond 730 days.
Enable Diagnostic Settings on Resources
For each Azure resource (VMs, SQL databases, web apps, etc.), navigate to the 'Diagnostic settings' blade under 'Monitoring'. Enable sending platform logs and metrics to the Log Analytics workspace. You can also send to Event Hubs for streaming or to Storage for archiving. Diagnostic settings are not enabled by default; this is a common exam pitfall. Use Azure Policy to enforce diagnostic settings across all resources.
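Because missing diagnostic settings are easy to overlook, it can help to think of compliance as a simple audit over an inventory. The sketch below is illustrative only: `Resource` is a hypothetical record type (not an Azure SDK class), and the destination labels `log_analytics`, `storage`, and `event_hubs` are made-up strings standing in for the three destinations named above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Hypothetical inventory record; not an Azure SDK type."""
    name: str
    diagnostic_destinations: List[str] = field(default_factory=list)


def missing_workspace_logging(resources: List[Resource]) -> List[str]:
    """Names of resources with no diagnostic setting targeting Log Analytics."""
    return [r.name for r in resources
            if "log_analytics" not in r.diagnostic_destinations]


inventory = [
    Resource("sql-orders", ["log_analytics", "storage"]),
    Resource("webapp-front", []),            # nothing configured
    Resource("kv-secrets", ["event_hubs"]),  # streamed, but not sent to the workspace
]

# Resources that would be flagged as non-compliant by this audit.
non_compliant = missing_workspace_logging(inventory)
```

An Azure Policy assignment plays the role of this audit at scale, and can additionally deploy the missing setting rather than only reporting it.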
Install Azure Monitor Agent on VMs
The Azure Monitor agent (AMA) is the next-generation agent for collecting data from VMs. It can be installed via the Azure portal, using Azure Policy (recommended for large scale), or manually. After installation, create Data Collection Rules (DCRs) that define what data to collect (e.g., performance counters, Windows event logs, syslog). DCRs are assigned to VMs. The agent sends data to the Log Analytics workspace using HTTPS on port 443.
Create Metric Alerts for Key Metrics
Metric alerts evaluate resource metrics at a specified frequency (e.g., every 1 minute). Define a condition (e.g., average CPU > 80% for 5 minutes). Use dynamic thresholds to automatically adjust baselines. Set the severity (0-4) and associate an action group. Metric alerts have a low latency (1-5 minutes) and are ideal for real-time monitoring. Test the alert by generating load on the resource.
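To build intuition for how a dynamic threshold differs from a static one, the following sketch derives bounds from recent history using a mean and standard deviation. This is an assumption made purely for illustration: Azure Monitor's dynamic thresholds use their own learned models, which are not reproduced here, and the names `dynamic_bounds`, `is_anomalous`, and `sensitivity` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def dynamic_bounds(history: List[float], sensitivity: float = 3.0) -> Tuple[float, float]:
    """Lower and upper bounds as mean +/- sensitivity * standard deviation."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    centre = mean(history)
    spread = pstdev(history)
    return centre - sensitivity * spread, centre + sensitivity * spread


def is_anomalous(value: float, history: List[float], sensitivity: float = 3.0) -> bool:
    """True when `value` falls outside the bounds learned from `history`."""
    lower, upper = dynamic_bounds(history, sensitivity)
    return value < lower or value > upper


# Hypothetical CPU history for a VM that normally idles around 40%.
history = [40.0, 42.0, 38.0, 41.0, 39.0, 40.0]

normal_reading = is_anomalous(43.0, history)   # within the learned band
unusual_reading = is_anomalous(55.0, history)  # far outside it
```

A static rule of "CPU > 80%" would ignore the 55% reading, whereas a baseline-relative rule treats it as unusual for this particular VM.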
Create Log Alerts for Complex Conditions
Log alerts run a KQL query against the Log Analytics workspace at a defined frequency (e.g., every 5 minutes). The query returns results; if the number of results meets a threshold (e.g., > 0), the alert fires. Use dimensions to filter by specific resource or property. Log alerts are slower than metric alerts (5-15 minutes latency) but allow complex logic. For example, alert on multiple failed logins from the same IP within 10 minutes.
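The failed-login example can be modelled outside KQL to show what the query has to compute. The sketch below works on in-memory tuples rather than a real workspace; the `LoginEvent` shape, the `suspicious_ips` function, the threshold of five failures, and the documentation-range IP addresses are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

LoginEvent = Tuple[datetime, str, bool]  # (timestamp, source IP, succeeded)


def suspicious_ips(events: Iterable[LoginEvent], now: datetime,
                   window: timedelta = timedelta(minutes=10),
                   min_failures: int = 5) -> List[str]:
    """IPs with at least `min_failures` failed logins inside the window ending at `now`."""
    failures: Dict[str, int] = defaultdict(int)
    for timestamp, ip, succeeded in events:
        if not succeeded and now - window <= timestamp <= now:
            failures[ip] += 1
    return sorted(ip for ip, count in failures.items() if count >= min_failures)


now = datetime(2024, 1, 1, 12, 0)
events = [(now - timedelta(minutes=m), "203.0.113.7", False) for m in range(1, 6)]
events += [
    (now - timedelta(minutes=2), "198.51.100.4", False),  # one failure only
    (now - timedelta(minutes=30), "203.0.113.9", False),  # outside the window
    (now - timedelta(minutes=1), "203.0.113.7", True),    # successes are ignored
]

# The alert condition "number of results > 0" corresponds to a non-empty list here.
flagged = suspicious_ips(events, now)
```

In a real rule the same filter, time window, and per-IP count would be expressed in the KQL query, and the rule would fire when the query returns any rows.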
Configure Action Groups and Remediation
Action groups define the response when an alert fires. Common actions include sending an email, SMS, voice call, or triggering a webhook. You can also invoke an Azure Automation runbook to automatically remediate (e.g., restart a VM, scale out, or run a script). Action groups can be reused across multiple alerts. Be aware of rate limits: email 100 per hour per address, SMS 10 per hour per number. For critical alerts, use multiple action types.
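The rate limits quoted above matter during alert storms, when notifications beyond the limit are not delivered. One common way to reason about such a limit is a sliding-window counter per recipient. The sketch below is only a model: how Azure enforces its limits internally is not described in this chapter, and `HourlyRateLimiter` and the phone numbers are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class HourlyRateLimiter:
    """Sliding-window limiter: at most `limit` sends per recipient per window."""

    def __init__(self, limit: int, window_seconds: int = 3600) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._sent: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, recipient: str, now: float) -> bool:
        sent = self._sent[recipient]
        # Drop timestamps that have aged out of the window.
        while sent and now - sent[0] >= self.window_seconds:
            sent.popleft()
        if len(sent) >= self.limit:
            return False
        sent.append(now)
        return True


# Twelve alerts within twelve seconds for one number, using the SMS limit of 10.
sms = HourlyRateLimiter(limit=10)
results = [sms.allow("+15550100", now=float(i)) for i in range(12)]
```

Here the first ten sends are allowed and the last two are suppressed, which is why critical alerts benefit from a second action type such as a webhook or voice call.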
Create Dashboards and Workbooks
Visualize monitoring data using Azure dashboards and workbooks. Dashboards can pin charts from metrics and log queries. Workbooks provide interactive reports with parameters and drill-downs. Use built-in templates for common scenarios (e.g., VM performance, application health). Share dashboards with stakeholders. Workbooks can be saved as Azure Resource Manager templates for version control.
Set Up Service Health Alerts
Azure Service Health provides personalized alerts when Azure services experience issues. Create alert rules for service issues, planned maintenance, and health advisories. These alerts can be configured to notify you via action groups. Service Health alerts are free. They help you stay informed about Azure-wide incidents that may affect your resources.
Monitor Costs and Optimize
Use Azure Monitor cost analysis to understand data ingestion costs. Set daily caps on Log Analytics workspaces to avoid unexpected bills. Adjust retention periods based on compliance needs (e.g., retain security logs for 1 year, performance logs for 30 days). Use data export to low-cost Azure Storage for long-term archival. Review unused alert rules and delete them to reduce noise.
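A quick calculation shows how a daily cap bounds ingestion cost. The sketch below is a rough model: `PRICE_PER_GB` is a placeholder and not a real Azure price, the daily volumes are invented, and the function names are hypothetical.

```python
from typing import List


def billable_ingestion(daily_gb: List[float], daily_cap_gb: float) -> List[float]:
    """Volume actually ingested each day once the cap cuts off further data."""
    return [min(volume, daily_cap_gb) for volume in daily_gb]


def estimated_cost(daily_gb: List[float], price_per_gb: float) -> float:
    """Total cost for the period at a flat per-GB price."""
    return round(sum(daily_gb) * price_per_gb, 2)


PRICE_PER_GB = 2.50  # placeholder value, NOT a real Azure price

# Hypothetical week of ingestion in GB, with a noisy deployment on day 3.
week = [8.0, 9.5, 30.0, 7.0, 8.5, 12.0, 6.0]

capped = billable_ingestion(week, daily_cap_gb=10.0)
uncapped_cost = estimated_cost(week, PRICE_PER_GB)
capped_cost = estimated_cost(capped, PRICE_PER_GB)
```

The saving comes from data that was never collected, so a cap trades visibility for cost control; it is best treated as a safety net rather than the primary way to manage volume.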
Enterprise Scenario 1: Global E-Commerce Platform
A large e-commerce company runs thousands of VMs and hundreds of databases across multiple Azure regions. They need real-time monitoring of application performance and infrastructure health. They deploy Azure Monitor with a Log Analytics workspace per region to reduce latency and comply with data residency requirements. They use VM Insights to collect performance counters and maps of dependencies. They create metric alerts for CPU, memory, and disk I/O with dynamic thresholds to reduce false positives. For the database tier, they use SQL Insights to monitor query performance and deadlocks. They set up log alerts for failed logins and suspicious SQL injections. All critical alerts trigger an action group that sends a webhook to their incident management system (ServiceNow) and pages the on-call engineer. They also use Azure Workbooks to create a single-pane-of-glass dashboard for executives showing uptime, latency, and error rates. A common issue they faced was alert fatigue due to poorly tuned thresholds; they resolved it by using dynamic thresholds and tuning evaluation windows.
Enterprise Scenario 2: Financial Services Compliance
A bank must comply with strict regulatory requirements for logging and monitoring. They use Azure Policy to enforce diagnostic settings on all resources, ensuring all platform logs are sent to a central Log Analytics workspace. They set retention to 365 days for security logs and 90 days for performance logs. They use data export to Azure Storage for long-term archival (7 years). They create custom log queries to detect unauthorized access attempts and send alerts to the security operations center (SOC). They also use Azure Monitor's private link to ensure logs never traverse the public internet. A common misconfiguration they encountered was not enabling diagnostic settings on new resources; they automated remediation using Azure Policy and Azure Automation runbooks. They also use Log Analytics workspace insights to monitor workspace health and data ingestion rates.
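The bank's retention design can be summarised as a small decision function. This is a sketch of the scenario's policy only, using the figures stated above (365 days, 90 days, 7 years); the names `RETENTION_DAYS`, `ARCHIVE_DAYS`, and `storage_location` are hypothetical, and the 7-year figure ignores leap days.

```python
RETENTION_DAYS = {"security": 365, "performance": 90}  # workspace retention from the scenario
ARCHIVE_DAYS = 7 * 365                                   # 7-year archive, ignoring leap days


def storage_location(table_kind: str, age_days: int) -> str:
    """Where a record of the given age lives: 'workspace', 'archive', or 'expired'."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days <= RETENTION_DAYS[table_kind]:
        return "workspace"
    if age_days <= ARCHIVE_DAYS:
        return "archive"
    return "expired"


# A 200-day-old security record is still queryable in the workspace,
# while a performance record of the same age has moved to the archive.
security_200 = storage_location("security", 200)
performance_200 = storage_location("performance", 200)
```

Separating "queryable in the workspace" from "kept for compliance" is the design choice that keeps workspace costs down while still meeting the 7-year requirement.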
Enterprise Scenario 3: SaaS Provider with Multi-Tenant Architecture
A SaaS provider hosts applications for multiple customers in a single Azure subscription. They need to monitor each tenant's resource usage separately. They use Log Analytics workspaces per tenant (or use a single workspace with resource-level permissions). They configure diagnostic settings on each tenant's resources to send logs to the appropriate workspace. They use role-based access control (RBAC) to restrict access to logs per tenant. They create custom workbooks that show tenant-specific metrics and logs. They use metric alerts with dimensions to alert on high resource usage per tenant. A challenge they face is cost allocation; they use Azure Monitor's usage and estimated costs feature to attribute costs to each tenant. They also use Azure Cost Management to track spending per tenant.
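One straightforward way to attribute a shared workspace bill is to split it in proportion to each tenant's ingested volume. The sketch below assumes that per-tenant ingestion figures are already available; the function `allocate_cost`, the tenant names, and all numbers are hypothetical.

```python
from typing import Dict


def allocate_cost(ingested_gb: Dict[str, float], total_cost: float) -> Dict[str, float]:
    """Split `total_cost` across tenants in proportion to ingested volume."""
    total_gb = sum(ingested_gb.values())
    if total_gb <= 0:
        raise ValueError("no ingestion recorded")
    return {tenant: round(total_cost * gb / total_gb, 2)
            for tenant, gb in ingested_gb.items()}


# Hypothetical monthly ingestion per tenant in GB and a hypothetical total bill.
usage = {"tenant-a": 50.0, "tenant-b": 30.0, "tenant-c": 20.0}
shares = allocate_cost(usage, total_cost=400.0)
```

With a workspace per tenant this calculation is unnecessary because each workspace is billed separately, which is one reason the per-tenant design simplifies chargeback.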
AZ-305 Exam Focus: Designing Monitoring Solutions
This topic maps to objective 1.3 'Design for monitoring and operations'. Expect 2-4 questions on the exam. Key areas tested:
- Azure Monitor architecture: Understanding the difference between metrics and logs, and when to use each.
- Log Analytics workspaces: Creation, retention, pricing tiers, and daily caps.
- Diagnostic settings: How to enable them and where data can be sent (Log Analytics, Event Hubs, Storage).
- Agents: Azure Monitor agent vs Log Analytics agent (MMA) vs Diagnostics extension. Know that AMA is the preferred agent.
- Alerts: Metric alerts vs log alerts, action groups, and dynamic thresholds.
- Health monitoring: Service Health, Resource Health, and Azure Monitor for VMs.
- Cost optimization: Daily caps, retention settings, and data export.
Common Wrong Answers and Why Candidates Choose Them
Choosing 'Log Analytics agent' instead of 'Azure Monitor agent': The Log Analytics agent (MMA) is being deprecated. The exam expects you to know that the Azure Monitor agent is the current recommended agent. Candidates often confuse the two.
Selecting 'Storage account' as the destination for real-time monitoring: Storage is for archival, not real-time analysis. Log Analytics or Event Hubs are for real-time. Candidates assume data in a storage account can be queried interactively, but it is suited to archival and auditing rather than analysis.
Forgetting to enable diagnostic settings: Many resources do not send logs by default. The exam will present a scenario where logs are missing, and the correct answer is to enable diagnostic settings. Candidates assume logs are automatically sent.
Confusing metric alerts with log alerts: Metric alerts are faster (1 min) and simpler; log alerts are slower (5 min) but more flexible. The exam may test which type to use for a given latency requirement.
Overlooking dynamic thresholds: Dynamic thresholds automatically adjust baselines and reduce alert fatigue. Candidates often choose static thresholds even when dynamic is better.
Specific Numbers and Terms to Memorize
Metric retention: 93 days (default).
Log Analytics retention: Default 30 days, max 730 days.
Metric alert frequency: Minimum 1 minute.
Log alert frequency: Minimum 5 minutes.
Action group email limit: 100 emails per hour per address.
Action group SMS limit: 10 SMS per hour per number.
Data Collection Rule (DCR): Used by Azure Monitor agent.
Kusto Query Language (KQL): Used for log queries.
Edge Cases and Exceptions
Cross-workspace queries: You can query multiple workspaces in a single log query using the workspace() function.
Azure Monitor Private Link: Use to connect to Log Analytics workspace over a private endpoint.
Data export: Can export data to Event Hubs or Storage for real-time streaming or long-term retention.
Azure Monitor for Containers: Uses Container Insights, which requires a Log Analytics workspace.
Application Insights: For application performance monitoring, can send data to Log Analytics workspace for unified analysis.
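A cross-workspace query combines the `workspace()` function with a `union` over the same table in each workspace. The helper below only assembles the KQL text and does not call any Azure API; the function name `cross_workspace_query` and the workspace names are hypothetical.

```python
from typing import Iterable


def cross_workspace_query(workspaces: Iterable[str], table: str, body: str) -> str:
    """Build a KQL query that unions `table` across workspaces, then applies `body`."""
    names = list(workspaces)
    if not names:
        raise ValueError("at least one workspace is required")
    sources = ", ".join(f'workspace("{name}").{table}' for name in names)
    return f"union {sources}\n| {body}"


query = cross_workspace_query(
    ["law-eastus", "law-westeurope"],  # hypothetical workspace names
    table="Perf",
    body="summarize avg(CounterValue) by Computer",
)
```

The resulting text can be pasted into the Log Analytics query editor or used in a workbook, which is how a per-region workspace design still supports a single global view.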
How to Eliminate Wrong Answers
Identify the requirement: real-time? Use metric alerts. Complex logic? Use log alerts.
Check if logs are already being collected: If not, the answer likely involves enabling diagnostic settings.
For cost control: Look for daily cap, retention adjustment, or data export to storage.
For security: Use private link, RBAC on workspace, and enable diagnostic settings.
For multi-region: Use separate Log Analytics workspaces per region to reduce latency and comply with data residency.
Azure Monitor is the central platform for monitoring Azure resources, using metrics and logs.
Log Analytics workspaces store log data; default retention is 30 days (max 730).
Diagnostic settings must be enabled on each resource to send platform logs to Log Analytics.
The Azure Monitor agent (AMA) is the preferred agent for VMs, using Data Collection Rules.
Metric alerts evaluate every 1 minute; log alerts evaluate every 5 minutes minimum.
Action groups define the response to alerts (email, SMS, webhook, automation runbook).
Dynamic thresholds reduce alert fatigue by adapting to normal patterns.
Service Health alerts notify you of Azure service issues and planned maintenance.
Use Azure Policy to enforce diagnostic settings across all resources.
Cost management: set daily caps, adjust retention, and export data to storage for long-term archival.
These come up on the exam all the time. Here's how to tell them apart.
Azure Monitor Agent (AMA)
Uses Data Collection Rules for centralized configuration.
Supports Windows and Linux with a single agent.
Recommended by Microsoft, with active development.
Can filter and transform data before ingestion.
Supports private link for secure connectivity.
Log Analytics Agent (MMA)
Configured per VM via settings, not centralized.
Separate agents for Windows (MMA) and Linux (OMS).
Being deprecated; no new features.
No built-in data transformation.
Does not support private link natively.
Mistake
Azure Monitor automatically collects all logs from all resources.
Correct
Azure Monitor does not automatically collect logs. You must explicitly enable diagnostic settings on each resource to send platform logs to Log Analytics. For VMs, you also need to install an agent (Azure Monitor agent) to collect guest OS logs.
Mistake
The Log Analytics agent (MMA) is the preferred agent for collecting logs from VMs.
Correct
The Azure Monitor agent (AMA) is the preferred agent. MMA is being deprecated. AMA uses Data Collection Rules and supports both Windows and Linux.
Mistake
You can store logs in Azure Storage and query them in real time using Log Analytics.
Correct
Logs stored in Azure Storage are not queryable by Log Analytics in real time. You must use Log Analytics workspace for real-time querying. Storage is for long-term archival and auditing.
Mistake
Metric alerts and log alerts have the same latency.
Correct
Metric alerts have a minimum evaluation frequency of 1 minute, while log alerts have a minimum of 5 minutes. Metric alerts are near real-time; log alerts are slower due to the time needed to run the query.
Mistake
The default retention for Log Analytics workspace is 90 days.
Correct
The default retention is 30 days. You can configure it up to 730 days. Metrics retention is 93 days by default.
Azure Monitor is the overarching platform for monitoring Azure resources. Log Analytics is a service within Azure Monitor that stores and queries log data. Think of Azure Monitor as the entire monitoring solution, and Log Analytics as the database and query engine for logs. You use Log Analytics workspaces to collect and analyze log data.
Install the Azure Monitor agent on the on-premises servers. The agent connects to Azure over the internet or via a private link. Configure Data Collection Rules to specify which logs to collect (e.g., Windows event logs, syslog, custom logs). The agent sends data to a Log Analytics workspace. Ensure the server has outbound HTTPS connectivity to the workspace endpoint.
Metric alerts are based on numerical metric values (e.g., CPU percentage) and evaluate at a minimum frequency of 1 minute. They are near real-time. Log alerts are based on log queries (KQL) and evaluate at a minimum frequency of 5 minutes. Log alerts are slower but allow complex logic and multiple conditions. Use metric alerts for simple threshold conditions and log alerts for advanced scenarios.
Set a daily cap on your Log Analytics workspace to limit data ingestion. Adjust retention periods: keep performance logs for 30 days, security logs for 90 days, and archive older data to Azure Storage. Use data export to send data to storage for long-term retention instead of keeping it in the workspace. Also, review and disable unused alert rules and agents.
The Azure Monitor agent (AMA) is the next-generation agent that replaces the Log Analytics agent (MMA). AMA uses Data Collection Rules (DCRs) for centralized configuration, supports both Windows and Linux with a single agent, and can filter and transform data before ingestion. MMA is being deprecated, so you should use AMA for new deployments.
Create a Log Analytics workspace in each region where you have resources. This reduces data transfer latency and helps comply with data residency requirements. Use cross-workspace queries (workspace() function) to analyze data across regions. For centralized dashboards, create workbooks that query multiple workspaces.
Diagnostic settings control which logs and metrics are collected from an Azure resource and where they are sent. Without diagnostic settings, most resources do not send platform logs to Log Analytics. You must enable diagnostic settings on each resource (or via Azure Policy) to collect data for monitoring and alerting.