This chapter covers the monitoring strategy for Azure solutions, focusing on Azure Monitor, Log Analytics, and Insights. For the AZ-305 exam, understanding how to design a comprehensive monitoring solution is critical, as it appears in approximately 15-20% of exam questions related to design for operations. You will learn the core components, how they interact, and best practices for implementing observability in Azure environments.
Imagine a large office building with multiple floors, each floor having its own set of sensors: temperature, humidity, motion, and smoke detectors. All these sensors send their readings to a central monitoring station in the basement. The central station has a big board that shows real-time data from every sensor. It also logs all readings for historical analysis. If a sensor on the 3rd floor detects smoke, the central station triggers an alarm, notifies the fire department, and logs the event. The building manager can query the log to see past temperature trends or identify which floor had the most motion events. In this analogy, the sensors are Azure resources (VMs, databases, web apps), the central station is Azure Monitor, the big board is the Azure portal dashboard, the log is Log Analytics, and the alarm system is Azure Monitor Alerts. The building manager uses the same interface to monitor everything, set up alerts, and analyze historical data, just as an Azure architect uses Azure Monitor to gain observability across all resources.
What is Azure Monitor and Why It Exists
Azure Monitor is the central platform for collecting, analyzing, and acting on telemetry from Azure resources and on-premises environments. It provides a unified view of application performance, infrastructure health, and resource utilization. The primary goal is to maximize availability and performance by enabling proactive detection of issues, root cause analysis, and automated responses. Azure Monitor replaces the need for multiple disjointed monitoring tools by consolidating metrics, logs, and alerts into a single service.
How Azure Monitor Works Internally
Azure Monitor ingests two primary types of data: metrics and logs. Metrics are numerical values collected at regular intervals (e.g., CPU percentage, disk IOPS) and are stored in a time-series database optimized for real-time analysis. Logs contain detailed text records (e.g., event logs, application traces) stored in Log Analytics workspaces. The data pipeline is as follows:
Data Collection: Agents (Azure Monitor Agent, Dependency Agent, or legacy agents) collect data from VMs, containers, and applications. Azure services (like Azure SQL Database, Azure Functions) emit platform metrics and resource logs directly to Azure Monitor. Diagnostic settings define which logs and metrics are sent to which destinations (Log Analytics workspace, Event Hubs, Storage).
Data Storage and Processing: Metrics are stored in the Azure Monitor Metrics database, which supports near real-time querying (latency < 1 minute). Logs are stored in Log Analytics workspaces, which use a Kusto Query Language (KQL) engine for complex queries. Platform metrics are retained for 93 days; to keep them longer, route them to a Log Analytics workspace or archive them to storage. Log retention is configurable from 30 days to 2 years (or longer with archive).
Analysis and Visualization: Metrics can be visualized on the Azure portal metrics explorer, pinned to dashboards, or exported. Logs are queried using KQL in the Log Analytics query editor. Workbooks provide interactive reports combining metrics and logs.
Alerting: Azure Monitor Alerts evaluate metric or log queries at a specified frequency (e.g., every 1 minute). When conditions are met (e.g., CPU > 90% for 5 minutes), an alert fires, triggering actions like sending email, SMS, or running an Azure Automation runbook.
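The alerting step can be illustrated with a small simulation. This is a minimal sketch, not Azure Monitor's implementation: it assumes one CPU sample per minute, an evaluation on every new sample (a 1-minute frequency), and a rule that fires when the average over the last 5 samples exceeds 90%. The function names and the sample data are hypothetical.

```python
from statistics import mean


def evaluate_metric_alert(samples, threshold=90.0, window=5):
    """Return True if the average of the last `window` samples exceeds `threshold`.

    `samples` is a list of per-minute CPU percentages, oldest first.
    """
    if len(samples) < window:
        return False  # not enough data to fill the evaluation window yet
    return mean(samples[-window:]) > threshold


def run_evaluations(samples, threshold=90.0, window=5):
    """Evaluate once per new sample and return the minutes at which the rule fires."""
    fired = []
    for minute in range(1, len(samples) + 1):
        if evaluate_metric_alert(samples[:minute], threshold, window):
            fired.append(minute)
    return fired


# Hypothetical CPU readings: a spike that starts at minute 3 and ends at minute 8.
cpu = [50, 60, 95, 96, 97, 98, 99, 40]
print(run_evaluations(cpu))  # prints [7]
```

The rule fires only at minute 7, when all five samples in the window are high. The first spike samples do not trigger it because the earlier, lower readings still pull the window average below the threshold.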
Key Components, Defaults, and Timers
Azure Monitor Agent (AMA): The current recommended agent for collecting data from VMs. It supports Windows and Linux, uses data collection rules (DCRs) to define what to collect and where to send it. Default collection interval for performance counters is 60 seconds.
Log Analytics Workspace: A container for log data. Each workspace has a unique workspace ID and key. Default retention is 7 days for the free tier and 31 days for paid tiers (adjustable up to 730 days). Data ingestion latency is typically under 5 minutes.
Metric Alerts: Can evaluate metrics every 1 minute (statistic: average, min, max, count, sum). Evaluation frequency can be 1, 5, 15, or 30 minutes. The default window is 5 minutes.
Log Alerts: Based on log queries. The query runs every 5 minutes by default (configurable from 1 minute to 24 hours). The time window for the query can be up to 1440 minutes.
Action Groups: Define the notification and action recipients. Supported actions: email (limit: 100 emails per hour per email address), SMS (limit: 1 SMS per 5 minutes per phone number), voice call, ITSM connector, Automation runbook, Azure Function, webhook, etc.
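The SMS limit above matters during an alert storm: repeated alerts to the same number inside the window are not delivered. The sketch below models that behavior with a simple per-number rate limiter. It is an illustration only and does not reflect how Azure enforces the limit; the class name, the phone number, and the timestamps (seconds) are hypothetical.

```python
class SmsRateLimiter:
    """Allow at most one SMS per `interval_seconds` for each phone number."""

    def __init__(self, interval_seconds=300):
        self.interval_seconds = interval_seconds
        self._last_sent = {}  # phone number -> timestamp of last accepted SMS

    def try_send(self, phone_number, now):
        """Return True if the SMS is accepted, False if it is suppressed."""
        last = self._last_sent.get(phone_number)
        if last is not None and now - last < self.interval_seconds:
            return False  # still inside the 5-minute window for this number
        self._last_sent[phone_number] = now
        return True


limiter = SmsRateLimiter()
# Five alerts for the same (hypothetical) number at t = 0, 60, 299, 300, 301 seconds.
results = [limiter.try_send("+15550100", t) for t in (0, 60, 299, 300, 301)]
print(results)  # prints [True, False, False, True, False]
```

For design questions, the practical consequence is that SMS should not be the only channel for high-frequency alerts; pairing it with email or a webhook avoids silent suppression.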
Configuration and Verification Commands
To deploy the Azure Monitor Agent on a VM:
# Register the Microsoft.Insights resource provider (data collection rules live under this namespace)
az provider register --namespace Microsoft.Insights
# Create data collection rule (DCR)
az monitor data-collection rule create --name myDCR --resource-group myRG \
--location eastus --rule-file dcr.json
# Associate DCR with VM
az monitor data-collection rule association create --name myAssociation \
--resource /subscriptions/.../virtualMachines/myVM \
--data-collection-rule-id /subscriptions/.../dataCollectionRules/myDCR
To query logs in a Log Analytics workspace:
// Example KQL query to find top CPU consumers
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize avg(CounterValue) by Computer
| top 10 by avg_CounterValue desc
How It Interacts with Related Technologies
Azure Monitor and Azure Service Health: Service Health issues (e.g., regional outages) appear in the Azure Monitor portal and can trigger alerts.
Azure Monitor and Azure Policy: Azure Policy can enforce diagnostic settings on resources, ensuring they send logs to a Log Analytics workspace.
Azure Monitor and Azure Automation: Alerts can trigger Automation runbooks for automated remediation, e.g., restarting a VM.
Azure Monitor and Azure Security Center (now Microsoft Defender for Cloud): Security Center uses Log Analytics to store security alerts and recommendations.
Azure Monitor and Application Insights: Application Insights (part of Azure Monitor) monitors web applications, collecting telemetry like page views, requests, and exceptions. It sends data to a Log Analytics workspace (workspace-based Application Insights).
Insights: Application Insights, VM Insights, Container Insights
Application Insights: Monitors live web applications. Features: distributed tracing, dependency mapping, smart detection (proactive anomaly detection). It uses the Application Insights SDK or auto-instrumentation agents. Data includes requests, dependencies, exceptions, traces, and custom events.
VM Insights: Monitors VMs at scale. It uses the Azure Monitor Agent and Dependency Agent to collect performance data and process dependencies (mapped via Service Map). It provides pre-built workbooks and health views.
Container Insights: Monitors container workloads in AKS, Azure Container Instances, or AKS Engine. Collects metrics (CPU, memory) and logs (stdout/stderr) from containers and sends them to Log Analytics. Pre-built views show cluster health, node status, and pod performance.
Retention, Pricing, and Data Limits
Metrics: Retained for 93 days (free). For longer retention, stream to Log Analytics (cost applies) or archive to storage.
Logs: Free tier: 5 GB/month data ingestion, 7 days retention. Paid tier: per GB ingested (approx $2.30/GB in East US), retention 31 days (adjustable up to 730 days). Additional cost for long-term retention (archive to storage).
Data limits: Log Analytics workspace has a daily cap (default 100 GB, adjustable). When cap is reached, data ingestion stops for the day. Alerts can be set when cap is approached.
Metric alerts: Each alert rule costs based on the number of signals evaluated (e.g., metric alerts: $0.10 per rule per month for up to 10 signals).
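The ingestion price and daily cap interact in a way that is easy to check with arithmetic. The following is a rough sketch that uses this chapter's approximate figures (about $2.30 per GB, a 100 GB daily cap) as default parameters. It ignores retention charges, free allowances, and commitment tiers, and the function name is hypothetical.

```python
def estimate_monthly_ingestion(daily_gb, days=30, price_per_gb=2.30, daily_cap_gb=100.0):
    """Estimate ingested volume, dropped volume, and ingestion cost for one month.

    Data above the daily cap is treated as dropped for that day,
    because ingestion stops once the cap is reached.
    """
    ingested_per_day = min(daily_gb, daily_cap_gb)
    dropped_per_day = daily_gb - ingested_per_day
    ingested_gb = ingested_per_day * days
    return {
        "ingested_gb": ingested_gb,
        "dropped_gb": dropped_per_day * days,
        "cost_usd": round(ingested_gb * price_per_gb, 2),
    }


# 10 GB/day stays under the cap; 120 GB/day loses 20 GB of data every day.
print(estimate_monthly_ingestion(10))
print(estimate_monthly_ingestion(120))
```

The second case shows the trade-off behind the cap: it bounds the bill at the capped volume, but the 600 GB that was never ingested is also unavailable for queries and log alerts.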
Best Practices for Exam
Always use Azure Monitor Agent (AMA) over legacy agents (MMA/OMS).
Use diagnostic settings to route resource logs to Log Analytics for analysis.
For critical resources, configure metric alerts with action groups.
Use workbooks for custom dashboards and reports.
Enable Application Insights for all web applications to monitor performance.
Use VM Insights and Container Insights for proactive monitoring of compute workloads.
Set up log queries for specific error patterns and create log alerts.
Remember that Log Analytics workspace is regional; plan for data sovereignty.
1. Design Log Analytics Workspace Strategy
Determine the number and regions of Log Analytics workspaces. For most enterprises, a single workspace is sufficient unless there are regulatory or data sovereignty requirements. Each workspace is regional; data is stored in that region. Consider using a central workspace for all monitoring data, but be aware that cross-region data transfer incurs costs. For the exam, remember that a single workspace can consolidate logs from multiple regions, but network latency may affect real-time queries. Also, workspace access control can be granular with Azure RBAC.
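One way to reason about the workspace count is to start from a single central workspace and add a workspace only where a residency requirement forces it. The sketch below encodes that rule. It is a planning aid under stated assumptions, not an Azure API: the resource names, the `residency_region` field, and the choice of `eastus` as the central region are all hypothetical.

```python
def plan_workspaces(resources, central_region="eastus"):
    """Group resources by the region of the workspace that should receive their logs.

    `resources` is a list of dicts with a 'name' and an optional
    'residency_region', set only when regulation pins the data to a region.
    Resources without a residency requirement go to the central workspace.
    """
    plan = {}
    for res in resources:
        region = res.get("residency_region") or central_region
        plan.setdefault(region, []).append(res["name"])
    return plan


resources = [
    {"name": "vm-us"},
    {"name": "vm-eu", "residency_region": "germanywestcentral"},
    {"name": "sql-us"},
]
print(plan_workspaces(resources))
```

Here the plan needs two workspaces: the central one and one in the region required by the regulated workload. Without any `residency_region` entries, the plan collapses to a single workspace.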
2. Configure Data Collection with Diagnostic Settings
For each Azure resource (e.g., VMs, databases, web apps), enable diagnostic settings to send platform metrics and resource logs to a Log Analytics workspace. Diagnostic settings can also send data to Event Hubs for streaming or to Storage for archiving. Use Azure Policy to enforce diagnostic settings across subscriptions. In the exam, know that diagnostic settings are per-resource and can be configured via portal, CLI, or PowerShell. Note that guest OS data from a VM (performance counters, Syslog, Windows Event logs) is collected by the Azure Monitor Agent through a data collection rule rather than by a diagnostic setting.
3. Install and Configure Azure Monitor Agent
Deploy the Azure Monitor Agent (AMA) on VMs (Azure, on-premises, or other clouds). Use Data Collection Rules (DCRs) to define which data to collect (performance counters, events) and where to send it. AMA replaces the legacy Log Analytics agent and is required for VM Insights. For the exam, remember that AMA supports both Windows and Linux, and DCRs can be applied at scale via policy. The agent uses port 443 to communicate with Azure Monitor.
4. Create Metric and Log Alerts
Define alert rules based on metrics (e.g., CPU > 90%) or log queries (e.g., error count > 10 in 5 minutes). Metric alerts have lower latency (1 minute) and are ideal for threshold-based monitoring. Log alerts allow complex logic but have higher latency (5 minutes). Each alert rule must be associated with an action group. For the exam, know that metric alerts can monitor multiple resources (e.g., all VMs in a scale set) and support dynamic thresholds (machine learning-based).
5. Enable Application Insights for Web Apps
For web applications, enable Application Insights either via SDK instrumentation or auto-instrumentation (e.g., using the Application Insights agent for .NET). This collects request rates, response times, failure rates, dependency tracking, and user behavior. Data is stored in a Log Analytics workspace (workspace-based Application Insights). Configure availability tests (URL ping tests) to monitor uptime from multiple global locations. For the exam, remember that Application Insights is part of Azure Monitor and supports distributed tracing.
Scenario 1: Large Enterprise with Hybrid Infrastructure
A global company runs thousands of VMs across Azure and on-premises. They deploy Azure Monitor Agent on all VMs via Azure Arc and use a single Log Analytics workspace in the East US region. Data collection rules are centrally managed and applied via Azure Policy. They create metric alerts for CPU, memory, and disk usage with action groups that send email to the operations team and trigger an Automation runbook to auto-scale or restart VMs. For log alerts, they query Windows Event logs for security events (e.g., multiple failed logins) and send to SIEM via Event Hubs. The challenge: high volume of logs from on-premises VMs causes data ingestion costs to spike. They implement a daily cap and tighten their data collection rules so that noisy events are filtered out before ingestion. The solution works well, but they must ensure the Log Analytics workspace region is close to most resources to reduce latency.
Scenario 2: E-commerce Platform with Application Insights
An e-commerce company uses Azure App Service and Azure SQL Database. They enable Application Insights on their web app using the .NET SDK. They configure availability tests from three global locations (US, Europe, Asia) to monitor the home page and checkout flow. Metric alerts monitor server response time (> 5 seconds) and failure rate (> 2%). Log alerts query for exceptions related to payment processing. They use workbooks to create a dashboard showing real-time user sessions, page views, and server metrics. The operations team receives SMS alerts for critical issues. A common pitfall is missing dependency tracking for external APIs; they must add manual dependency tracking calls. Performance considerations: Application Insights data ingestion costs can be high for high-traffic apps; they set sampling (e.g., 10%) to reduce volume.
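The effect of a 10% sampling rate can be sketched as follows. This is a simplified illustration of fixed-rate sampling and not the Application Insights SDK's actual algorithm: it hashes a hypothetical operation ID into a bucket between 0 and 100 and keeps the item when the bucket falls below the sampling percentage. Hashing the operation ID, rather than picking at random, means every telemetry item from the same operation gets the same decision, so a kept request still has its dependencies and exceptions.

```python
import hashlib


def keep_telemetry(operation_id, sampling_percentage=10.0):
    """Decide deterministically whether to keep telemetry for an operation."""
    digest = hashlib.sha256(operation_id.encode("utf-8")).digest()
    # Map the hash to a bucket in [0, 100) with a resolution of 0.01.
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100
    return bucket < sampling_percentage


# Hypothetical operation IDs; roughly one in ten should be kept at 10% sampling.
ids = [f"op-{i}" for i in range(10_000)]
kept = sum(keep_telemetry(op) for op in ids)
print(f"kept {kept} of {len(ids)} operations")
```

At 10% sampling the ingested volume drops by roughly a factor of ten, which is why sampling is the usual first lever for high-traffic applications.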
Scenario 3: Kubernetes Cluster Monitoring with Container Insights
An enterprise runs AKS clusters across multiple regions. They enable Container Insights on each cluster, which deploys a containerized version of the Azure Monitor Agent. The agent collects pod logs, node metrics, and Kubernetes events. Pre-built workbooks show cluster capacity, node health, and pod performance. They create metric alerts for node CPU pressure and log alerts for CrashLoopBackOff events. They also enable Azure Policy to enforce that all new AKS clusters have Container Insights enabled. A common misconfiguration is not setting up log retention properly, leading to high costs. They set log retention to 90 days and archive older logs to Azure Storage. The solution provides end-to-end observability for containerized workloads.
What AZ-305 Tests on This Topic
The AZ-305 exam (Design for Operations) includes objectives related to designing a monitoring strategy. Specifically, objective 1.3: 'Design a monitoring strategy for Azure resources'. The exam focuses on:
Choosing between Azure Monitor and third-party tools.
Designing Log Analytics workspace architecture (single vs. multiple workspaces).
Configuring diagnostic settings and data collection.
Selecting appropriate alert types (metric vs. log) and action groups.
Enabling Application Insights, VM Insights, and Container Insights.
Understanding data retention, pricing, and data limits.
Common Wrong Answers and Why Candidates Choose Them
Using legacy agents (MMA/OMS) instead of Azure Monitor Agent (AMA): Candidates often choose MMA because it's familiar. The exam expects AMA as the current recommended agent. Wrong because AMA is more secure, supports DCRs, and is required for VM Insights.
Creating multiple Log Analytics workspaces for each region without justification: Candidates think data sovereignty requires per-region workspaces. However, a single workspace can collect data from multiple regions, though cross-region latency and costs exist. The exam expects justification for multiple workspaces (e.g., regulatory compliance).
Using log alerts for all scenarios: Metric alerts are lower latency and cheaper. Candidates might overuse log alerts. The exam tests knowing when to use metric vs. log alerts (e.g., metric for simple thresholds, log for complex queries).
Not enabling diagnostic settings for all resources: Candidates may think Azure Monitor automatically collects all logs. It does not; diagnostic settings must be explicitly configured. The exam expects understanding that diagnostic settings are per-resource.
Specific Numbers and Terms on the Exam
Metric retention: 93 days (free).
Log Analytics default retention: 31 days.
Daily cap default: 100 GB.
Metric alert evaluation frequency: minimum 1 minute.
Log alert evaluation frequency: 5 minutes by default (configurable down to 1 minute).
Action group email limit: 100 emails per hour.
Data ingestion cost: ~$2.30/GB (varies by region).
Terms: 'Diagnostic settings', 'Data collection rules (DCR)', 'Log Analytics workspace', 'Kusto Query Language (KQL)', 'Action group', 'Smart detection', 'Availability tests'.
Edge Cases and Exceptions
Azure Monitor Agent on Arc-enabled servers: AMA can be deployed on servers outside Azure via Azure Arc. This is a common exam scenario.
Resource logs for Azure PaaS services: Some services (e.g., Azure App Service) have built-in monitoring; others require explicit diagnostic settings.
Cross-workspace queries: Using KQL to query multiple workspaces with the workspace() function.
Log Analytics workspace pricing tier: Per GB (pay-as-you-go) vs. Capacity Reservations (commitment tiers for savings).
Data export to Event Hubs: For real-time streaming to SIEM or other systems.
How to Eliminate Wrong Answers
If a question asks about 'recommended agent', eliminate any answer mentioning Log Analytics agent (MMA) and choose Azure Monitor Agent.
If a question involves 'lowest latency alerting', choose metric alerts over log alerts.
If a question asks about 'centralized logging', choose a single Log Analytics workspace unless there's a specific compliance reason.
If a question involves 'automated remediation', look for an action group that triggers an Automation runbook or Azure Function.
Azure Monitor is the central platform for metrics, logs, and alerts; it unifies monitoring across Azure and on-premises.
Use Azure Monitor Agent (AMA) for all new deployments; legacy MMA is deprecated.
Configure diagnostic settings on every Azure resource to send logs to a Log Analytics workspace.
Metric alerts have 1-minute evaluation frequency; log alerts have 5-minute default frequency.
Log Analytics workspace retention defaults to 31 days; can be extended up to 730 days with additional cost.
Application Insights is part of Azure Monitor; use workspace-based Application Insights for unified log storage.
Enable VM Insights and Container Insights for proactive monitoring of compute workloads.
Action groups define notification channels (email, SMS, webhook) and can trigger automation runbooks.
Cross-workspace queries use the workspace() function in KQL.
Data collection rules (DCRs) are used by AMA to define what data to collect and where to send it.
These come up on the exam all the time. Here's how to tell them apart.
Azure Monitor Agent (AMA)
Recommended by Microsoft; will be the only agent in future.
Uses Data Collection Rules (DCRs) for flexible configuration.
Supports both Windows and Linux.
Required for VM Insights and Azure Arc.
More secure (does not require local admin on Linux).
Legacy Log Analytics Agent (MMA/OMS)
Legacy; no new features being developed.
Configuration via workspace settings (less flexible).
Supports Windows and Linux, but with separate agents.
Not compatible with VM Insights or Azure Arc.
Requires local admin on Linux for installation.
Mistake
Azure Monitor automatically collects all logs from all Azure resources.
Correct
Azure Monitor collects platform metrics automatically, but resource logs (e.g., application logs, security logs) require you to configure diagnostic settings per resource. Without diagnostic settings, those logs are not collected.
Mistake
You need one Log Analytics workspace per region for data residency.
Correct
A single Log Analytics workspace can collect data from multiple regions. Data residency is determined by the workspace's location, not the source resource's location. However, if regulatory compliance requires data to stay in a specific region, you may need multiple workspaces.
Mistake
Metric alerts and log alerts have the same latency.
Correct
Metric alerts can evaluate every 1 minute (low latency), while log alerts evaluate every 5 minutes by default (higher latency). Metric alerts are better for time-sensitive thresholds.
Mistake
The Log Analytics agent (MMA) is the recommended agent for Azure VMs.
Correct
The Azure Monitor Agent (AMA) is the current recommended agent. MMA is legacy and deprecated. AMA supports Data Collection Rules, is more secure, and is required for VM Insights.
Mistake
Application Insights is a separate service from Azure Monitor.
Correct
Application Insights is part of Azure Monitor. It uses a Log Analytics workspace for data storage (workspace-based Application Insights). It is not a standalone service.
Metrics are numerical values collected at regular intervals (e.g., CPU percentage) and are stored in a time-series database optimized for real-time analysis. They have low latency (under 1 minute) and are ideal for threshold-based alerts. Logs are detailed text records (e.g., event logs, application traces) stored in a Log Analytics workspace. They support complex queries with KQL and are better for root cause analysis. Both can be used for alerts, but metric alerts are faster.
For most organizations, a single Log Analytics workspace is sufficient. Use multiple workspaces only if you have regulatory requirements for data residency or need to isolate data for different business units. A single workspace simplifies management and cross-resource queries. However, consider that all data in one workspace is subject to the same retention and access policies.
The default retention for paid tiers is 31 days. You can adjust retention up to 730 days (2 years) at an additional cost per GB per month. The free tier has 7 days retention. For long-term archiving, you can export logs to Azure Storage.
Deploy Azure Arc to connect on-premises servers to Azure, then install the Azure Monitor Agent (AMA) on those servers. Create Data Collection Rules (DCRs) to define what logs to collect and send them to a Log Analytics workspace. Alternatively, you can use the legacy Log Analytics agent, but AMA is recommended.
Azure Monitor has no upfront cost; you pay for data ingestion and retention. Metrics are free for 93 days. Logs cost approximately $2.30 per GB ingested (varies by region). Alerts have a small monthly cost per rule (e.g., $0.10 per metric alert rule). Application Insights has additional costs based on data volume. Use sampling to reduce costs.
Yes, you can monitor on-premises and other cloud resources using Azure Arc and the Azure Monitor Agent. For Linux and Windows servers, install AMA and configure DCRs. For applications, use Application Insights SDK. This provides a unified monitoring solution across hybrid environments.
Metric alerts evaluate numerical metrics (e.g., CPU > 90%) and can fire every 1 minute. They are low-latency and ideal for simple thresholds. Log alerts evaluate KQL queries over log data and run every 5 minutes by default (configurable from 1 minute to 24 hours). They support complex logic (e.g., count of errors in last 5 minutes) but typically have higher latency, because log data must be ingested before the query can see it. Use metric alerts for time-sensitive conditions and log alerts for advanced analysis.