AZ-104Chapter 148 of 168Objective 5.1

KQL Queries for Azure Monitoring

This chapter covers Kusto Query Language (KQL) for Azure Monitor, a critical skill for the AZ-104 exam. KQL is used to query log data from Azure Monitor Logs, Application Insights, and other Azure data sources. Approximately 10-15% of exam questions touch on monitoring, and KQL proficiency is essential for analyzing logs, creating alerts, and building dashboards. You will learn the core syntax, operators, and practical scenarios to pass the exam and perform real-world log analysis.

25 min read
Intermediate
Updated May 31, 2026

KQL as a Library Catalog System

Imagine Azure Monitor logs as a massive library where every book is a log entry. Each book has metadata: title (event type), author (source), publication date (timestamp), genre (category), and shelf location (resource ID). Without a catalog, finding a specific book would require searching every shelf manually — impractical at scale. KQL is the librarian's query language that uses the catalog index (the table schema) to locate books instantly. When you write a query, the librarian first checks the catalog for the table you specify (e.g., 'SecurityEvent'), then applies filters (where clauses) to narrow down by metadata like date range or event ID. She then projects only the relevant fields (project operator) to reduce the information load. If you need to count books by genre, she uses the summarize operator with a count() aggregation, grouping books by genre and tallying each group. The 'let' statement is like pre-defining a search shortcut — you can create a variable that holds a sub-query, such as 'let criticalEvents = SecurityEvent | where EventID == 4625;'. The librarian can then reuse this definition multiple times without re-executing the search. The 'render' operator is equivalent to asking the librarian to create a visual chart of the results, like a bar chart of failed logins per hour. The entire process is efficient because the catalog (index) is optimized for these operations, and the librarian (KQL engine) can process multiple books in parallel, returning results in seconds even from billions of entries.

How It Actually Works

What is KQL and Why Does It Exist?

Kusto Query Language (KQL) is a read-only query language designed for querying large datasets in Azure Data Explorer (ADX) and Azure Monitor Logs. It is optimized for high-performance analytics on structured and semi-structured data. Unlike SQL, which is row-oriented, KQL is columnar and uses a pipeline-based syntax where data flows through a series of operators separated by pipes (|). This design allows for efficient filtering, aggregation, and visualization of log data at scale.

How KQL Works Internally

KQL queries are processed by the Kusto engine, which is distributed and can scale horizontally. The engine parses the query into a query tree, optimizes it (e.g., predicate pushdown, column pruning), and executes it across multiple nodes in parallel. Each operator consumes a tabular stream (rows and columns) and produces a new tabular stream. The order of operators matters: filters (where) should be placed early to reduce data volume early in the pipeline.

Key Components and Syntax

- Tables: Data sources like 'Perf', 'SecurityEvent', 'AzureActivity'. You can list tables with .show tables. - Operators: - where: Filters rows based on a predicate. Example: where TimeGenerated > ago(1h). - project: Selects columns. Example: project Computer, CounterName, CounterValue. - extend: Creates new computed columns. Example: extend Total = CounterValue * 100. - summarize: Aggregates rows. Example: summarize avg(CounterValue) by Computer. - join: Merges rows from two tables. Example: Perf | join kind=inner (Alert) on Computer. - let: Defines variables or functions. Example: let threshold = 90;. - Functions: Built-in functions like ago(), now(), bin(), count(), avg(), percentile(). - Time filters: Use where TimeGenerated > ago(7d) for last 7 days. The default time range is 24 hours in Azure Monitor.

Configuration and Verification

In Azure Monitor, you write KQL queries in the Log Analytics workspace query editor. To test a query, run it and inspect the results. You can save queries as functions or create alert rules based on query results. Use render timechart to visualize time-series data. Common commands to verify table structure: Perf | getschema returns column names and types.

Interaction with Related Technologies

KQL queries can be used in: - Azure Monitor Alerts: Create alert rules with log queries that run at a specified interval (e.g., every 5 minutes). - Azure Dashboards: Pin query results as tiles. - Azure Workbooks: Create interactive reports combining multiple queries. - Azure Sentinel: For security analytics. - Azure Resource Graph: Use KQL to query Azure resources (ARG uses a subset of KQL).

Detailed Operator Guide

where: Filters rows. Use on indexed columns (e.g., TimeGenerated, Computer) for performance. Avoid string operations on large datasets.

Example: SecurityEvent | where EventID == 4625 | where TimeGenerated > ago(1h).

project: Reduces columns. Improves performance by limiting data transfer.

Example: SecurityEvent | project TimeGenerated, Account, IpAddress.

extend: Adds calculated columns. Does not remove existing columns.

Example: Perf | extend UtilPercent = CounterValue / 100 * 100.

summarize: Groups rows by one or more columns and applies aggregation functions. Use bin() for time bucketing.

Example: Perf | where CounterName == "% Processor Time" | summarize avg(CounterValue) by Computer, bin(TimeGenerated, 1h).

join: Merges tables. Types: inner, leftouter, fullouter, leftanti, rightanti. Use kind=.

Example: Heartbeat | join kind=inner (Alert) on Computer.

let: Defines reusable parts. Can be scalar or tabular.

Example: let criticalEvents = SecurityEvent | where EventID == 4625; criticalEvents | count.

sort by: Orders results. Use desc or asc.

take: Returns a specified number of rows (like SQL LIMIT).

top: Returns top rows based on a column.

distinct: Returns unique rows.

count: Returns number of rows.

render: Renders a chart. Types: timechart, barchart, piechart, table.

Performance Best Practices

Filter early with where.

Use project to reduce columns.

Avoid using extend on large datasets before filtering.

Use summarize with bin() for time-series.

Limit results with take or top during development.

Use let to avoid repeating subqueries.

Leverage has and contains carefully; has is faster as it uses tokenization.

Use ago() instead of absolute datetime strings.

Common Traps on the Exam

Case sensitivity: KQL is case-sensitive for column names and values. where Computer == "server01" is different from where computer == "server01".

String operators: == for exact match, contains for substring, has for token match. Candidates often confuse contains and has.

Time literals: ago(1d) vs datetime(2024-01-01). Always use ago() for relative times.

Null handling: Use isnull() or isnotnull() to check for nulls. == null does not work.

Join types: Leftouter returns all rows from left table; inner returns only matches. Mixing these yields different results.

Render operator: Must be the last operator. Placing it before a where clause will fail.

Example Query Walkthrough

Goal: Find servers with >90% CPU in the last hour.

Perf
| where TimeGenerated > ago(1h)
| where CounterName == "% Processor Time"
| summarize avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
| where avg_CounterValue > 90
| render timechart

Step-by-step: 1. Start with Perf table. 2. Filter to last hour. 3. Filter to CPU counter. 4. Group by Computer and 5-minute bins, average the values. 5. Filter to averages >90. 6. Render a timechart.

This query pinpoints sustained high CPU usage.

Walk-Through

1

Identify the Data Source

First, determine which table contains the data you need. Use `.show tables` to list all tables in the workspace. For performance monitoring, use 'Perf'; for security events, use 'SecurityEvent'; for Azure activity, use 'AzureActivity'. If unsure, you can explore a table with `TableName | take 10` to see sample data. This step is critical because querying the wrong table yields no results. The exam often tests your ability to choose the correct table for a given scenario.

2

Filter by Time Range

Apply a time filter using `where TimeGenerated > ago(timespan)`. Common timespans: `ago(1h)`, `ago(24h)`, `ago(7d)`. This reduces data volume early, improving performance. Always filter by time; omitting this can cause queries to time out or return too many rows. The default time range in Azure Monitor is 24 hours, but explicit filtering is best practice. Use `between(datetime1..datetime2)` for custom ranges.

3

Apply Row Filters

Use the `where` operator to filter rows based on specific criteria, such as `EventID == 4625` for failed logins or `CounterName == "% Processor Time"` for CPU metrics. Combine multiple conditions with `and` and `or`. For string matching, use `has` (faster, token-based) or `contains` (substring). Avoid expensive string operations on large datasets. This step narrows the data to relevant events.

4

Select and Project Columns

Use `project` to keep only the columns you need, e.g., `project TimeGenerated, Computer, CounterValue`. This reduces data transfer and speeds up subsequent operations. You can also rename columns with `project-rename` or reorder them. If you need to add computed columns, use `extend` before or after projecting. For example, `extend Total = CounterValue * 100 | project TimeGenerated, Total`.

5

Aggregate and Summarize

Use `summarize` to group data and apply aggregation functions like `count()`, `avg()`, `sum()`, `min()`, `max()`, or `percentile()`. Group by one or more columns, and use `bin()` for time bucketing. Example: `summarize avg(CounterValue) by Computer, bin(TimeGenerated, 1h)`. This step transforms raw data into meaningful metrics. For time-series charts, always use `bin()` on TimeGenerated.

6

Order and Limit Results

Use `sort by` to order results (e.g., `sort by avg_CounterValue desc`). Use `take N` to return only the top N rows, or `top N by Column` to get the highest values. This is useful for generating top-N lists. For example, find top 5 servers by CPU: `summarize avg(CounterValue) by Computer | top 5 by avg_CounterValue desc`. This step ensures the output is manageable and focused.

7

Visualize with Render

Add `render timechart` (or barchart, piechart, table) as the last operator to visualize the results. This is especially important for dashboards and alerts. The render operator must be the final step; placing it before other operators causes an error. For time-series data, use `render timechart` with `series` parameter to specify which column to split the chart by.

What This Looks Like on the Job

Scenario 1: Troubleshooting a Performance Incident

A large e-commerce company experiences a 10-minute spike in page load times. The cloud engineer uses KQL to query the Perf table for all servers over the last hour, filtering on CounterName == "% Processor Time" and CounterName == "Available MBytes Memory". They summarize by server and 1-minute bins, then render a timechart to identify which server spiked. They discover one web server hit 100% CPU while memory remained normal, indicating a CPU-bound process. They then query the Event table for that server around the same time to find a process crash. This analysis takes under 2 minutes with KQL, whereas manual log review would take hours.

Scenario 2: Security Incident Response

A financial institution detects a brute-force attack. The security analyst uses KQL to query SecurityEvent for EventID 4625 (failed logins) in the last 24 hours. They summarize by source IP and count, then sort descending to find the top attacking IPs. They then join with the AzureActivity table to see if any resources were modified from those IPs. They create an alert rule that triggers when more than 10 failed logins occur from a single IP in 5 minutes. This query runs every 5 minutes and has caught multiple attacks.

Scenario 3: Capacity Planning

A SaaS provider needs to forecast storage growth. They query the AzureMetrics table for Blob Storage metrics, summarizing by storage account and bin(TimeGenerated, 1d) for the last 90 days. They use series_decompose() to detect trends and seasonality. The results help them predict when to scale storage and budget for costs. Misconfiguration: If they omit the time filter, the query times out due to the large dataset. They learned to always filter by time first.

Common Production Pitfalls

Missing time filter: Causes queries to scan the entire table, leading to timeouts or throttling.

Inefficient joins: Joining large tables without filtering first can overwhelm the engine. Always filter both tables before joining.

Overusing `contains`: Slows down queries on large datasets. Use has for token matching or == for exact matches.

Not using `project`: Returning all columns increases memory and network usage.

Ignoring query limits: Azure Monitor has a 30-minute timeout and 64MB result size limit. Use take to limit results.

Scale Considerations

In workspaces with over 1TB of data per day, queries must be optimized. Use sharding by TimeGenerated and avoid cross-workspace queries unless necessary. Use let to define subqueries that filter early. For very large datasets, use Azure Data Explorer directly for ad-hoc analysis.

How AZ-104 Actually Tests This

What AZ-104 Tests on KQL

Domain 5.1 (Monitor and troubleshoot by using Azure Monitor) includes writing and interpreting KQL queries. The exam expects you to:

Identify the correct table for a given scenario (e.g., Perf for performance, SecurityEvent for security, AzureActivity for resource changes).

Write queries that filter, aggregate, and visualize log data.

Understand the purpose of common operators: where, project, extend, summarize, join, let, render.

Recognize the correct syntax for time filters (ago, datetime, bin).

Differentiate between string operators (==, contains, has).

Know how to create alert rules from queries.

Common Wrong Answers and Why

1.

Using `contains` when `has` is more appropriate: Candidates choose contains because it seems broader, but has is faster and more precise for token matching. The exam tests performance awareness.

2.

Omitting the time filter: Many candidates write queries without where TimeGenerated > ago(...), expecting the default to be sufficient. The default is 24 hours, but explicit filtering is best practice and often required for correctness.

3.

Putting `render` before other operators: Candidates place render in the middle of the pipeline, which causes a syntax error. render must be last.

4.

Confusing `summarize` with `project`: summarize groups and aggregates; project selects columns. Mixing these leads to incorrect results.

5.

Using `extend` without `project`: extend adds columns but does not remove existing ones, leading to bloated results. Use project to trim.

Specific Numbers and Terms

Time filter: ago(1h), ago(24h), ago(7d).

bin(TimeGenerated, 5m) for 5-minute buckets.

EventID 4625 for failed logins.

CounterName == "% Processor Time" for CPU.

Default time range: 24 hours (if not specified, the query uses the workspace's default).

Query timeout: 30 minutes.

Result size limit: 64MB.

Edge Cases and Exceptions

Cross-workspace queries: Use union with workspace() function. Example: union workspace('workspace1').Perf, workspace('workspace2').Perf.

Case sensitivity: Column names and values are case-sensitive. Computer is not the same as computer.

Null handling: Use isnull() to check for nulls, not == null.

String escaping: Use single quotes for strings, double quotes for column names if they contain special characters.

Render types: timechart requires a datetime column and a numeric column. If missing, use makelist() or series.

How to Eliminate Wrong Answers

If the question asks for a query to find failed logins, look for SecurityEvent with EventID == 4625. Any other table is wrong.

If the question asks for average CPU over time, the query must include summarize avg(CounterValue) by Computer, bin(TimeGenerated, 5m). No summarize or no bin() is incorrect.

If the question asks for a chart, the query must end with render timechart. Any other operator after render is wrong.

If the question asks for a top-N list, use top N by ... or summarize ... | sort by ... | take N.

If the question involves time, ensure ago() or datetime() is used correctly.

Key Takeaways

Always filter by TimeGenerated using `where TimeGenerated > ago(...)` to limit data and improve performance.

Use `project` to select only necessary columns; avoid `extend` without subsequent `project` to reduce result size.

`summarize` with `bin(TimeGenerated, timespan)` is essential for time-series aggregation.

`render timechart` must be the last operator in the query.

`has` is faster than `contains` for token matching; use `==` for exact matches.

The default time range in Azure Monitor is 24 hours, but always specify explicitly.

For performance, filter early with `where` on indexed columns like TimeGenerated and Computer.

Use `let` to define reusable subqueries and improve readability.

Case sensitivity matters: 'Computer' is different from 'computer'.

Join types: inner, leftouter, fullouter, leftanti, rightanti — choose based on whether you need unmatched rows.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

KQL

Pipeline-based: data flows through pipe (|) operators.

Case-sensitive for column names and string comparisons.

Optimized for log analytics and time-series data.

Supports native time bucketing with bin().

No JOIN in the traditional sense; uses join operator with kind parameter.

SQL

Declarative: SELECT-FROM-WHERE structure.

Case-insensitive by default (depends on collation).

Optimized for transactional data and normalized schemas.

Requires manual date truncation functions.

Uses JOIN clause with ON condition.

Watch Out for These

Mistake

KQL is the same as SQL.

Correct

KQL is different: it is pipeline-based (data flows left to right through pipes), case-sensitive, and optimized for columnar data. SQL is row-oriented and uses SELECT-FROM-WHERE syntax. KQL does not support subqueries in the same way; use `let` or `materialize()` instead.

Mistake

You don't need to filter by time because the UI sets a default.

Correct

The UI default is 24 hours, but queries run programmatically (e.g., in alerts) do not have a UI default. Always explicitly filter `where TimeGenerated > ago(...)` to ensure correct results and avoid scanning all data.

Mistake

`contains` and `has` are interchangeable.

Correct

`contains` performs a substring match and is slower. `has` performs a token match (whole word) and is faster. For example, `"disk" has "disk"` matches, but `"disks" has "disk"` does not (because 'disks' is a different token). Use `has` for exact token matching, `contains` for partial matches when necessary.

Mistake

`summarize` always returns one row per group.

Correct

Yes, `summarize` returns one row per unique combination of group-by columns. If you don't group by anything, it returns a single row with the aggregate over all rows. This is useful for counts but can be misleading if you expect raw data.

Mistake

You can use `render` anywhere in the query.

Correct

`render` must be the last operator in the query. If you place it before a `where` clause, the query will fail with a syntax error. Always ensure `render` is the final statement.

Do You Actually Know This?

Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.

Frequently Asked Questions

What is the default time range if I don't specify a time filter in KQL?

In Azure Monitor Logs, if you run a query without a time filter, the default time range is 24 hours (as set in the query editor). However, this default is not applied when queries are used in alerts or other automated contexts. Always explicitly filter with `where TimeGenerated > ago(1d)` to ensure consistent behavior.

How do I find the top 5 servers by average CPU usage in the last hour?

Use the Perf table: `Perf | where TimeGenerated > ago(1h) | where CounterName == "% Processor Time" | summarize avg(CounterValue) by Computer | top 5 by avg_CounterValue desc`. This filters to CPU counter, averages per server, and returns the top 5.

What's the difference between `has` and `contains` in KQL?

`has` performs a token match (whole word), e.g., `"error" has "error"` matches, but `"errors" has "error"` does not. `contains` performs a substring match, so `"errors" contains "error"` matches. `has` is faster because it uses tokenization. Use `has` when you need an exact word; use `contains` when you need partial matching.

Can I join two tables in KQL? How?

Yes, use the `join` operator. Example: `Table1 | join kind=inner Table2 on KeyColumn`. Common join kinds: inner (only matching rows), leftouter (all rows from left, nulls from right), fullouter (all rows from both). Always filter both tables before joining to improve performance.

Why does my `render timechart` fail?

`render` must be the last operator in the query. If you place any operator after it (e.g., `| where ...`), the query fails. Also ensure the data includes a datetime column (for the x-axis) and at least one numeric column (for the y-axis). Use `bin(TimeGenerated, 5m)` to create time buckets.

How do I create an alert based on a KQL query?

In Azure Monitor, create a new alert rule, select the Log Analytics workspace as the resource, and write the KQL query in the condition step. Set the evaluation frequency (e.g., every 5 minutes) and threshold (e.g., result count > 10). The query must return a number of rows or a numeric value to trigger the alert.

What is the `let` statement used for?

`let` defines a variable or function that can be reused in the query. For example: `let threshold = 90; Perf | where CounterValue > threshold`. It improves readability and performance by avoiding duplication. You can also define tabular expressions: `let highCPU = Perf | where CounterValue > 90; highCPU | count`.

Terms Worth Knowing

Ready to put this to the test?

You've just covered KQL Queries for Azure Monitoring — now see how well it sticks with free AZ-104 practice questions. Full explanations included, no account needed.

Done with this chapter?