This chapter covers Kusto Query Language (KQL) log queries within Azure Application Insights, a critical skill for the AZ-204 exam. KQL is the primary language for analyzing telemetry data in Azure Monitor, and understanding it is essential for diagnosing application issues, building dashboards, and setting up alerts. Approximately 10-15% of exam questions touch on monitoring and diagnostics, with KQL queries being a common task in the 'Monitor' domain. You will learn the syntax, operators, and best practices for writing efficient queries that extract meaningful insights from application logs.
Jump to a section
Imagine you are a detective investigating a crime in a large office building. Every employee's action is logged: who entered which room, at what time, for how long, and what they did. These logs are stored in a massive filing cabinet, but the cabinet is organized by date and employee ID—not by crime relevance. To find clues, you need a system that lets you search across all files, filter by specific criteria (e.g., only employees who entered the server room after midnight), group related events (e.g., all entries by the same person within 10 minutes), and project only the relevant fields (e.g., name, room, timestamp). Kusto Query Language (KQL) is exactly that system for Application Insights telemetry data. It allows you to query across billions of events, filter with time-based operators, aggregate counts, and join different data sources—all within a single, pipe-based syntax. Just as a detective uses a case file system to connect disparate clues, a developer uses KQL to trace a user's journey through an application, identify performance bottlenecks, and pinpoint error root causes. The power lies in the ability to chain operations: start with a massive dataset, filter down to the relevant time window, then drill into specific error types, and finally join with custom properties to find the affected user segment.
What is KQL and Why Does It Exist?
Kusto Query Language (KQL) is a read-only, tabular query language designed for Azure Data Explorer (ADX) and used extensively in Azure Monitor, including Application Insights. Unlike SQL, which is optimized for transactional workloads, KQL is optimized for large-scale, time-series data analysis. It uses a pipeline (pipe) operator | to pass data from one operation to the next, similar to Unix pipes. This design allows for efficient processing of billions of events by pushing filtering and aggregation down to the storage layer.
How KQL Works Internally
When you submit a KQL query to Application Insights, the query is parsed and executed against the underlying data cluster. The query plan is split into multiple stages:
- Data Retrieval: The query engine identifies which tables and time ranges are needed. It leverages the time-based partitioning of data to skip irrelevant partitions.
- Filter Pushdown: Filters (e.g., where, where Timestamp > ago(1h)) are pushed down to the storage layer, reducing the amount of data read.
- Aggregation: Operations like summarize, count, make-series are performed on distributed nodes, then merged.
- Projection: Finally, only the requested columns are returned.
The query results are returned as a table. The entire process is stateless; each query runs independently.
Key Components and Syntax
Tables: Application Insights stores data in several standard tables:
- requests – HTTP requests to your app.
- dependencies – calls to external services.
- exceptions – thrown exceptions.
- traces – custom log messages.
- pageViews – client-side page views.
- customEvents – custom business events.
Operators:
- where – filters rows based on a condition.
- project – selects columns to include/exclude.
- extend – adds new computed columns.
- summarize – aggregates data (e.g., count(), avg(), dcount()).
- join – merges two tables on a key.
- order by – sorts results.
- take – limits results to a specified number of rows.
- make-series – creates time series for time-chart visualizations.
Functions:
- ago(1h) – returns a datetime 1 hour before now.
- now() – current UTC time.
- format_datetime() – formats date/time.
- tostring() – converts to string.
- parse_json() – parses JSON strings.
Time Range: Queries can specify a time filter using where Timestamp between (datetime(2023-01-01) .. datetime(2023-01-02)) or where Timestamp > ago(1d). If no time filter is provided, a default time range (usually last 24 hours) is applied.
Writing a Basic Query
A typical KQL query starts with a table name, then pipes through operators. Example:
requests
| where timestamp > ago(1h)
| where resultCode == 500
| project timestamp, name, url, duration
| order by timestamp asc
| take 100This retrieves the last hour's requests that returned HTTP 500, showing timestamp, request name, URL, and duration, sorted by time, limiting to the first 100 rows.
Aggregations and Summarize
The summarize operator groups rows by one or more columns and applies aggregation functions. Example:
requests
| where timestamp > ago(1d)
| summarize Count = count(), AvgDuration = avg(duration) by name, bin(timestamp, 1h)
| order by timestamp ascThis groups requests by name and hourly bins, counting occurrences and average duration per bin.
Joins
Joins combine data from two tables. Example: find all requests that resulted in an exception.
requests
| join kind=inner (
exceptions
| project operation_Id, exceptionType
) on operation_Id
| take 100This joins requests with exceptions on the operation_Id field, showing only requests that have matching exceptions.
Time Series with make-series
For time charts, make-series creates evenly spaced time points. Example:
requests
| make-series Count = count() default=0 on timestamp from ago(7d) to now() step 1h by name
| render timechartThis creates a time series of request counts per hour for the last 7 days, per request name, and renders a time chart.
Performance Considerations
Always include a time filter to reduce data scanned.
Use where before summarize or join to filter early.
Prefer dcount() over count(distinct) for performance.
Avoid project-away if you can use project to keep only needed columns.
Use hint.strategy=shuffle for large joins.
Integration with Azure Monitor
KQL queries can be saved as Log Search alerts, dashboards, or workbooks. They can also be embedded in Azure Dashboards or Power BI. The Application Insights API allows programmatic execution of queries via REST.
Common Pitfalls
Forgetting the time filter: queries run against all data in the retention period, which can be slow.
Using == instead of =~ for case-insensitive string comparison.
Misunderstanding bin(): it rounds timestamps down to the nearest bin, not up.
Overusing take without order by: results are arbitrary.
Exam-Relevant Commands
.show tables – lists available tables.
.show schema – shows table schemas.
set query_results_cache_max_age = 60 – caches results for 60 seconds.
KQL vs SQL
KQL is not SQL. Key differences:
KQL uses pipes, not SELECT ... FROM ... WHERE.
KQL has no GROUP BY; use summarize.
KQL has no JOIN in the same way; it uses join operator with different kinds (inner, leftouter, etc.).
KQL supports let statements to define variables.
Example: Detecting Failed Requests
requests
| where timestamp > ago(1h)
| where resultCode != 200
| project timestamp, name, url, resultCode, duration
| order by timestamp descExample: Custom Metrics from Traces
traces
| where timestamp > ago(1h)
| where customDimensions has "PaymentAmount"
| extend amount = toreal(customDimensions.PaymentAmount)
| summarize TotalRevenue = sum(amount) by bin(timestamp, 1h)
| render timechartExam Focus
The AZ-204 exam expects you to know how to:
Write KQL queries to retrieve specific telemetry.
Use summarize with bin() for time-based aggregations.
Join tables using operation_Id.
Create time series with make-series.
Understand the difference between requests, dependencies, exceptions, and traces.
Apply time filters correctly.
Use let to define reusable query parts.
Know the default retention period (90 days) and how to adjust it.
Common wrong answers:
Using SQL syntax like SELECT * FROM requests.
Forgetting to include a time filter.
Using count(distinct) instead of dcount().
Confusing project with extend.
Numbers to know:
Default time range: last 24 hours.
Maximum retention: 2 years (pay-as-you-go).
ago(1d) returns datetime 24 hours ago.
bin(timestamp, 1h) rounds down to hour.
Edge cases:
If no time filter, query may timeout on large datasets.
take without order by gives arbitrary rows.
join on non-unique keys can explode result size.
To eliminate wrong answers: understand the pipeline flow. Each operator transforms the table; the order matters. If a question asks for a count of errors by type, you need summarize count() by errorType after filtering.
Identify the Target Table
Start by determining which Application Insights table contains the data you need. Common tables: `requests` for HTTP calls, `dependencies` for external service calls, `exceptions` for errors, `traces` for custom logs, `pageViews` for client-side page loads, and `customEvents` for business events. Use `.show tables` to list available tables or explore the schema with `.show schema`. Choosing the correct table is critical because each table has a different schema. For example, `requests` includes columns like `resultCode`, `duration`, `url`, while `exceptions` has `exceptionType`, `exceptionMessage`, `assembly`. If you query the wrong table, you may miss data or get empty results.
Apply Time Range Filter
Always filter by time to limit data scanned. Use `where timestamp > ago(1h)` for the last hour, or `where timestamp between (datetime(2023-01-01) .. datetime(2023-01-02))` for a specific range. The `timestamp` column is present in all tables and is the primary partition key. If you omit the time filter, the query may scan all data in the retention period (default 90 days), causing slow performance or timeouts. On the exam, remember that `ago(1d)` returns the current UTC time minus 24 hours. Use `bin(timestamp, 1h)` for grouping into hourly bins.
Filter Rows with where
Use the `where` operator to narrow down rows based on conditions. For example, `where resultCode == 500` for failed requests, or `where exceptionType has 'NullReferenceException'`. String comparisons are case-sensitive by default; use `=~` for case-insensitive matching. You can combine conditions with `and`, `or`, and parentheses. The `where` operator is the most performance-critical step because it reduces the data flowing through the pipeline. Always place `where` as early as possible. Avoid using `where` on computed columns if possible, as that prevents index usage.
Select Columns with project
Use `project` to include only the columns you need. For example, `project timestamp, name, url, duration`. This reduces the amount of data passed to subsequent operators and speeds up the query. You can also use `project-away` to exclude specific columns, but `project` is preferred because it explicitly lists what you want. If you need to rename a column, use `project NewName = OldName`. You can also compute new columns inline, but that is better done with `extend`.
Aggregate with summarize
Use `summarize` to group rows and compute aggregations. Common aggregation functions: `count()`, `dcount()` (distinct count), `avg()`, `sum()`, `min()`, `max()`, `percentile()`. For time-based grouping, use `bin(timestamp, 1h)` to create hourly buckets. Example: `summarize Count = count() by name, bin(timestamp, 1h)`. The `by` clause lists grouping columns. Multiple aggregation functions can be used in one summarize. Note that `dcount()` is approximate but fast; for exact counts, use `count()` with a distinct column.
Sort and Limit Results
Use `order by` to sort the results by one or more columns, typically by timestamp or count. Use `asc` or `desc` for ascending or descending order. Then use `take` to limit the number of rows returned. Without `order by`, `take` returns arbitrary rows. Example: `order by timestamp asc | take 100`. This is useful for previewing data or when you only need the top N results. For large datasets, consider using `top` operator which combines sort and take: `top 100 by timestamp asc`.
Visualize with render
Use the `render` operator to display results as a chart. Common chart types: `timechart`, `barchart`, `piechart`. `render` should be the last operator in the query. Example: `| render timechart`. The `render` operator works best with time-series data produced by `make-series` or `summarize` with time bins. In the Azure portal, you can also switch between table and chart views interactively. For the exam, know that `render` is optional and only affects visualization, not the data returned.
Enterprise Scenario 1: E-Commerce Checkout Failure Investigation
A large e-commerce platform uses Application Insights to monitor its checkout process. Users report intermittent failures during payment processing. The operations team writes a KQL query to correlate requests, dependencies, and exceptions across the checkout flow. They start with requests filtered by name contains 'Checkout' and timestamp > ago(1h). They join with dependencies on operation_Id to see calls to the payment gateway. They then join with exceptions to see any exceptions thrown during the same operations. Using summarize by url and resultCode, they discover that requests to a specific payment provider endpoint have a high failure rate. The query:
requests
| where name contains "Checkout" and timestamp > ago(1h)
| join kind=leftouter (dependencies | where type == "HTTP" and target contains "payment") on operation_Id
| join kind=leftouter (exceptions) on operation_Id
| project timestamp, name, resultCode, dependencyName = name1, exceptionType
| summarize Failures = count() by resultCode, dependencyName
| order by Failures descThis identifies the failing dependency. The team then sets up an alert on that dependency's failure rate.
Enterprise Scenario 2: Performance Degradation Debugging
A SaaS company notices increased response times after a recent deployment. They use KQL to compare performance metrics across versions. They query requests and use extend to parse the application version from custom properties. They then summarize average duration by version and bin timestamp by 1 hour. The query:
requests
| where timestamp > ago(7d)
| extend appVersion = tostring(customDimensions["AppVersion"])
| summarize AvgDuration = avg(duration) by appVersion, bin(timestamp, 1h)
| render timechartThey see that version 2.1.0 has a spike in duration. They drill down into that version with where appVersion == "2.1.0" and where duration > 5000 to find slow endpoints. They then join with dependencies to see if external calls are slow. This leads them to a misconfigured database connection string in the new version.
Common Misconfigurations
Missing time filter: Queries that scan the entire retention period (90 days) can time out or incur high costs. Always add a time filter.
Inefficient joins: Joining large tables without filtering first can cause performance issues. Use hint.strategy=shuffle for large joins.
Overusing `take` without sorting: Results are arbitrary and may mislead. Always sort if you need the top N.
Ignoring case sensitivity: where name == "Error" will not match "error". Use =~ for case-insensitive matching.
Scale Considerations
In production, Application Insights can ingest terabytes of data per day. Queries should be optimized to scan as little data as possible. Use time filters, filter on indexed columns (like operation_Id), and avoid project-away. For dashboards, use cached queries or set query_results_cache_max_age to reduce repeated scans.
Exactly What AZ-204 Tests on KQL
The AZ-204 exam (objective 4.1: Monitor) expects you to:
Write KQL queries to retrieve telemetry from Application Insights.
Use summarize with bin() for time-based aggregations.
Join tables using operation_Id.
Create time series with make-series.
Understand the schema of standard tables: requests, dependencies, exceptions, traces, pageViews, customEvents.
Apply time filters correctly.
Use let to define reusable query parts.
Know the default retention period (90 days) and how to adjust it (under Usage and estimated costs).
Common Wrong Answers and Why Candidates Choose Them
Using SQL syntax: Many candidates write SELECT * FROM requests WHERE .... This is wrong because KQL is not SQL. The correct syntax is requests | where ....
Forgetting time filter: Candidates often omit the time filter, causing the query to scan all data. The exam expects you to always include a time filter for efficiency.
Using `count(distinct)` instead of `dcount()`: count(distinct) is not valid in KQL. The correct function is dcount() for approximate distinct count, or use summarize by column and then count().
Confusing `project` with `extend`: project selects columns; extend adds new columns. Candidates sometimes use project to add a computed column, which removes all other columns. The correct approach is extend followed by project if needed.
Specific Numbers and Terms on the Exam
ago(1d) returns datetime 24 hours ago (UTC).
bin(timestamp, 1h) rounds down to the nearest hour.
Default time range for queries: last 24 hours.
Default retention: 90 days; maximum: 2 years (pay-as-you-go).
Tables: requests, dependencies, exceptions, traces, pageViews, customEvents.
Common columns: operation_Id, timestamp, name, resultCode, duration, cloud_RoleName.
Edge Cases and Exceptions
If no time filter is provided, the query may timeout on large datasets. The exam may present a scenario where a query is slow and ask you to fix it.
take without order by returns arbitrary rows. The exam might ask what the result of take 10 is without sorting; answer: unpredictable.
Joining on operation_Id is common, but operation_Id is not unique per request; it groups related telemetry. Joining on it may produce multiple matches.
make-series requires a default value for missing bins; if not specified, missing bins are filled with null.
How to Eliminate Wrong Answers
Understand the pipeline flow. Each operator transforms the table; the order matters. If a question asks for a count of errors by type, you need summarize count() by errorType after filtering. If an answer uses GROUP BY, it's wrong. If it uses SELECT, it's wrong. If it omits the table name, it's wrong. Always check for time filter presence. Look for correct operator names: where, project, summarize, join, extend, make-series, render. Also, know that let is used for variable assignment, not for declaring functions (use let with = (param) => { ... } for lambdas).
KQL is a read-only, pipeline-based query language used in Azure Monitor and Application Insights.
Always include a time filter (e.g., `where timestamp > ago(1h)`) to limit data scanned.
Use `summarize` with `bin(timestamp, 1h)` for time-based aggregations.
Join tables on `operation_Id` to correlate requests, dependencies, and exceptions.
Use `project` to select columns and `extend` to add computed columns.
Use `dcount()` for approximate distinct counts, not `count(distinct)`.
Default retention period for Application Insights is 90 days; maximum is 2 years.
The `render` operator visualizes results as charts (e.g., timechart).
Use `let` to define variables and reusable query parts.
Without `order by`, `take` returns arbitrary rows; use `top` for sorted results.
These come up on the exam all the time. Here's how to tell them apart.
KQL
Uses pipe operator | for chaining
Read-only, optimized for analytics
No GROUP BY; use summarize
No JOIN; use join operator with kinds
Supports time series with make-series
SQL
Uses SELECT-FROM-WHERE structure
Read-write, optimized for transactions
Uses GROUP BY for aggregation
Uses JOIN in FROM clause
No native time series function
Mistake
KQL is the same as SQL with different syntax.
Correct
KQL is fundamentally different. It uses a pipeline model (|) instead of SELECT-FROM-WHERE. It has no GROUP BY; use summarize. It has no JOIN in the same way; use the join operator with kinds like inner, leftouter. SQL queries will not work in KQL.
Mistake
You can omit the time filter in a query.
Correct
Omitting the time filter causes the query to scan all data in the retention period (default 90 days), leading to slow performance and potential timeout. Always include a time filter like `where timestamp > ago(1h)`.
Mistake
The `take` operator returns the first N rows in order.
Correct
Without an `order by`, `take` returns arbitrary rows. To get the first N rows by a specific order, use `order by` before `take`, or use `top` operator which combines both.
Mistake
`count(distinct column)` is valid in KQL.
Correct
`count(distinct column)` is SQL syntax. In KQL, use `dcount(column)` for approximate distinct count, or use `summarize by column` and then `count()` for exact count.
Mistake
The `project` operator can add new computed columns.
Correct
`project` selects only the columns you specify; it does not add new columns. To add a computed column, use `extend` first, then optionally `project` to keep only desired columns.
Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.
Use the `requests` table with a time filter and a condition on `resultCode`. Example: `requests | where timestamp > ago(1h) | where resultCode != 200 | project timestamp, name, url, resultCode, duration | order by timestamp desc`. This retrieves non-200 requests from the last hour, sorted by time.
`project` selects only the columns you specify, discarding all others. `extend` adds new computed columns to the existing set. Use `extend` when you want to keep all original columns and add new ones; use `project` when you only need a subset. Example: `requests | extend Success = resultCode == 200` adds a boolean column; `requests | project name, duration` keeps only those two columns.
Use the `join` operator on the `operation_Id` column. Example: `requests | join kind=inner (exceptions) on operation_Id | project timestamp, name, resultCode, exceptionType`. This returns all requests that have a corresponding exception, combining their columns.
`bin()` rounds a datetime value down to the nearest multiple of the given bin size. For example, `bin(datetime(2023-01-01T12:34:56Z), 1h)` returns `2023-01-01T12:00:00Z`. It is commonly used with `summarize` to group events into time buckets for aggregation.
Use the `make-series` operator to create evenly spaced time points, then use `render timechart`. Example: `requests | make-series Count = count() default=0 on timestamp from ago(7d) to now() step 1h by name | render timechart`. This plots request counts per hour for the last 7 days, grouped by request name.
If no time filter is specified, the query runs against the last 24 hours of data. You can override this by adding a `where timestamp > ago(1h)` or using the time picker in the portal. For large datasets, always specify a time range to improve performance.
`let` is used to define variables or functions. For example: `let failedRequests = requests | where resultCode >= 500; failedRequests | summarize count() by name`. You can also define scalar variables: `let threshold = 5000; requests | where duration > threshold`.
You've just covered KQL Log Queries in Application Insights — now see how well it sticks with free AZ-204 practice questions. Full explanations included, no account needed.
Done with this chapter?