AZ-204Chapter 61 of 102Objective 4.1

KQL Log Queries in Application Insights

This chapter covers Kusto Query Language (KQL) log queries within Azure Application Insights, a critical skill for the AZ-204 exam. KQL is the primary language for analyzing telemetry data in Azure Monitor, and understanding it is essential for diagnosing application issues, building dashboards, and setting up alerts. Approximately 10-15% of exam questions touch on monitoring and diagnostics, with KQL queries being a common task in the 'Monitor' domain. You will learn the syntax, operators, and best practices for writing efficient queries that extract meaningful insights from application logs.

25 min read
Intermediate
Updated May 31, 2026

KQL: The Detective's Case File System

Imagine you are a detective investigating a crime in a large office building. Every employee's action is logged: who entered which room, at what time, for how long, and what they did. These logs are stored in a massive filing cabinet, but the cabinet is organized by date and employee ID—not by crime relevance. To find clues, you need a system that lets you search across all files, filter by specific criteria (e.g., only employees who entered the server room after midnight), group related events (e.g., all entries by the same person within 10 minutes), and project only the relevant fields (e.g., name, room, timestamp). Kusto Query Language (KQL) is exactly that system for Application Insights telemetry data. It allows you to query across billions of events, filter with time-based operators, aggregate counts, and join different data sources—all within a single, pipe-based syntax. Just as a detective uses a case file system to connect disparate clues, a developer uses KQL to trace a user's journey through an application, identify performance bottlenecks, and pinpoint error root causes. The power lies in the ability to chain operations: start with a massive dataset, filter down to the relevant time window, then drill into specific error types, and finally join with custom properties to find the affected user segment.

How It Actually Works

What is KQL and Why Does It Exist?

Kusto Query Language (KQL) is a read-only, tabular query language designed for Azure Data Explorer (ADX) and used extensively in Azure Monitor, including Application Insights. Unlike SQL, which is optimized for transactional workloads, KQL is optimized for large-scale, time-series data analysis. It uses a pipeline (pipe) operator | to pass data from one operation to the next, similar to Unix pipes. This design allows for efficient processing of billions of events by pushing filtering and aggregation down to the storage layer.

How KQL Works Internally

When you submit a KQL query to Application Insights, the query is parsed and executed against the underlying data cluster. The query plan is split into multiple stages: - Data Retrieval: The query engine identifies which tables and time ranges are needed. It leverages the time-based partitioning of data to skip irrelevant partitions. - Filter Pushdown: Filters (e.g., where, where Timestamp > ago(1h)) are pushed down to the storage layer, reducing the amount of data read. - Aggregation: Operations like summarize, count, make-series are performed on distributed nodes, then merged. - Projection: Finally, only the requested columns are returned.

The query results are returned as a table. The entire process is stateless; each query runs independently.

Key Components and Syntax

Tables: Application Insights stores data in several standard tables: - requests – HTTP requests to your app. - dependencies – calls to external services. - exceptions – thrown exceptions. - traces – custom log messages. - pageViews – client-side page views. - customEvents – custom business events.

Operators: - where – filters rows based on a condition. - project – selects columns to include/exclude. - extend – adds new computed columns. - summarize – aggregates data (e.g., count(), avg(), dcount()). - join – merges two tables on a key. - order by – sorts results. - take – limits results to a specified number of rows. - make-series – creates time series for time-chart visualizations.

Functions: - ago(1h) – returns a datetime 1 hour before now. - now() – current UTC time. - format_datetime() – formats date/time. - tostring() – converts to string. - parse_json() – parses JSON strings.

Time Range: Queries can specify a time filter using where Timestamp between (datetime(2023-01-01) .. datetime(2023-01-02)) or where Timestamp > ago(1d). If no time filter is provided, a default time range (usually last 24 hours) is applied.

Writing a Basic Query

A typical KQL query starts with a table name, then pipes through operators. Example:

requests
| where timestamp > ago(1h)
| where resultCode == 500
| project timestamp, name, url, duration
| order by timestamp asc
| take 100

This retrieves the last hour's requests that returned HTTP 500, showing timestamp, request name, URL, and duration, sorted by time, limiting to the first 100 rows.

Aggregations and Summarize

The summarize operator groups rows by one or more columns and applies aggregation functions. Example:

requests
| where timestamp > ago(1d)
| summarize Count = count(), AvgDuration = avg(duration) by name, bin(timestamp, 1h)
| order by timestamp asc

This groups requests by name and hourly bins, counting occurrences and average duration per bin.

Joins

Joins combine data from two tables. Example: find all requests that resulted in an exception.

requests
| join kind=inner (
    exceptions
    | project operation_Id, exceptionType
) on operation_Id
| take 100

This joins requests with exceptions on the operation_Id field, showing only requests that have matching exceptions.

Time Series with make-series

For time charts, make-series creates evenly spaced time points. Example:

requests
| make-series Count = count() default=0 on timestamp from ago(7d) to now() step 1h by name
| render timechart

This creates a time series of request counts per hour for the last 7 days, per request name, and renders a time chart.

Performance Considerations

Always include a time filter to reduce data scanned.

Use where before summarize or join to filter early.

Prefer dcount() over count(distinct) for performance.

Avoid project-away if you can use project to keep only needed columns.

Use hint.strategy=shuffle for large joins.

Integration with Azure Monitor

KQL queries can be saved as Log Search alerts, dashboards, or workbooks. They can also be embedded in Azure Dashboards or Power BI. The Application Insights API allows programmatic execution of queries via REST.

Common Pitfalls

Forgetting the time filter: queries run against all data in the retention period, which can be slow.

Using == instead of =~ for case-insensitive string comparison.

Misunderstanding bin(): it rounds timestamps down to the nearest bin, not up.

Overusing take without order by: results are arbitrary.

Exam-Relevant Commands

.show tables – lists available tables.

.show schema – shows table schemas.

set query_results_cache_max_age = 60 – caches results for 60 seconds.

KQL vs SQL

KQL is not SQL. Key differences:

KQL uses pipes, not SELECT ... FROM ... WHERE.

KQL has no GROUP BY; use summarize.

KQL has no JOIN in the same way; it uses join operator with different kinds (inner, leftouter, etc.).

KQL supports let statements to define variables.

Example: Detecting Failed Requests

requests
| where timestamp > ago(1h)
| where resultCode != 200
| project timestamp, name, url, resultCode, duration
| order by timestamp desc

Example: Custom Metrics from Traces

traces
| where timestamp > ago(1h)
| where customDimensions has "PaymentAmount"
| extend amount = toreal(customDimensions.PaymentAmount)
| summarize TotalRevenue = sum(amount) by bin(timestamp, 1h)
| render timechart

Exam Focus

The AZ-204 exam expects you to know how to:

Write KQL queries to retrieve specific telemetry.

Use summarize with bin() for time-based aggregations.

Join tables using operation_Id.

Create time series with make-series.

Understand the difference between requests, dependencies, exceptions, and traces.

Apply time filters correctly.

Use let to define reusable query parts.

Know the default retention period (90 days) and how to adjust it.

Common wrong answers:

Using SQL syntax like SELECT * FROM requests.

Forgetting to include a time filter.

Using count(distinct) instead of dcount().

Confusing project with extend.

Numbers to know:

Default time range: last 24 hours.

Maximum retention: 2 years (pay-as-you-go).

ago(1d) returns datetime 24 hours ago.

bin(timestamp, 1h) rounds down to hour.

Edge cases:

If no time filter, query may timeout on large datasets.

take without order by gives arbitrary rows.

join on non-unique keys can explode result size.

To eliminate wrong answers: understand the pipeline flow. Each operator transforms the table; the order matters. If a question asks for a count of errors by type, you need summarize count() by errorType after filtering.

Walk-Through

1

Identify the Target Table

Start by determining which Application Insights table contains the data you need. Common tables: `requests` for HTTP calls, `dependencies` for external service calls, `exceptions` for errors, `traces` for custom logs, `pageViews` for client-side page loads, and `customEvents` for business events. Use `.show tables` to list available tables or explore the schema with `.show schema`. Choosing the correct table is critical because each table has a different schema. For example, `requests` includes columns like `resultCode`, `duration`, `url`, while `exceptions` has `exceptionType`, `exceptionMessage`, `assembly`. If you query the wrong table, you may miss data or get empty results.

2

Apply Time Range Filter

Always filter by time to limit data scanned. Use `where timestamp > ago(1h)` for the last hour, or `where timestamp between (datetime(2023-01-01) .. datetime(2023-01-02))` for a specific range. The `timestamp` column is present in all tables and is the primary partition key. If you omit the time filter, the query may scan all data in the retention period (default 90 days), causing slow performance or timeouts. On the exam, remember that `ago(1d)` returns the current UTC time minus 24 hours. Use `bin(timestamp, 1h)` for grouping into hourly bins.

3

Filter Rows with where

Use the `where` operator to narrow down rows based on conditions. For example, `where resultCode == 500` for failed requests, or `where exceptionType has 'NullReferenceException'`. String comparisons are case-sensitive by default; use `=~` for case-insensitive matching. You can combine conditions with `and`, `or`, and parentheses. The `where` operator is the most performance-critical step because it reduces the data flowing through the pipeline. Always place `where` as early as possible. Avoid using `where` on computed columns if possible, as that prevents index usage.

4

Select Columns with project

Use `project` to include only the columns you need. For example, `project timestamp, name, url, duration`. This reduces the amount of data passed to subsequent operators and speeds up the query. You can also use `project-away` to exclude specific columns, but `project` is preferred because it explicitly lists what you want. If you need to rename a column, use `project NewName = OldName`. You can also compute new columns inline, but that is better done with `extend`.

5

Aggregate with summarize

Use `summarize` to group rows and compute aggregations. Common aggregation functions: `count()`, `dcount()` (distinct count), `avg()`, `sum()`, `min()`, `max()`, `percentile()`. For time-based grouping, use `bin(timestamp, 1h)` to create hourly buckets. Example: `summarize Count = count() by name, bin(timestamp, 1h)`. The `by` clause lists grouping columns. Multiple aggregation functions can be used in one summarize. Note that `dcount()` is approximate but fast; for exact counts, use `count()` with a distinct column.

6

Sort and Limit Results

Use `order by` to sort the results by one or more columns, typically by timestamp or count. Use `asc` or `desc` for ascending or descending order. Then use `take` to limit the number of rows returned. Without `order by`, `take` returns arbitrary rows. Example: `order by timestamp asc | take 100`. This is useful for previewing data or when you only need the top N results. For large datasets, consider using `top` operator which combines sort and take: `top 100 by timestamp asc`.

7

Visualize with render

Use the `render` operator to display results as a chart. Common chart types: `timechart`, `barchart`, `piechart`. `render` should be the last operator in the query. Example: `| render timechart`. The `render` operator works best with time-series data produced by `make-series` or `summarize` with time bins. In the Azure portal, you can also switch between table and chart views interactively. For the exam, know that `render` is optional and only affects visualization, not the data returned.

What This Looks Like on the Job

Enterprise Scenario 1: E-Commerce Checkout Failure Investigation

A large e-commerce platform uses Application Insights to monitor its checkout process. Users report intermittent failures during payment processing. The operations team writes a KQL query to correlate requests, dependencies, and exceptions across the checkout flow. They start with requests filtered by name contains 'Checkout' and timestamp > ago(1h). They join with dependencies on operation_Id to see calls to the payment gateway. They then join with exceptions to see any exceptions thrown during the same operations. Using summarize by url and resultCode, they discover that requests to a specific payment provider endpoint have a high failure rate. The query:

requests
| where name contains "Checkout" and timestamp > ago(1h)
| join kind=leftouter (dependencies | where type == "HTTP" and target contains "payment") on operation_Id
| join kind=leftouter (exceptions) on operation_Id
| project timestamp, name, resultCode, dependencyName = name1, exceptionType
| summarize Failures = count() by resultCode, dependencyName
| order by Failures desc

This identifies the failing dependency. The team then sets up an alert on that dependency's failure rate.

Enterprise Scenario 2: Performance Degradation Debugging

A SaaS company notices increased response times after a recent deployment. They use KQL to compare performance metrics across versions. They query requests and use extend to parse the application version from custom properties. They then summarize average duration by version and bin timestamp by 1 hour. The query:

requests
| where timestamp > ago(7d)
| extend appVersion = tostring(customDimensions["AppVersion"])
| summarize AvgDuration = avg(duration) by appVersion, bin(timestamp, 1h)
| render timechart

They see that version 2.1.0 has a spike in duration. They drill down into that version with where appVersion == "2.1.0" and where duration > 5000 to find slow endpoints. They then join with dependencies to see if external calls are slow. This leads them to a misconfigured database connection string in the new version.

Common Misconfigurations

Missing time filter: Queries that scan the entire retention period (90 days) can time out or incur high costs. Always add a time filter.

Inefficient joins: Joining large tables without filtering first can cause performance issues. Use hint.strategy=shuffle for large joins.

Overusing `take` without sorting: Results are arbitrary and may mislead. Always sort if you need the top N.

Ignoring case sensitivity: where name == "Error" will not match "error". Use =~ for case-insensitive matching.

Scale Considerations

In production, Application Insights can ingest terabytes of data per day. Queries should be optimized to scan as little data as possible. Use time filters, filter on indexed columns (like operation_Id), and avoid project-away. For dashboards, use cached queries or set query_results_cache_max_age to reduce repeated scans.

How AZ-204 Actually Tests This

Exactly What AZ-204 Tests on KQL

The AZ-204 exam (objective 4.1: Monitor) expects you to:

Write KQL queries to retrieve telemetry from Application Insights.

Use summarize with bin() for time-based aggregations.

Join tables using operation_Id.

Create time series with make-series.

Understand the schema of standard tables: requests, dependencies, exceptions, traces, pageViews, customEvents.

Apply time filters correctly.

Use let to define reusable query parts.

Know the default retention period (90 days) and how to adjust it (under Usage and estimated costs).

Common Wrong Answers and Why Candidates Choose Them

1.

Using SQL syntax: Many candidates write SELECT * FROM requests WHERE .... This is wrong because KQL is not SQL. The correct syntax is requests | where ....

2.

Forgetting time filter: Candidates often omit the time filter, causing the query to scan all data. The exam expects you to always include a time filter for efficiency.

3.

Using `count(distinct)` instead of `dcount()`: count(distinct) is not valid in KQL. The correct function is dcount() for approximate distinct count, or use summarize by column and then count().

4.

Confusing `project` with `extend`: project selects columns; extend adds new columns. Candidates sometimes use project to add a computed column, which removes all other columns. The correct approach is extend followed by project if needed.

Specific Numbers and Terms on the Exam

ago(1d) returns datetime 24 hours ago (UTC).

bin(timestamp, 1h) rounds down to the nearest hour.

Default time range for queries: last 24 hours.

Default retention: 90 days; maximum: 2 years (pay-as-you-go).

Tables: requests, dependencies, exceptions, traces, pageViews, customEvents.

Common columns: operation_Id, timestamp, name, resultCode, duration, cloud_RoleName.

Edge Cases and Exceptions

If no time filter is provided, the query may timeout on large datasets. The exam may present a scenario where a query is slow and ask you to fix it.

take without order by returns arbitrary rows. The exam might ask what the result of take 10 is without sorting; answer: unpredictable.

Joining on operation_Id is common, but operation_Id is not unique per request; it groups related telemetry. Joining on it may produce multiple matches.

make-series requires a default value for missing bins; if not specified, missing bins are filled with null.

How to Eliminate Wrong Answers

Understand the pipeline flow. Each operator transforms the table; the order matters. If a question asks for a count of errors by type, you need summarize count() by errorType after filtering. If an answer uses GROUP BY, it's wrong. If it uses SELECT, it's wrong. If it omits the table name, it's wrong. Always check for time filter presence. Look for correct operator names: where, project, summarize, join, extend, make-series, render. Also, know that let is used for variable assignment, not for declaring functions (use let with = (param) => { ... } for lambdas).

Key Takeaways

KQL is a read-only, pipeline-based query language used in Azure Monitor and Application Insights.

Always include a time filter (e.g., `where timestamp > ago(1h)`) to limit data scanned.

Use `summarize` with `bin(timestamp, 1h)` for time-based aggregations.

Join tables on `operation_Id` to correlate requests, dependencies, and exceptions.

Use `project` to select columns and `extend` to add computed columns.

Use `dcount()` for approximate distinct counts, not `count(distinct)`.

Default retention period for Application Insights is 90 days; maximum is 2 years.

The `render` operator visualizes results as charts (e.g., timechart).

Use `let` to define variables and reusable query parts.

Without `order by`, `take` returns arbitrary rows; use `top` for sorted results.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

KQL

Uses pipe operator | for chaining

Read-only, optimized for analytics

No GROUP BY; use summarize

No JOIN; use join operator with kinds

Supports time series with make-series

SQL

Uses SELECT-FROM-WHERE structure

Read-write, optimized for transactions

Uses GROUP BY for aggregation

Uses JOIN in FROM clause

No native time series function

Watch Out for These

Mistake

KQL is the same as SQL with different syntax.

Correct

KQL is fundamentally different. It uses a pipeline model (|) instead of SELECT-FROM-WHERE. It has no GROUP BY; use summarize. It has no JOIN in the same way; use the join operator with kinds like inner, leftouter. SQL queries will not work in KQL.

Mistake

You can omit the time filter in a query.

Correct

Omitting the time filter causes the query to scan all data in the retention period (default 90 days), leading to slow performance and potential timeout. Always include a time filter like `where timestamp > ago(1h)`.

Mistake

The `take` operator returns the first N rows in order.

Correct

Without an `order by`, `take` returns arbitrary rows. To get the first N rows by a specific order, use `order by` before `take`, or use `top` operator which combines both.

Mistake

`count(distinct column)` is valid in KQL.

Correct

`count(distinct column)` is SQL syntax. In KQL, use `dcount(column)` for approximate distinct count, or use `summarize by column` and then `count()` for exact count.

Mistake

The `project` operator can add new computed columns.

Correct

`project` selects only the columns you specify; it does not add new columns. To add a computed column, use `extend` first, then optionally `project` to keep only desired columns.

Do You Actually Know This?

Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.

Frequently Asked Questions

How do I write a KQL query to get all failed requests in the last hour?

Use the `requests` table with a time filter and a condition on `resultCode`. Example: `requests | where timestamp > ago(1h) | where resultCode != 200 | project timestamp, name, url, resultCode, duration | order by timestamp desc`. This retrieves non-200 requests from the last hour, sorted by time.

What is the difference between `project` and `extend` in KQL?

`project` selects only the columns you specify, discarding all others. `extend` adds new computed columns to the existing set. Use `extend` when you want to keep all original columns and add new ones; use `project` when you only need a subset. Example: `requests | extend Success = resultCode == 200` adds a boolean column; `requests | project name, duration` keeps only those two columns.

How do I join requests and exceptions in KQL?

Use the `join` operator on the `operation_Id` column. Example: `requests | join kind=inner (exceptions) on operation_Id | project timestamp, name, resultCode, exceptionType`. This returns all requests that have a corresponding exception, combining their columns.

What does `bin(timestamp, 1h)` do?

`bin()` rounds a datetime value down to the nearest multiple of the given bin size. For example, `bin(datetime(2023-01-01T12:34:56Z), 1h)` returns `2023-01-01T12:00:00Z`. It is commonly used with `summarize` to group events into time buckets for aggregation.

How can I create a time series chart in KQL?

Use the `make-series` operator to create evenly spaced time points, then use `render timechart`. Example: `requests | make-series Count = count() default=0 on timestamp from ago(7d) to now() step 1h by name | render timechart`. This plots request counts per hour for the last 7 days, grouped by request name.

What is the default time range for a KQL query in Application Insights?

If no time filter is specified, the query runs against the last 24 hours of data. You can override this by adding a `where timestamp > ago(1h)` or using the time picker in the portal. For large datasets, always specify a time range to improve performance.

How do I use `let` in KQL?

`let` is used to define variables or functions. For example: `let failedRequests = requests | where resultCode >= 500; failedRequests | summarize count() by name`. You can also define scalar variables: `let threshold = 5000; requests | where duration > threshold`.

Terms Worth Knowing

Ready to put this to the test?

You've just covered KQL Log Queries in Application Insights — now see how well it sticks with free AZ-204 practice questions. Full explanations included, no account needed.

Done with this chapter?