SC-200Chapter 6 of 101Objective 2.2

KQL for Security Analysts

This chapter covers Kusto Query Language (KQL) as used by Microsoft Sentinel for security operations. KQL is the query language for Azure Data Explorer and is essential for querying logs in Sentinel, Microsoft 365 Defender, and other Microsoft security products. On the SC-200 exam, roughly 25–30% of questions involve KQL directly or require understanding KQL results. You will be expected to read, interpret, and troubleshoot KQL queries, and occasionally write simple queries. This chapter provides a deep, exam-focused understanding of KQL operators, functions, and best practices for security analysis.

25 min read

Intermediate

Updated May 31, 2026

Reviewed by Johnson Ajibi· Senior Network & Security Engineer · MSc IT Security

Jump to a section

Explain it to me simply Where people get tripped up Test what I know Look up key terms

KQL as a Forensic Investigator's Toolkit

Think of KQL as a forensic investigator's toolkit for a massive digital crime scene — your Microsoft 365 tenant. The investigator (you) arrives with a set of specialized tools: a magnifying glass (the where operator to zoom in on specific clues), a timeline board (the summarize and bin operators to group events by time), and a filter system (the project and extend operators to clean and focus on relevant evidence). Each log table is like a separate evidence locker — SecurityEvent for Windows logs, SigninLogs for authentication attempts, AlertEvidence for detection alerts. The investigator writes a search warrant (a KQL query) that specifies exactly which lockers to open, which clues to pull, and how to piece them together. The join operator is like cross-referencing two witness statements to find a common suspect. The let statement is like assigning a case file number to a piece of evidence for reuse. Just as an investigator must follow a chain of custody and apply precise filters to avoid contamination, a KQL query must be structured correctly to yield accurate, actionable results. A missing where clause is like failing to fingerprint a key suspect — you get too much noise. An incorrect summarize is like mislabeling evidence — you draw wrong conclusions. The SC-200 exam tests your ability to wield this toolkit efficiently, not just to write queries but to think like an investigator: what data is available, how to filter it, and how to correlate it to identify threats.

How It Actually Works

What is KQL and Why Does It Matter for Security Analysts?

Kusto Query Language (KQL) is a read-only query language designed for querying large semi-structured datasets in Azure Data Explorer (ADX). Microsoft Sentinel stores all its data in Log Analytics workspaces, which are built on ADX. Therefore, every query you run in Sentinel — whether in the Logs blade, hunting queries, analytics rules, or workbooks — is KQL. The SC-200 exam expects you to be proficient in reading and modifying KQL queries to investigate security incidents.

Unlike SQL, KQL is designed for stream processing and is case-sensitive. The core syntax is: source | operator1 | operator2 | ... where the pipe (|) passes the result of one operator to the next. The source is typically a table name (e.g., SecurityEvent, SigninLogs, AlertEvidence).

How KQL Works Internally: The Query Pipeline

When you submit a KQL query, the ADX engine parses it into a tree of operators. Each operator consumes a tabular input and produces a tabular output. The engine optimizes the order of operators (e.g., pushing filters down to reduce data early). The key is that operators are applied sequentially, but the engine may reorder them as long as the result is equivalent.

For example:

SecurityEvent
| where EventID == 4625
| where Account == "jdoe"
| project TimeGenerated, Account, IpAddress

The engine may combine the two where clauses into a single filter. The project operator then reduces the columns. The result is a table with three columns.

Key Components: Tables, Columns, and Data Types

Each table in Sentinel has a schema: a set of columns with specific data types. Common data types include: - string: text (case-sensitive comparisons!) - int, long: integer numbers - datetime: date and time (always in UTC in Sentinel) - dynamic: JSON-like structures (e.g., AdditionalDetails in AlertEvidence) - timespan: time intervals

You can view table schemas using:

SecurityEvent
| getschema

Or in the Logs blade, you can expand the Tables pane.

Essential Operators for Security Analysts

#### where — Filtering Rows The most common operator. Use it to narrow down to relevant events. Always filter early to reduce data.

SecurityEvent
| where EventID == 4625  // Failed logon
| where TimeGenerated > ago(1d)

Note: == for equality, != for inequality, contains for substring (case-insensitive), has for exact term (case-insensitive). For case-sensitive, use contains_cs and has_cs.

#### project — Selecting Columns Use to reduce output columns. Also used to rename columns with project-rename.

SigninLogs
| project TimeGenerated, UserPrincipalName, AppDisplayName, IPAddress

#### extend — Adding Computed Columns Create new columns based on existing ones.

SecurityEvent
| extend AccountDomain = split(Account, '\\')[0]
| extend AccountName = split(Account, '\\')[1]

#### summarize — Aggregating Data Group rows and compute aggregates. Common functions: count(), dcount(), sum(), avg(), min(), max(), make_set(), make_list().

SigninLogs
| summarize FailedCount = count() by UserPrincipalName, bin(TimeGenerated, 1h)
| where FailedCount > 10

The bin function rounds time to the nearest hour (or any interval). This is crucial for time-series analysis.

#### join — Combining Tables Merge rows from two tables based on a key. Types: inner, leftouter, rightouter, fullouter, leftanti, rightanti, leftsemi, rightsemi.

AlertEvidence
| where AlertId == "some-id"
| join kind=inner (
    SecurityEvent
    | where TimeGenerated > ago(1d)
) on $left.EntityId == $right.TargetUserName

Security analysts often use leftanti to find events in one table that don't have a match in another (e.g., sign-ins without alerts).

#### let — Defining Variables Use to create reusable queries or constants.

let threshold = 10;
let suspiciousUsers = (
    SigninLogs
    | where ResultType == "50057"
    | summarize Count = count() by UserPrincipalName
    | where Count > threshold
);
suspiciousUsers
| join kind=inner (IdentityLogonEvents) on UserPrincipalName

Functions and Macros

KQL has built-in functions for security: ago(), startofday(), endofday(), parse_json(), tostring(), todatetime(). The parse operator extracts patterns from strings.

SecurityEvent
| parse EventData with * '<Data Name="TargetUserName">' UserName '</Data>' *
| project UserName

Performance Considerations

Filter as early as possible. Use where before summarize or join.

Use project to limit columns before join.

Avoid extend on large datasets if not needed.

Use has instead of contains when you need exact term match — has is faster because it uses an index.

Use ago() for time filters rather than absolute datetime strings.

For large joins, ensure the left table is the smaller one.

How KQL Interacts with Sentinel Features

Analytics rules: Each rule contains a KQL query that runs on a schedule. The query must return results for the rule to fire.

Hunting queries: Pre-built KQL queries in the Hunting blade. You can customize them.

Workbooks: Use KQL to populate visualizations.

Threat intelligence: Use ThreatIntelligenceIndicator table with KQL to match IOCs.

UEBA: Queries against BehaviorAnalytics table.

Common KQL Patterns for Security

Failed logon brute force:

SecurityEvent
| where EventID == 4625
| summarize FailedCount = count() by Account, IpAddress, bin(TimeGenerated, 5m)
| where FailedCount > 5

Malware alert correlation:

AlertEvidence
| where AlertId in (AlertInfo | where Severity == "High" | project AlertId)
| join kind=inner (DeviceEvents) on $left.EntityId == $right.DeviceName

Data exfiltration via network:

VMConnection
| where Direction == "outbound"
| summarize TotalBytes = sum(BytesSent) by SourceIp, DestinationIp, bin(TimeGenerated, 1h)
| where TotalBytes > 1000000

Exam-Specific Syntax Traps

Case sensitivity: Table names, column names, and string comparisons are case-sensitive. SecurityEvent is correct; securityevent is not.

String literals: Use single quotes or double quotes. Escape with \.

Comments: Use // for single line, /* */ for block.

Null handling: Use isnull() and isnotnull(). In comparisons, == null doesn't work; use isnull().

Time zones: All timestamps are UTC. Use ago() for relative times.

Join key mismatch: Ensure data types match. Use tostring() if needed.

Advanced Operators

mv-expand: Expands multi-value (array) columns into multiple rows.

mv-apply: Applies a subquery to each element of a multi-value column.

partition: Splits a table into partitions and runs a subquery on each.

serialize: Forces a sort order for window functions.

row_cumsum, row_number: Window functions for running totals.

Example of mv-expand:

AlertEvidence
| where AlertId == "..."
| mv-expand AdditionalDetails
| evaluate bag_unpack(AdditionalDetails)

Debugging Queries

Use take 10 to see sample data. Use count to see row counts. Use getschema to verify columns. In the Logs blade, you can see query statistics (CPU, data scanned) to identify performance issues.

Walk-Through

Identify the Data Source

Before writing any KQL, determine which table(s) contain the relevant data. In Sentinel, common tables include SecurityEvent (Windows events), SigninLogs (Azure AD sign-ins), AlertEvidence (alert entities), DeviceEvents (Microsoft Defender for Endpoint), and CommonSecurityLog (syslog). Use the Logs blade's Table browser to explore schemas. Always start with a simple query like `TableName | take 10` to confirm the data exists and to see column names. This step is critical because many exam questions expect you to know which table to query for a specific scenario, e.g., failed logons come from SecurityEvent (EventID 4625) or SigninLogs (ResultType 50057).

Filter with Where Clauses

Apply filters using the `where` operator to reduce the dataset to relevant rows. For security queries, typical filters include time range (e.g., `TimeGenerated > ago(24h)`), event IDs, user names, and IP addresses. Use precise operators: `==` for exact match, `contains` for substring, `has` for term. Remember that string comparisons are case-sensitive by default, but `contains` and `has` are case-insensitive. For performance, filter as early as possible. On the exam, you might be asked to complete a query that filters for a specific EventID — the answer will use `==`, not `contains`.

Project Relevant Columns

Use the `project` operator to select only the columns you need. This reduces the amount of data passed to subsequent operators and speeds up the query. It also makes the output cleaner. You can also rename columns with `project-rename`. Avoid using `project` before filtering if you need columns for the filter — the order is source, then where, then project. However, you can project early if you are certain which columns are needed for filtering. In exam scenarios, you may need to choose between `project` and `extend` — `project` limits columns, `extend` adds columns.

Aggregate with Summarize

Use `summarize` to group rows and compute aggregates like counts, sums, or distinct counts. The `by` clause specifies grouping columns. For time-based grouping, use `bin(TimeGenerated, duration)` to create time buckets. Common aggregate functions: `count()`, `dcount()` (approximate distinct count), `make_set()` (list of distinct values), `make_list()` (list of all values). Security examples: count failed logons per user per hour, list distinct IPs per alert. On the exam, you might be asked which function returns a list of unique values — answer: `make_set()`.

Correlate with Join

Use `join` to combine rows from two tables based on a matching key. The most common join types for security are `inner` (only matching rows), `leftouter` (all rows from left, nulls for non-matches), and `leftanti` (rows in left that have no match in right — useful for finding orphaned events). Specify the join key using `on $left.Column == $right.Column`. Ensure data types match; use `tostring()` if needed. Performance tip: put the smaller table on the left. Exam questions often test understanding of join types — e.g., which join returns only rows from the left table that have no corresponding row in the right table? Answer: `leftanti`.

What This Looks Like on the Job

Scenario 1: Investigating a Brute Force Attack on Domain Controllers

A SOC analyst detects a high number of failed logon events (EventID 4625) from a single IP address targeting multiple domain admin accounts. The analyst writes a KQL query to identify the scope:

SecurityEvent
| where EventID == 4625
| where TimeGenerated > ago(1h)
| summarize FailedCount = count() by IpAddress, Account, bin(TimeGenerated, 5m)
| where FailedCount > 10
| order by FailedCount desc

The query reveals the attacking IP, the targeted accounts, and the time pattern. The analyst then uses join to correlate with SecurityEvent for successful logons (EventID 4624) from the same IP to determine if any accounts were compromised. In production, this query runs across thousands of domain controllers; performance is critical. The team optimizes by filtering on LogonType and using has for IP comparisons. Common misconfiguration: forgetting to bin TimeGenerated, leading to an overwhelming number of rows.

Scenario 2: Correlating Alerts with Device Events

A Sentinel analytics rule fires a high-severity alert for malware detected on a device. The analyst needs to see all processes executed on that device around the alert time. They use:

let alertTime = AlertInfo | where AlertId == "..." | project TimeGenerated;
let deviceName = AlertEvidence | where AlertId == "..." | where EntityType == "Device" | project EntityId;
DeviceEvents
| where DeviceName in (deviceName)
| where TimeGenerated between (alertTime - 1h .. alertTime + 1h)
| project TimeGenerated, ActionType, FileName, ProcessCommandLine

This query uses let to define reusable variables, making it readable. In production, the between operator is essential for time windows. A common mistake is using ago() with a fixed duration instead of a dynamic window based on the alert time. The team also uses materialize() for large datasets to cache results.

Scenario 3: Hunting for Data Exfiltration via SharePoint

A security engineer monitors unusual download activity from SharePoint. They query:

OfficeActivity
| where Operation == "FileDownloaded"
| where TimeGenerated > ago(7d)
| summarize DownloadCount = count() by UserId, ClientIP, bin(TimeGenerated, 1h)
| where DownloadCount > 50
| join kind=inner (IdentityInfo) on $left.UserId == $right.UserPrincipalName
| project UserId, UserDisplayName, ClientIP, DownloadCount, TimeGenerated

This query identifies users with excessive downloads, then joins with user identity data for context. In large enterprises, the OfficeActivity table can be huge; filtering by operation and time early is crucial. The team uses dcount() to estimate distinct files downloaded. A pitfall: the ClientIP column may contain IPv6 addresses; using tostring() and parse_ipv4() helps normalize.

How SC-200 Actually Tests This

The SC-200 exam (Objective 2.2: Query data in Microsoft Sentinel using KQL) tests your ability to read, interpret, and modify KQL queries. You are not expected to write complex queries from scratch, but you must understand the purpose of each operator and be able to identify syntax errors or logical flaws. Approximately 25–30% of exam questions involve KQL directly or require understanding query results.

Common Wrong Answers and Why Candidates Choose Them

Using `contains` instead of `==` for exact match: Candidates often use contains when they need an exact EventID. For example, where EventID contains 4625 will also match 46251, 14625, etc. The exam expects == for exact values.

Confusing `project` and `extend`: project keeps only specified columns; extend adds new columns while keeping all existing ones. A question asking "which operator adds a new column?" — the wrong answer is project.

Misusing `join` types: leftanti is often confused with leftouter. leftanti returns rows from the left table that have NO match in the right table. leftouter returns all left rows with nulls for non-matches. Exam questions test this distinction.

Incorrect time filter syntax: Using TimeGenerated > 1d instead of TimeGenerated > ago(1d). The ago() function is required.

Forgetting case sensitivity: Writing securityevent instead of SecurityEvent. Table names are case-sensitive.

Specific Numbers and Terms That Appear Verbatim

EventID 4625 (failed logon), 4624 (successful logon), 4688 (process creation), 5156 (Windows Firewall allowed connection).

SigninLogs ResultType: 50057 (user account disabled), 50126 (invalid username/password).

ago(7d) is common for default hunting queries.

bin(TimeGenerated, 1h) is the standard time bucket.

make_set() returns distinct values; make_list() returns all values.

dcount() is approximate; count() is exact.

Edge Cases and Exceptions

Null handling: where Column != "value" does NOT filter out nulls. Use where Column != "value" or isnull(Column) if needed.

String escaping: In parse operator, backslashes must be escaped.

Dynamic fields: Use evaluate bag_unpack() to expand JSON-like columns.

Time zone: All times are UTC. ago() uses UTC.

How to Eliminate Wrong Answers

If a query uses contains for an EventID, it's likely wrong because EventIDs are exact numbers.

If a join type is inner but the scenario requires finding unmatched events, eliminate inner.

If a query doesn't filter by time, it's inefficient — but the exam may still ask for the correct operator, not performance.

If a query uses project before where but the where clause references a column not in the project list, the query will fail.

Key Takeaways

KQL is case-sensitive: table names, column names, and string comparisons use exact case.

Always filter early with `where` to reduce data before aggregation or join.

Use `has` for faster exact term matching instead of `contains`.

Use `bin(TimeGenerated, duration)` in `summarize` to group time into buckets.

`leftanti` join returns rows from the left table with no match in the right table.

`make_set()` returns distinct values; `make_list()` returns all values.

`ago(7d)` is the standard relative time filter for hunting queries.

Common EventIDs: 4625 (failed logon), 4624 (successful logon), 4688 (process creation).

Use `project` to reduce columns after filtering, not before.

Nulls are not included in comparisons; use `isnull()` and `isnotnull()`.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

`has` operator

Matches exact term (word boundary), case-insensitive.

Faster performance due to index usage.

Example: `has "admin"` matches "admin" but not "administrator".

Use for IP addresses, usernames, hostnames.

Case-sensitive variant: `has_cs`.

`contains` operator

Matches substring anywhere in the string, case-insensitive.

Slower because it scans the entire string.

Example: `contains "admin"` matches "admin", "administrator", "admin123".

Use when you need partial matches.

Case-sensitive variant: `contains_cs`.

Watch Out for These

Mistake

KQL is case-insensitive like SQL.

Correct

KQL is case-sensitive for table names, column names, and string comparisons. `SecurityEvent` is correct; `securityevent` is not. String comparisons with `==` are case-sensitive; use `=~` for case-insensitive equality.

Mistake

The `contains` operator is faster than `has`.

Correct

`has` is faster because it uses an index for exact term matching, while `contains` does a substring scan. Use `has` when you need to match a whole word or token.

Mistake

Using `project` before `where` always improves performance.

Correct

If the `where` clause references a column that you projected away, the query will fail. Always filter first, then project. The engine may push filters down, but the query must be syntactically correct.

Mistake

`summarize` without `bin` on time columns is fine.

Correct

Without `bin`, each unique timestamp (including milliseconds) becomes a separate group, resulting in too many groups and poor performance. Always use `bin(TimeGenerated, duration)` to group time into buckets.

Mistake

`leftanti` join returns all rows from both tables that don't match.

Correct

`leftanti` returns only rows from the left table that have no matching row in the right table. It does not return any rows from the right table.

Do You Actually Know This?

Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.

Frequently Asked Questions

What is the difference between `has` and `contains` in KQL?

`has` matches an exact term (word boundary) and is case-insensitive. It is faster because it uses an index. `contains` matches any substring and is slower. For example, `has "admin"` matches "admin" but not "administrator", while `contains "admin"` matches both. Use `has` for precise matching of IP addresses, usernames, or hostnames. For case-sensitive versions, use `has_cs` and `contains_cs`.

How do I filter for events in the last 24 hours in KQL?

Use the `ago()` function: `where TimeGenerated > ago(24h)`. `ago(24h)` returns the datetime 24 hours before the current time. You can also use `where TimeGenerated between (ago(24h) .. now())`. For absolute times, use `datetime("2025-01-01T00:00:00Z")`. Remember that all timestamps are in UTC.

What is the correct way to join two tables in KQL?

Use the `join` operator with a join kind (e.g., `inner`, `leftouter`, `leftanti`) and specify the key with `on $left.Column == $right.Column`. For example: `Table1 | join kind=inner (Table2) on $left.Key == $right.Key`. Ensure the key columns have the same data type; use `tostring()` if needed. The left table should be the smaller one for performance.

Which KQL operator returns a list of distinct values?

`make_set()` returns an array of distinct values in a group. For example: `summarize DistinctIPs = make_set(IPAddress) by User`. If you need all values (including duplicates), use `make_list()`. The SC-200 exam commonly tests this distinction.

How do I handle JSON-like dynamic columns in KQL?

Use the `evaluate bag_unpack()` function to expand dynamic columns into individual columns. For example: `AlertEvidence | evaluate bag_unpack(AdditionalDetails)`. You can also access specific properties using dot notation: `AdditionalDetails.PropertyName`. Use `parse_json()` to convert a string to dynamic if needed.

What is the purpose of the `bin()` function in KQL?

`bin()` rounds datetime or numeric values to the nearest multiple of a given bin size. It is used in `summarize` to group data into time buckets, e.g., `bin(TimeGenerated, 1h)` groups events into one-hour intervals. Without `bin`, each unique timestamp becomes a separate group, leading to poor performance and too many groups.

Can I use SQL to query Microsoft Sentinel logs?

No, Sentinel uses KQL (Kusto Query Language) natively. However, you can use the `sql` plugin to run SQL queries against Log Analytics, but it is not recommended and not tested on the SC-200 exam. Stick to KQL for all Sentinel queries.

Terms Worth Knowing

Log Analytics SIEM Threat hunting

Ready to put this to the test?

You've just covered KQL for Security Analysts — now see how well it sticks with free SC-200 practice questions. Full explanations included, no account needed.

Try SC-200 practice questions Back to all chapters

Done with this chapter?

Microsoft Sentinel Workspace Setup

Sentinel Analytics Rules

See the full SC-200 study guide