This chapter covers Sentinel summary rules for large data, a critical feature for managing high-volume log ingestion and query performance. On the SC-200 exam, approximately 5-10% of questions relate to log management, data retention, and aggregation strategies, with summary rules being a key differentiator. Understanding when and how to implement summary rules—including their configuration, limitations, and interaction with other features like analytics rules and data archiving—is essential for both the exam and real-world operations. This chapter provides a deep dive into the mechanism, configuration, and exam-specific nuances of summary rules.
Think of Microsoft Sentinel as a giant library where every log entry is a book. Ingesting every book directly into the main reading room would overwhelm the space and make searches slow. Instead, the library uses a summary system: before putting books into the main room, a librarian quickly reads each book and writes a short summary card—the author, title, a few keywords, and a pointer to the full book stored in a basement archive. When a patron asks, 'How many books by author X were published last year?', the librarian looks at the summary cards, counts them in seconds, and gives the answer without ever going to the basement. Only if the patron needs to read a specific book does the librarian fetch it from the archive. This summary system allows the library to answer aggregate queries instantly while keeping the main room uncluttered. In Sentinel, summary rules work exactly like this: they pre-aggregate high-volume log data into concise summaries stored in a separate table, enabling fast queries on large datasets without processing every raw event. The raw data remains available for deep dives when needed, but routine reporting and alerting use the summaries for speed and cost efficiency.
What Are Summary Rules and Why Do They Exist?
Microsoft Sentinel summary rules allow you to pre-aggregate data from high-volume logs into a separate summary table. This is essential when you have massive data sources—such as firewall logs, DNS queries, or authentication events—that generate millions of events per day. Querying raw data for every investigation or report would be slow and expensive. Summary rules run on a schedule (e.g., every 5 minutes, hourly, daily) and perform aggregations like count, sum, average, min, max, or custom KQL functions, storing the results in a dedicated table. This table can then be queried much faster, and with lower costs, because the data volume is reduced by orders of magnitude.
How Summary Rules Work Internally
A summary rule is defined by a Kusto Query Language (KQL) query that specifies the source table, the aggregation logic, and the output table. The rule runs on a schedule defined in the rule's Frequency setting. The process is:
Query Execution: At each scheduled interval, Sentinel executes the KQL query against the source table for the time window defined by the Query period (e.g., last 5 minutes). The query must include aggregation functions (e.g., summarize, make-series) and a bin on a time column to create time buckets.
Result Ingestion: The query results are ingested into the destination table specified in the rule. The destination table must exist before the rule runs; you create it manually or via the rule creation wizard. The destination table is a standard Log Analytics table, but it is recommended to name it with a prefix like Summary_ to distinguish it.
Data Retention: The raw source data may be retained for a shorter period (e.g., 30 days) while the summary table can be retained longer (e.g., 1 year) because it contains aggregated, lower-volume data. This aligns with cost optimization strategies.
Querying: When you need to answer questions like "How many failed logins per hour over the last 30 days?", you query the summary table instead of the raw SigninLogs table. The summary table might have one row per hour instead of millions of rows, making the query nearly instantaneous.
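The pre-aggregation step described above can be sketched in plain Python. This is an illustration of the idea rather than how Sentinel implements it: `bin_time` mimics the rounding-down behaviour of KQL `bin()`, and the events and vendor names are made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def bin_time(ts: datetime, width: timedelta) -> datetime:
    """Round ts down to the start of its bucket, like KQL bin()."""
    return ts - ((ts - EPOCH) % width)

def at(hour: int, minute: int) -> datetime:
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)

# Hypothetical raw events: (timestamp, vendor)
raw_events = [
    (at(10, 1), "VendorA"),
    (at(10, 3), "VendorA"),
    (at(10, 4), "VendorB"),
    (at(10, 7), "VendorA"),
]

# Equivalent in spirit to: summarize Count = count() by bin(TimeGenerated, 5m), DeviceVendor
summary = Counter(
    (bin_time(ts, timedelta(minutes=5)), vendor) for ts, vendor in raw_events
)

for (bucket, vendor), count in sorted(summary.items()):
    print(bucket.isoformat(), vendor, count)
```

Four raw events collapse into three summary rows here; with millions of events per bucket the row count stays the same while the raw volume grows.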
Key Components, Values, Defaults, and Timers
Source Table: The Log Analytics table containing the raw data (e.g., CommonSecurityLog, AzureActivity).
Destination Table: A custom table in the same Log Analytics workspace. Must be created before the rule runs. Example: Summary_CommonSecurityLog.
Frequency: How often the rule runs. Options: Every 5 minutes, Hourly, Daily. Default: Hourly.
Query Period: The time window for data to be processed in each run. Must be greater than or equal to the frequency to avoid missing data. For example, if frequency is 5 minutes, query period might be 5 minutes (no overlap) or 10 minutes (overlap to catch late-arriving data). Default: equal to frequency.
Latency: The rule starts processing data after a built-in delay to account for ingestion latency. This delay is 5 minutes by default and cannot be changed in the UI, but it can be adjusted via API.
Aggregation Function: Must be one of the supported KQL aggregation functions: count(), sum(), avg(), min(), max(), dcount(), percentile(), make_list(), make_set(), etc. The final operation cannot be a non-aggregating operator such as extend or project.
Time Bucket: The query must use bin(TimeGenerated, <interval>) to group data into time buckets. The interval should align with the query period; for example, if query period is 5 minutes, bin interval should be 5 minutes.
Retention: The summary table's retention can be set independently from the source table, typically longer.
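The relationships between these settings can be checked before a rule is created. The sketch below is a hypothetical helper, not part of any Sentinel SDK; it encodes two of the constraints listed above (query period at least equal to frequency, and a bin interval that divides the query period evenly).

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class RuleSettings:
    """Hypothetical container for the timing settings of one summary rule."""
    frequency: timedelta
    query_period: timedelta
    bin_interval: timedelta

def check_settings(s: RuleSettings) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if s.query_period < s.frequency:
        problems.append("query period is shorter than frequency, so data between runs is skipped")
    if s.query_period % s.bin_interval != timedelta(0):
        problems.append("bin interval does not divide the query period evenly")
    return problems

good = RuleSettings(timedelta(minutes=5), timedelta(minutes=10), timedelta(minutes=5))
bad = RuleSettings(timedelta(hours=1), timedelta(minutes=30), timedelta(minutes=7))

print(check_settings(good))
print(check_settings(bad))
```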
Configuration and Verification
To create a summary rule in the Azure portal:
Navigate to Sentinel > Content management > Summary rules.
Click "Create" and provide a name and description.
Set the source table, destination table, frequency, and query period.
Write the KQL query. Example:
CommonSecurityLog
| where TimeGenerated > ago(5m)
| summarize Count = count() by bin(TimeGenerated, 5m), DeviceVendor, DeviceProduct
Set the destination table name (e.g., Summary_CommonSecurityLog). The table must already exist; you can create it via Log Analytics with a schema that matches the query output.
Review and create.
To verify the rule is working:
Check the rule's status in the Summary rules blade (should show "Active").
Query the destination table: Summary_CommonSecurityLog | take 10.
Monitor the rule's execution history via the rule's details pane.
Interaction with Related Technologies
Analytics Rules: Summary rules can feed into analytics rules. For example, an analytics rule can query the summary table to detect anomalies like a sudden spike in login failures, rather than querying raw data. This reduces cost and latency.
Data Archiving: Summary rules are often used in conjunction with data archiving. Raw data can be archived to cheap storage after a short retention, while summaries remain hot for fast queries.
Hunting: Hunters can use summary tables to quickly identify time periods of interest, then pivot to raw archived data for deep investigation.
Workbooks and Dashboards: Summary tables are ideal for powering frequently refreshed dashboards because they are fast and cost-effective to query. The data lags by at least one rule run, so these dashboards are not truly real-time.
Limitations and Considerations
No real-time: Summary rules are batch-oriented; the minimum frequency is 5 minutes, so data is not available until after the next run.
Late-arriving data: If data arrives after the query period closes, it will be missed unless you use overlapping query periods (query period > frequency). However, overlapping can cause duplicate summaries if not handled carefully.
Destination table schema: The destination table's schema must match the query output exactly. If the query changes, you must recreate the table.
Cost: While summary rules reduce query costs, they incur ingestion costs for the summarized data. However, because the volume is much lower, net savings are significant.
KQL constraints: The query must end with a summarize operator (or equivalent aggregation). You cannot use take, order by, or non-aggregation projections as the final operation.
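The duplicate-summary problem from overlapping query periods can be handled at query time. One possible approach, sketched here with made-up rows, is to keep only the row from the most recent run for each bucket and key. It assumes each summary row records which run produced it, which is an assumption of this example rather than a documented column.

```python
# Hypothetical summary rows: (run_time, bucket_start, vendor, count).
# The 10:00 bucket appears twice because two overlapping runs covered it.
rows = [
    ("10:10", "10:00", "VendorA", 40),
    ("10:15", "10:00", "VendorA", 42),  # later run, includes late-arriving events
    ("10:15", "10:05", "VendorA", 17),
]

def latest_per_bucket(rows):
    """Keep the count from the most recent run for each (bucket, vendor)."""
    latest = {}
    for run_time, bucket, vendor, count in rows:
        key = (bucket, vendor)
        # "HH:MM" strings compare correctly in lexicographic order
        if key not in latest or run_time > latest[key][0]:
            latest[key] = (run_time, count)
    return {key: count for key, (_, count) in latest.items()}

deduped = latest_per_bucket(rows)
naive_total = sum(count for _, _, _, count in rows)
deduped_total = sum(deduped.values())

print(naive_total, deduped_total)
```

Summing every row double-counts the overlapped bucket; summing the deduplicated rows does not.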
Step-by-Step Mechanism
Define the Rule: You specify the source table, aggregation query, schedule, and destination table.
Schedule Execution: The rule runs at the set frequency (e.g., every 5 minutes). Sentinel's scheduler triggers the query.
Query Execution: The KQL query runs against the source table for the time window [now - query period - latency, now - latency]. The latency (5 minutes) ensures data has been ingested.
Result Ingestion: The query results are ingested into the destination table using the Log Analytics data collector API. The destination table's schema must match the query output columns.
Data Availability: Once ingested, the summary data is available for queries. It inherits the retention policy of the destination table.
Query Usage: Analysts query the summary table for aggregated insights. They can also join with other tables or use the summary to filter raw data (e.g., by time range).
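The time window from the Query Execution step, [now - query period - latency, now - latency], works out as follows. The function name is hypothetical and the 5-minute default latency is the value stated in this chapter.

```python
from datetime import datetime, timedelta, timezone

def query_window(run_time: datetime,
                 query_period: timedelta,
                 latency: timedelta = timedelta(minutes=5)):
    """Return (start, end) of the data window one rule run would process."""
    end = run_time - latency
    start = end - query_period
    return start, end

run = datetime(2024, 1, 1, 10, 30, tzinfo=timezone.utc)
start, end = query_window(run, timedelta(minutes=10))
print(start.time(), end.time())
```

For a run at 10:30 with a 10-minute query period, the window covers 10:15 to 10:25.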
Define Source and Destination
First, identify the high-volume source table (e.g., `CommonSecurityLog`) and create a destination table (e.g., `Summary_CommonSecurityLog`) in the same Log Analytics workspace. The destination table must have a schema that matches the expected output of the summary query. You can create it through the Azure portal or the Log Analytics tables API. The table should include a time column (usually `TimeGenerated`) and the aggregation columns. This step is critical because the rule will fail if the destination table does not exist or has mismatched columns.
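A schema mismatch is easier to catch before the rule runs than after it fails. The sketch below compares two column-to-type mappings; the helper and the example schemas are hypothetical, and in practice you would fill them in from the table definition and a test run of the query.

```python
def schema_mismatches(destination: dict[str, str], query_output: dict[str, str]) -> list[str]:
    """List every column that differs between the destination table and the query output."""
    issues = []
    for name in sorted(destination.keys() | query_output.keys()):
        if name not in destination:
            issues.append(f"{name}: missing from destination table")
        elif name not in query_output:
            issues.append(f"{name}: not produced by the query")
        elif destination[name] != query_output[name]:
            issues.append(f"{name}: type {destination[name]} != {query_output[name]}")
    return issues

# Hypothetical schemas for the example rule
destination = {"TimeGenerated": "datetime", "DeviceVendor": "string", "Count": "long"}
query_output = {"TimeGenerated": "datetime", "DeviceVendor": "string",
                "DeviceProduct": "string", "Count": "int"}

for issue in schema_mismatches(destination, query_output):
    print(issue)
```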
Write the Aggregation Query
Write a KQL query that ends with a `summarize` operator. The query must include a `bin` on the time column to define the aggregation window. For example: `CommonSecurityLog | where TimeGenerated > ago(10m) | summarize Count = count() by bin(TimeGenerated, 5m), DeviceVendor`. Avoid using `take`, `order by`, or `project` as the final operation. The query can include filters to reduce the data volume further. Ensure the query period covers the time window including late-arriving data; typically set query period to 2x frequency to avoid gaps.
Configure Schedule and Latency
Set the frequency (e.g., 5 minutes) and query period (e.g., 10 minutes). The rule will run every 5 minutes, processing data from the last 10 minutes (with a 5-minute latency). The latency is a built-in 5-minute delay to allow for ingestion lag. This configuration ensures that most late-arriving data is captured. However, if data is consistently delayed by more than 5 minutes, you may need to increase the query period or adjust latency via API (not available in UI).
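To see how the query period affects late-arriving data, the following sketch checks whether any scheduled run would pick up an event that was ingested with a given delay. It is a simplified model under stated assumptions: times are whole minutes, runs happen at multiples of the frequency, and the window is treated as half-open (start included, end excluded).

```python
def is_captured(event_min: int, delay_min: int,
                frequency: int = 5, query_period: int = 10, latency: int = 5) -> bool:
    """True if some run's window contains the event AND the event was ingested by then."""
    last_possible_run = event_min + query_period + latency
    for run in range(0, last_possible_run + 1, frequency):
        in_window = run - query_period - latency <= event_min < run - latency
        ingested = event_min + delay_min <= run
        if in_window and ingested:
            return True
    return False

# Event generated at minute 2, ingested 9 minutes late
print(is_captured(2, 9))                   # query period 10: a later overlapping run sees it
print(is_captured(2, 9, query_period=5))   # query period 5: the only covering run is too early
print(is_captured(2, 14))                  # delayed beyond what the overlap can absorb
```

In this model the overlapping window gives each event a second chance to be summarized, but a long enough delay still slips past every run.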
Create and Activate Rule
In the Sentinel Summary rules blade, click 'Create' and fill in the details: name, description, source table, destination table, frequency, query period, and the KQL query. Review the configuration and click 'Create'. The rule will immediately start running on the schedule. You can monitor its status in the Summary rules list; it should show 'Active'. If there are errors (e.g., destination table not found), the rule will show 'Failed' and you need to fix the issue.
Verify and Query Summary Data
After the first run completes (within the frequency interval), query the destination table: `Summary_CommonSecurityLog | take 10`. Verify that the data appears as expected. Check the time range: because `bin()` rounds down, the `TimeGenerated` values should correspond to the start of each aggregation window. You can also run a test query comparing the summary table to the raw source table for a small time window to ensure accuracy. If the summary looks correct, you can now use this table for dashboards, alerts, and reports.
Scenario 1: Firewall Log Aggregation for a Large Enterprise
A multinational company ingests over 100 million firewall log entries daily from hundreds of devices into CommonSecurityLog. The security team needs to monitor traffic patterns, top source IPs, and blocked connections over time. Querying raw data for a 30-day report would be slow and expensive. They implement a summary rule that runs every hour, aggregating by bin(TimeGenerated, 1h), DeviceVendor, and Action, with a count of events. The summary table holds 24 rows per day for each DeviceVendor and Action combination, so the daily row count falls by several orders of magnitude compared with the raw 100 million. The summary table is retained for 1 year, while raw data is archived after 30 days. Now, the team can run hourly dashboards and weekly reports in seconds. Misconfiguration often occurs when the query period is set too short (e.g., equal to frequency), causing gaps if data arrives late. They learned to set query period to double the frequency (2 hours) to ensure full coverage.
Scenario 2: Authentication Log Analysis for SOC
A SOC monitors SigninLogs for failed login attempts. They need to detect brute-force attacks by counting failed attempts per user per hour. Raw data is 50 million rows/day. They create a summary rule with frequency 5 minutes, query period 10 minutes, aggregating count() by bin(TimeGenerated, 5m), UserPrincipalName, Status. The summary table stores only aggregated counts. An analytics rule then queries the summary table to trigger an alert when a user exceeds 10 failed attempts in an hour. This reduces the analytics rule's query cost by 99% and speeds up detection. A common mistake is forgetting to include Status in the aggregation, causing the summary to mix success and failure counts. The team also discovered that if the destination table schema is changed (e.g., adding a column), the rule fails silently; they now use a script to recreate the table whenever the query changes.
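The detection logic in this scenario, rolling 5-minute summary rows up to hourly totals and flagging users above the threshold, can be sketched as follows. The rows, user names, and the "Failure"/"Success" status values are invented for illustration, and the rows are assumed to be already deduplicated across overlapping runs.

```python
from collections import defaultdict
from datetime import datetime, timezone

FAILED_THRESHOLD = 10  # from the scenario: more than 10 failures in an hour

def at(hour: int, minute: int) -> datetime:
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)

def users_over_threshold(rows, threshold=FAILED_THRESHOLD):
    """Sum failed counts per (hour, user) and return the pairs above the threshold."""
    hourly = defaultdict(int)
    for bucket_start, user, status, count in rows:
        if status == "Failure":  # without Status in the summary this filter is impossible
            hour = bucket_start.replace(minute=0, second=0, microsecond=0)
            hourly[(hour, user)] += count
    return sorted(key for key, total in hourly.items() if total > threshold)

# Hypothetical summary rows: (bucket_start, user, status, count)
summary_rows = [
    (at(10, 0), "alice@contoso.com", "Failure", 4),
    (at(10, 5), "alice@contoso.com", "Failure", 5),
    (at(10, 55), "alice@contoso.com", "Failure", 3),
    (at(10, 0), "alice@contoso.com", "Success", 100),
    (at(10, 0), "bob@contoso.com", "Failure", 10),
]

alerts = users_over_threshold(summary_rows)
print(alerts)
```

The successful sign-ins are ignored only because the summary kept Status as a grouping column, which is the mistake the scenario warns about.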
Scenario 3: IoT Device Telemetry Summarization
An IoT company ingests telemetry from millions of devices into a DeviceTelemetry table. They need to track average temperature per device over time. Raw data is 200 million rows/day. They create a summary rule with daily frequency, aggregating avg(Temperature) by bin(TimeGenerated, 1d), DeviceId. The summary table is used for long-term trend analysis and capacity planning. However, they found that daily frequency misses intra-day anomalies. They added a second summary rule with hourly frequency for closer, intra-day monitoring. The challenge was managing multiple summary rules and ensuring they don't overlap or conflict. They now use a naming convention and monitor each rule's execution history for failures.
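One pitfall when keeping averages at two granularities is that a daily average cannot be rebuilt reliably from hourly averages alone. The sketch below, using invented numbers for a single device, shows why storing a sum and a count per bucket is a safer basis for re-aggregation; this is a general property of averages, not something specific to Sentinel.

```python
# Hypothetical hourly summary rows for one device: (hour, reading_count, temperature_sum)
hourly = [
    (0, 100, 2000.0),  # hourly average 20.0
    (1, 10, 300.0),    # hourly average 30.0
]

# Averaging the hourly averages treats both hours as equally important
mean_of_means = sum(total / n for _, n, total in hourly) / len(hourly)

# Recombining sums and counts weights each hour by how many readings it had
true_mean = sum(total for _, _, total in hourly) / sum(n for _, n, _ in hourly)

print(round(mean_of_means, 2), round(true_mean, 2))
```

The two results differ whenever buckets contain different numbers of readings.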
What SC-200 Tests on Summary Rules
The SC-200 exam objective 2.2 covers "Manage large data volumes in Microsoft Sentinel," which includes summary rules. Expect 1-2 questions specifically on summary rules, plus broader questions on data retention and cost optimization that may reference summary rules as a solution. Key areas tested:
When to use summary rules: The exam presents scenarios with high-volume logs (e.g., firewall, DNS) and asks which feature to use. The correct answer is often summary rules, but candidates may confuse it with data archiving or basic log retention policies. Remember: summary rules are for _pre-aggregation_, not just storage.
Configuration parameters: You need to know the default frequency (Hourly), the minimum frequency (5 minutes), and the built-in latency (5 minutes). The exam may ask: "What is the minimum frequency for a summary rule?" Answer: 5 minutes.
KQL requirements: The exam tests that the query must end with a summarize operator and must include bin() on a time column. A common wrong answer is using project or extend as the final operator.
Destination table: You must create the destination table _before_ the rule. A trick question: "You create a summary rule and it fails. What is the most likely cause?" Answer: The destination table does not exist.
Interaction with analytics rules: The exam tests that summary tables can be used as data sources for analytics rules to reduce cost. A distractor might suggest using raw data directly.
Common Wrong Answers and Why Candidates Choose Them
Wrong: "Summary rules can run in real-time." Candidates see "rule" and think of analytics rules which are near real-time. Reality: Summary rules are batch, minimum 5-minute frequency.
Wrong: "The destination table is automatically created." Many assume Azure creates tables automatically. Reality: You must create it manually.
Wrong: "Summary rules can use any KQL query." Candidates think any query is valid. Reality: The query must end with a summarize operator; non-aggregation queries fail.
Wrong: "Summary rules replace the need for data archiving." They are complementary, not replacements.
Specific Numbers and Terms to Memorize
Minimum frequency: 5 minutes
Default frequency: 1 hour
Built-in latency: 5 minutes
Query period should be >= frequency; often set to 2x frequency.
Destination table naming convention: often Summary_<SourceTable>
Retention of summary table can be set independently.
Edge Cases and Exam Traps
If the source table is empty during a run, the rule still succeeds but inserts no data. The rule status remains "Active".
If the destination table's schema changes after rule creation, the rule fails. You must recreate the destination table.
A summary rule cannot summarize data that is no longer in the source table. If the source table's retention policy purges data before the rule runs, that data is never summarized.
The exam may present a scenario where you need to reduce costs for long-term storage of high-volume data. The best answer is a combination of summary rules and archiving, not just one.
How to Eliminate Wrong Answers
When you see a question about handling large data volumes, ask: "Does the scenario require pre-aggregated results for fast queries?" If yes, summary rules. If the scenario is about storing raw data cheaply, it's archiving. If the scenario is about reducing ingestion cost, it's about data collection rules or filtering. Always check the KQL query in the answer: does it end with summarize? If not, it's wrong.
Summary rules pre-aggregate high-volume log data into a separate table for faster and cheaper queries.
Minimum frequency is 5 minutes; default is hourly. Built-in latency is 5 minutes.
The KQL query must end with a `summarize` operator and include `bin()` on a time column.
The destination table must be created manually before the rule runs.
Summary rules are batch-oriented, not real-time.
They are often used in combination with data archiving for cost optimization.
Query period should be at least equal to frequency, typically double to capture late-arriving data.
These come up on the exam all the time. Here's how to tell them apart.
Summary Rules
Pre-aggregates data into a new table at scheduled intervals.
Reduces query volume and cost for aggregate queries.
Requires manual creation of destination table.
Data is stored hot in Log Analytics (fast query).
Best for frequent reporting and alerting on trends.
Data Archiving
Moves raw data to cheap storage (Azure Data Lake) while retaining query capability.
Reduces hot storage cost but raw data queries are slower and more expensive.
No manual table creation; policy-based.
Data is stored in cold storage (archived) with pay-per-query.
Best for compliance and deep-dive investigations on historical data.
Mistake
Summary rules run in real-time like analytics rules.
Correct
Summary rules are batch-oriented with a minimum frequency of 5 minutes. They cannot provide sub-minute latency. Analytics rules, on the other hand, can run near real-time (every few minutes) but on raw data.
Mistake
The destination table for summary rules is automatically created.
Correct
You must create the destination table manually before the summary rule runs. If the table does not exist, the rule fails. The table schema must match the query output.
Mistake
Summary rules can use any KQL query, including those that project or extend columns.
Correct
The query must end with a `summarize` operator (or equivalent aggregation). Non-aggregation queries like `project` or `extend` as the final operation will cause the rule to fail.
Mistake
Summary rules eliminate the need for data archiving.
Correct
Summary rules reduce query costs by pre-aggregating data, but raw data may still need to be archived for compliance or deep investigation. They are complementary, not replacements.
Mistake
You can set the latency of a summary rule to zero to get data faster.
Correct
The built-in latency is fixed at 5 minutes in the UI. You can adjust it via API, but setting it too low may cause missing data due to ingestion delays. The default 5 minutes is recommended.
The minimum frequency is 5 minutes. This is the shortest interval at which the rule can run. The default frequency is 1 hour. The exam may test this value, so remember it.
Yes, you must create the destination table manually in the same Log Analytics workspace before the summary rule runs. The table schema must match the output of the summary query. If the table does not exist, the rule will fail.
No, because summary rules are batch-oriented with a minimum 5-minute delay. For near real-time dashboards, query the raw data directly or use streaming methods. Summary tables are better for historical trend dashboards.
The rule will still execute successfully but will insert no rows into the destination table. The rule status will remain 'Active'. This is normal behavior.
Yes, you can edit the rule and change the query. However, if the new query outputs different columns than the existing destination table, the rule will fail. You must update the destination table schema or create a new table.
By pre-aggregating data, the volume of data stored in the summary table is much smaller than the raw source. Queries against the summary table process fewer rows, reducing query costs (pay-per-GB scanned). Additionally, you can retain raw data for a shorter period and keep summaries longer, saving on storage costs.
Yes, summary rules can be used with any Log Analytics table, including custom tables. The source table can be a custom table created from data connectors or custom logs.