This chapter covers Splunk Search Processing Language (SPL) queries, a critical skill for security analysts using Splunk for security information and event management (SIEM). For the CS0-003 exam, approximately 10-15% of questions in the Security Operations domain involve interpreting or constructing SPL queries to detect threats, analyze logs, and generate reports. Mastering SPL syntax, commands, and common security use cases is essential for passing the exam and performing real-world security analysis.
Jump to a section
Imagine you are a librarian in a massive library with millions of books (log events). Each book has a unique call number (timestamp and source), a title (event type), and content (raw log data). A patron (security analyst) asks you to find all books about 'network intrusions' written in the last hour. You have a card catalog (Splunk index) that organizes books by subject, author, and date. To answer the query, you first go to the 'Subject' section of the catalog, pull out all cards labeled 'Network Intrusion' (search command), then sort those cards by date (time range picker). Next, you narrow down to only those with 'firewall' in the title (filter using search terms). You then count how many books were written by each author (stats count by author) and list the top 10 authors (sort by count, head). Finally, you create a chart showing the number of books per hour over the last 24 hours (timechart). The entire process mirrors how SPL works: you start with a large dataset, filter it step by step using pipes, transform the data with statistical commands, and visualize the results. Each pipe is like handing the stack of cards to another librarian who performs a specific operation before passing the stack to the next. The beauty of SPL is that you can combine many simple operations to answer complex questions without moving physical books—just the index cards (metadata).
What is SPL and Why It Exists
Splunk Search Processing Language (SPL) is a proprietary query language used to search, filter, transform, and analyze machine-generated data in Splunk. Unlike SQL, which operates on structured relational databases, SPL is designed for semi-structured and unstructured log data. It uses a pipe (|) to chain commands sequentially, where each command takes the output of the previous command, processes it, and passes the result to the next. This pipeline architecture allows analysts to build complex queries incrementally.
How SPL Works Internally
When a user submits an SPL query, Splunk parses it into a series of search operations. The search head distributes the query to indexers, which retrieve relevant events from the index based on time range and search terms. Each indexer performs the initial filtering and aggregation locally, then sends results back to the search head for final processing and presentation. The pipeline executes left to right, with each command operating on the entire result set from the previous step. For example:
source="/var/log/secure" | search "Failed password" | stats count by user | sort - count | head 10This query first retrieves all events from the source /var/log/secure, filters for events containing "Failed password", counts the number of failed attempts per user, sorts descending, and returns the top 10 users.
Key Components, Values, and Defaults
Search Time Range: Default is last 24 hours. Specify with earliest and latest modifiers or time picker. Use -30m, -1h, -7d, or absolute times like 01/01/2024:00:00:00.
Index: Default is main. Specify with index=<name>. Security data often lives in index=security or index=windows.
Source Type: Use sourcetype=<type> to filter by data type (e.g., sourcetype=WinEventLog:Security).
Field Extraction: Splunk automatically extracts fields based on sourcetype. Use | fields + field1, field2 to keep only specific fields.
Time Format: _time is the event timestamp. Use | convert ctime(_time) to human-readable format.
Essential SPL Commands for Security
search: Filter events by keywords or field=value pairs. Example: search src_ip=10.0.0.1.
stats: Aggregation command. Syntax: stats <agg_func>(<field>) by <group_field>. Common functions: count, sum, avg, values, dc (distinct count). Example: stats count by dest_ip.
timechart: Creates a time-based chart. Syntax: timechart <agg_func>(<field>) by <group_field>. Example: timechart count by src_ip.
eval: Creates or modifies fields using expressions. Example: eval ratio=count/total.
rex: Extracts fields using regex. Example: rex field=_raw "IP: (?<ip>\d+\.\d+\.\d+\.\d+)".
lookup: Enriches events with external data (e.g., threat intelligence). Example: lookup threat_intel.csv ip AS src_ip OUTPUT threat_type.
transaction: Groups related events into a single transaction. Example: transaction src_ip startswith="Login" endswith="Logout".
where: Filters events after initial search. Example: where count > 100.
top/rare: Lists most/least common values. Example: top limit=5 user.
sort: Sorts results by field. Use - for descending. Example: sort - count.
head/tail: Limits to first/last N results. Example: head 10.
table: Displays results in tabular format. Example: table user, count.
convert: Converts field types (e.g., number to string). Example: convert ctime(_time).
rename: Renames fields. Example: rename src_ip as "Source IP".
dedup: Removes duplicate events. Example: dedup src_ip.
fields: Keeps or removes fields. Example: fields - _raw, _meta.
How SPL Interacts with Related Technologies
SPL queries can be used in correlation searches, dashboards, alerts, and reports. In a typical security workflow, an analyst writes an SPL query to detect a specific threat (e.g., brute force attack). The query is saved as a saved search, which can be scheduled to run periodically. If the query returns results exceeding a threshold, an alert triggers a response via email, ticketing system, or SOAR playbook. SPL also integrates with lookups (CSV files, KV stores) to enrich data, and with data models for accelerated searching.
Performance Considerations
Use time range filters early to reduce data volume.
Limit use of rex on large datasets; use indexed fields when possible.
Use | fields - _raw to drop raw data after extraction.
Use | stats instead of | sort | head for large datasets.
Enable summary indexing and data model acceleration for frequent queries.
Use | search with indexed fields (e.g., index=security not sourcetype=*).
Define the security question
Before writing any SPL, clearly state the question you need to answer. For example: 'Which IP addresses are responsible for the most failed SSH login attempts in the last hour?' This step ensures you know what data sources, time range, and fields are needed. In Splunk, this translates to identifying the relevant index (e.g., index=main), sourcetype (e.g., sourcetype=linux_secure), and the specific log patterns (e.g., 'Failed password'). Defining the question also helps you choose the appropriate aggregation and visualization commands.
Select the time range
Set the time range using the time picker or `earliest` and `latest` modifiers. For example, `earliest=-1h latest=now`. This is critical for performance and relevance. In Splunk, the time range filters events at the index level before any other processing. A poorly chosen time range (e.g., all time) can cause the query to timeout or return irrelevant data. Use `earliest=-7d@d` to start from 7 days ago at midnight.
Retrieve initial data with search
Write the base search using `index`, `sourcetype`, and keywords. This is the first command in the pipeline. For example: `index=linux sourcetype=linux_secure "Failed password"`. This retrieves all events that match the criteria. Use double quotes for phrases. Use `NOT` to exclude terms. Use parentheses for grouping: `("Failed password" OR "Invalid user")`. Avoid wildcards at the start of terms as they are inefficient.
Filter and transform with pipe commands
Append pipe commands to refine the results. For example: `| stats count by src_ip` to count failed attempts per source IP. Then `| where count > 10` to filter IPs with more than 10 attempts. Use `| sort - count` to sort descending. Use `| head 10` to get top 10. Each pipe command processes the entire result set from the previous step. Use `| eval` to create new fields, e.g., `eval hour=strftime(_time, "%H")` to extract hour.
Format and present results
Use commands like `table`, `chart`, or `timechart` to present the data. For example: `| timechart count by src_ip` to show attempts over time per IP. Use `| rename` to make field names readable. Use `| convert ctime(_time)` to convert timestamps. For dashboards, use `| stats` with `by` to create summary tables. Use `| fields - _raw` to remove raw data. Finally, save the query as a report or alert if needed.
Scenario 1: Brute Force Detection on SSH Servers
A security team needs to detect brute force attacks on their Linux servers. They have Splunk ingesting /var/log/secure from all servers. The query:
sourcetype=linux_secure "Failed password" | stats count by src_ip, dest_ip | where count > 10 | sort - countThis identifies source IPs with more than 10 failed attempts. In production, they set the time range to the last hour and run this every 10 minutes as a scheduled alert. If an IP exceeds 50 attempts, an alert triggers a SOAR playbook to block the IP on the firewall. Common issues: false positives from legitimate users mistyping passwords, and performance problems if the query runs against all time. They mitigate by using earliest=-1h and using a summary index for historical data.
Scenario 2: Malware Beaconing Detection
Analysts look for beaconing behavior—regular outbound connections from infected hosts. They use the query:
sourcetype=firewall action=allowed | stats count by src_ip, dest_ip | where count > 100 | sort - countTo detect periodic connections, they use | timechart span=1h count by src_ip and look for consistent counts. In production, they filter out known good destinations (e.g., update servers) using a lookup. They also use | streamstats time_window=1h count by src_ip, dest_ip to detect sudden changes. Misconfiguration: not excluding internal IPs, leading to many false positives.
Scenario 3: Account Lockout Analysis
Windows security logs show account lockouts. The query:
sourcetype=WinEventLog:Security EventCode=4740 | stats count by TargetUserName, CallerComputerName | sort - countThis identifies users with multiple lockouts and the source computer. In a large enterprise, they use | eval lockout_time=strftime(_time, "%Y-%m-%d %H:%M") to track lockout times. They set up a dashboard showing lockout trends. Performance issues arise if they don't limit the time range; they use earliest=-24h and accelerate the data model for EventCode=4740.
What CS0-003 Tests on SPL
The exam objective 1.2 (Given a scenario, analyze data and interpret results to identify security issues) includes SPL queries. You must be able to read a given SPL query and determine what it does, identify errors, and understand the output. Common topics: command sequence, use of stats, timechart, eval, rex, lookup, and transaction. The exam may present a query and ask: 'What is the purpose of this query?' or 'Which command would you add to achieve X?'
Common Wrong Answers
Confusing `stats` with `search`: Candidates think stats filters events, but it aggregates. Example: | stats count by src_ip returns one row per src_ip, not all events. Wrong answer: 'It returns all events with a count field.' Correct: 'It returns the count of events per source IP.'
Misunderstanding pipe order: Candidates think order doesn't matter. But | stats count by src_ip | where count > 10 is different from | where count > 10 | stats count by src_ip. The first filters after aggregation, the second before. Wrong answer: 'Both give same result.' Correct: 'First gives counts per IP, then filters; second filters events with count>10 (which doesn't exist), then aggregates.'
Using `stats` instead of `timechart`: When asked to show trends over time, candidates use stats count by _time. But _time is continuous; stats would group by exact timestamps. Wrong answer: | stats count by _time. Correct: | timechart count or | timechart count by src_ip.
Forgetting `by` clause: | stats count gives total count, not per group. Wrong answer: 'It counts events per source IP.' Correct: 'It counts total events.'
Specific Numbers and Terms
Default time range: 24 hours (but exam may say 'all time' if not specified).
earliest and latest modifiers: -1h, -30m, -7d@d.
stats functions: count, sum, avg, dc, values, list.
timechart span: default is auto, but can specify span=1h.
rex uses Java regex (not PCRE).
lookup can be used with CSV files or KV stores.
transaction default maxspan=30s, maxpause=5s.
Edge Cases
Case sensitivity: SPL commands are case-insensitive, but field values are case-sensitive. src_ip=10.0.0.1 matches exactly. Use | where lower(src_ip)="10.0.0.1" for case-insensitive.
Wildcards: * matches any character, but leading wildcards are inefficient. Use *error* not *error.
Null values: | stats count by user includes events where user is null. Use | where isnotnull(user) first.
Multiple indexes: Use index=* or index=main OR index=security.
How to Eliminate Wrong Answers
Understand the pipeline: each command transforms the result set. If a command like stats aggregates, the output fields change. For example, after stats count by src_ip, only src_ip and count exist. Any subsequent command cannot reference original fields like dest_ip. This eliminates answers that assume original fields are still available. Also, remember that search is implicit at the start; you don't need to write | search. But if you write | search later, it filters the current result set.
SPL uses a pipe (|) to chain commands sequentially; order matters.
The default time range is 24 hours; always specify `earliest` and `latest` for performance.
`stats` aggregates data; `timechart` aggregates over time.
Use `eval` to create or modify fields; `rex` to extract fields via regex.
`lookup` enriches data from external sources; `transaction` groups related events.
Common aggregation functions: count, sum, avg, dc, values, list.
Use `where` to filter after aggregation; use `search` to filter raw events.
Always limit the result set with `head`, `time range`, or `where count > N` to avoid performance issues.
These come up on the exam all the time. Here's how to tell them apart.
stats
Aggregates data by group fields, returns a table with one row per group.
Does not inherently produce a time-based chart; use `by _time` to group by time.
Default aggregation is count, but can use sum, avg, etc.
Output can be sorted and further processed.
Useful for summary reports, top-N lists, and thresholds.
timechart
Aggregates data over time, returns a time-based chart with time as x-axis.
Automatically uses `_time` as the time field; can specify span (e.g., `span=1h`).
Can split by a group field (e.g., `by src_ip`) to create multiple series.
Output is a timechart that can be visualized directly.
Useful for trend analysis, anomaly detection over time.
Mistake
SPL is the same as SQL.
Correct
SPL is a pipeline language, not a declarative query language like SQL. In SPL, commands are executed sequentially left to right, each operating on the entire result set from the previous step. SQL uses a single statement with clauses (SELECT, FROM, WHERE) that are logically processed in a fixed order. SPL also lacks JOINs in the traditional sense; instead, it uses `lookup` or `appendcols` for similar functionality.
Mistake
The order of pipe commands doesn't matter.
Correct
Order is critical. For example, `| stats count by src_ip | where count > 10` first aggregates counts per IP, then filters IPs with count > 10. Reversing to `| where count > 10 | stats count by src_ip` would first try to filter events where count > 10, but count doesn't exist as a field in raw events, resulting in zero events. Understanding the pipeline order is essential for correct query writing.
Mistake
`search` command is always required at the beginning.
Correct
The initial search terms are implicit and do not require the `search` command. For example, `index=main sourcetype=access_combined` is valid without `| search`. However, you can use `| search` later to further filter results, but it's redundant if you already specified terms at the start. The exam may test this by presenting a query starting with `| search` and asking if it's valid.
Mistake
`timechart` always uses `_time`.
Correct
By default, `timechart` uses `_time` as the x-axis. However, you can specify a different time field using the `usetime` argument or by using `| eval mytime=... | timechart ... by mytime`. The exam might present a query that uses `timechart count by src_ip` and assume it uses `_time`, which is correct.
Mistake
`rex` can extract multiple fields in one command.
Correct
`rex` can extract multiple fields using named capture groups in a single regex. For example: `rex field=_raw "IP: (?<ip>\d+\.\d+\.\d+\.\d+), Port: (?<port>\d+)"` extracts both `ip` and `port`. This is efficient and commonly tested.
Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.
`search` is used to filter events based on keywords or field=value pairs. It can be used at the beginning of a query (implicitly) or later in the pipeline to further filter results. `where` is used to filter events based on an expression (e.g., `where count > 10`). The key difference: `search` works on raw events and uses indexed fields efficiently, while `where` works on the current result set and can evaluate expressions. Use `search` for initial filtering; use `where` for post-aggregation filtering.
Use the `rex` command with a regular expression. For example, to extract an IP address from the `_raw` field: `rex field=_raw "(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"`. This creates a new field called `ip`. You can then use `ip` in subsequent commands. Note that Splunk uses Java regex syntax. For simple extractions, you can also use `| eval ip=extract(_raw, "ip")` if the field is already extracted by the sourcetype.
Use the `timechart` command. Example: `sourcetype=access_combined | timechart count by src_ip`. This creates a time-based chart with one series per source IP. You can specify the time span, e.g., `span=1h`. If you want to limit to the top N IPs, use `| timechart limit=5 count by src_ip`. For a table instead of chart, use `| stats count by _time, src_ip | sort _time`.
`transaction` groups related events that occur within a specified time window. For example, to group login events for a user: `sourcetype=linux_secure | transaction user startswith="Login" endswith="Logout"`. This creates a transaction that includes all events between the login and logout. You can set `maxspan` (default 30s) and `maxpause` (default 5s). Useful for session analysis or multi-step attack detection.
SPL does not have a traditional JOIN. Instead, use `append`, `appendcols`, or `lookup`. `append` combines results from multiple searches vertically: `search1 | append [search2]`. `appendcols` appends fields from a second search to the first based on a common field: `search1 | appendcols [search2 | fields common_field, new_field]`. `lookup` enriches events from an external source (CSV, KV store) based on a field match.
`stats` aggregates data and returns only the aggregated results (one row per group). `eventstats` calculates aggregations but adds the result as a new field to each original event. For example, `| eventstats count by src_ip` adds a field `count` to each event showing the total count for that src_ip. This is useful for comparing individual events to the aggregate (e.g., flagging events with count > 10).
Save the query as a saved search in Splunk. In the search bar, click 'Save As' > 'Report' or 'Alert'. Set the schedule (e.g., every 10 minutes). For alerts, define a condition (e.g., number of results > 0) and action (e.g., send email, trigger webhook). The exam may ask about the steps to schedule a search or the difference between a report and an alert.
You've just covered Splunk SPL Queries for Security Analysts — now see how well it sticks with free CS0-003 practice questions. Full explanations included, no account needed.
Done with this chapter?