Splunk Advanced Searching and Statistics Questions


1
MCQ · hard

A Splunk environment ingests 10 TB per day. A user runs a search to count events per sourcetype over the last 7 days: `index=* earliest=-7d | timechart count by sourcetype`. The search returns partial results and eventually times out. The user needs to obtain the complete results efficiently. What is the best course of action?

A.Use `| bucket span=1d | stats count by _time sourcetype` then `| xyseries` to format.
B.Use `| sitime` to sample the data and approximate counts.
C.Use `| tstats count where index=* earliest=-7d by _time span=1d, sourcetype` and then format as needed.
D.Break the search into 1-day intervals and use `append` to combine results.
Answer: C

`tstats` reads indexed fields from the tsidx files instead of raw events, so it is much faster at large data volumes.

Why this answer

Option C is correct because `tstats` runs on indexed metadata (tsidx files) rather than raw events, making it far more efficient for counting events over large time ranges. By specifying `by _time span=1d, sourcetype`, you get daily counts per sourcetype without scanning the entire event data, avoiding the timeout that occurs with a raw search over 10 TB/day for 7 days.

Exam trap

Splunk often tests the distinction between raw event searches and metadata-based searches, and the trap here is that candidates may not realize `tstats` can aggregate by sourcetype and time span without touching raw data, leading them to choose inefficient raw-search options like A or D.

How to eliminate wrong answers

Option A is wrong because `bucket span=1d | stats count by _time sourcetype` still requires scanning all raw events from the index, which is inefficient and will likely time out on 70 TB of data. Option B is wrong because `sitime` is not a valid Splunk command; it appears to be a distractor, and sampling would not provide complete results as required. Option D is wrong because breaking the search into 1-day intervals and using `append` still requires scanning raw events for each interval, leading to the same performance issues and potential timeout, plus it adds overhead from multiple searches.

2
MCQ · hard

A search includes a subsearch that returns 100,000 results, causing performance issues. Which optimization is best?

A.Use limit in the subsearch to return fewer results
B.Use the fields command inside the subsearch
C.Use the format command inside the subsearch
D.Use the search command with index=* inside the subsearch
Answer: A

Limiting the number of results the subsearch returns reduces the work passed to the outer search.

Why this answer

Option A is correct because limiting the subsearch output restricts the number of results returned to the primary search, directly reducing the data volume that must be processed and joined. In SPL this is typically done with `head`, or with the `limit` argument of a command such as `top`. This is the most effective optimization when a subsearch returns a large result set (e.g., 100,000 events), as it minimizes memory and CPU overhead on the search head.

Exam trap

The trap here is that candidates often confuse reducing field count (fields command) with reducing row count, or think that formatting (format) or widening the search (index=*) will somehow improve performance, when only limiting the actual number of results addresses the root cause.

How to eliminate wrong answers

Option B is wrong because the `fields` command only selects a subset of fields from the results, but does not reduce the number of events returned; the subsearch still returns 100,000 results, so performance issues persist. Option C is wrong because the `format` command changes the output format of the subsearch results (e.g., into a boolean expression), but does not reduce the result count; it is used for formatting, not optimization. Option D is wrong because using `search index=*` inside the subsearch would search all indexes, likely returning even more results and worsening performance; it does nothing to limit the result set size.

3
Matching · medium

Match each Splunk search command to its primary function.


Concepts
Matches

Calculates aggregate statistics on search results

Extracts fields using regular expressions

Creates or modifies fields using expressions

Groups events into transactions based on common fields

Enriches events with external data from a lookup table

Why these pairings

These are common Splunk search commands used for data manipulation and enrichment.

4
MCQ · hard

A Splunk search uses a subsearch to find the top 10 client IPs and then retrieves all events from those IPs. The search is: `index=web sourcetype=access | search [ top clientip | fields clientip ]` What does this search return?

A.The top 10 client IPs in a table.
B.Only the top 10 events based on some field.
C.All events where the client IP appears more than once.
D.All events from the top 10 most common client IPs.
Answer: D

The subsearch finds the top 10 client IPs, then the outer search filters events matching those IPs.

Why this answer

The subsearch `[ top clientip | fields clientip ]` returns the 10 most common client IPs (`top` defaults to 10 results) as a list of values. The outer search then uses this list as a filter, effectively running `index=web sourcetype=access clientip=<ip1> OR clientip=<ip2> ...`. This retrieves all events from those IPs, not just the top 10 events, which is the behavior Option D describes. In a real search the subsearch would also need its own base search, for example `[ search index=web sourcetype=access | top clientip | fields clientip ]`.

Exam trap

The trap here is that candidates confuse the output of the subsearch (a table of IPs) with the final output of the entire search, failing to recognize that the outer search returns all matching events, not just the top IPs.

How to eliminate wrong answers

Option A is wrong because the outer search does not use `top` or `stats` to produce a table of IPs; it returns raw events from the index. Option B is wrong because the subsearch identifies the top 10 client IPs by count, not the top 10 events by any field; the outer search returns all events matching those IPs, not a limited set of events. Option C is wrong because the subsearch uses `top` to find the most common IPs, not to filter IPs that appear more than once; an IP appearing exactly once could still be in the top 10 if few unique IPs exist.
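
The two-stage behavior can be sketched in Python. This is only an analogy, not how Splunk executes the search: the sample events, the `top_values` helper, and the reduced limit of 2 are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample events; only the clientip field matters here.
events = [
    {"clientip": "10.0.0.1", "uri": "/a"},
    {"clientip": "10.0.0.1", "uri": "/b"},
    {"clientip": "10.0.0.2", "uri": "/a"},
    {"clientip": "10.0.0.1", "uri": "/c"},
    {"clientip": "10.0.0.3", "uri": "/a"},
    {"clientip": "10.0.0.2", "uri": "/d"},
]


def top_values(rows, field, limit=10):
    """Rough analogue of `top <field>`: most frequent values, capped at `limit`."""
    counts = Counter(row[field] for row in rows)
    return [value for value, _ in counts.most_common(limit)]


# "Subsearch": find the most common client IPs (limit=2 keeps the sample small).
top_ips = set(top_values(events, "clientip", limit=2))

# "Outer search": keep every event whose clientip is in that set,
# like clientip=<ip1> OR clientip=<ip2>.
matching = [e for e in events if e["clientip"] in top_ips]
```

The result is every event from the selected IPs (five events here), not a table of IPs and not a fixed number of events.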

5
MCQ · easy

An analyst wants to see the count of distinct users for each department over the last week. The data contains fields: user, department, date. Which search is correct?

A.... | stats distinct_count(user) by department
B.... | stats dc(user) by department
C.... | eval distinct_count=dc(user) | stats sum(distinct_count) by department
D.... | stats count(user) by department
Answer: B

dc() calculates distinct count.

Why this answer

Option B is correct because the `dc()` function in Splunk's `stats` command calculates the distinct count of values in a field, which is exactly what the analyst needs: the count of distinct users per department over the last week. The `by department` clause groups the results by department, and the implicit time range (last week) is applied via the search time picker or an explicit time filter in the query.

Exam trap

The trap here is that candidates often confuse `count()` (total events in which the field exists) with `dc()` (distinct values). Function naming is also tested in the SPLK-1003 exam: `distinct_count()` is the long form of `dc()` in `stats`, so the spelling alone does not make an option wrong.

How to eliminate wrong answers

Option A uses `distinct_count(user)`, which `stats` accepts as the long form of `dc(user)`; it returns the same result, and B is the abbreviated form the answer key expects. Option C is wrong because `eval distinct_count=dc(user)` is invalid; `dc()` cannot be used in an `eval` command, since it is a statistical function only available in `stats` or similar transforming commands, and the subsequent `stats sum(distinct_count)` would not produce the correct distinct count per department. Option D is wrong because `count(user)` counts the total number of events where the user field exists, not the number of distinct users, which does not meet the requirement for distinct users per department.
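
The difference between counting events and counting distinct values can be illustrated with a small Python sketch. The sample rows are made up, and the dictionaries merely stand in for what `stats count(user) by department` and `stats dc(user) by department` would return.

```python
from collections import defaultdict

# Hypothetical events with the fields named in the question.
events = [
    {"user": "alice", "department": "eng"},
    {"user": "alice", "department": "eng"},
    {"user": "bob", "department": "eng"},
    {"user": "carol", "department": "ops"},
    {"user": "carol", "department": "ops"},
]

event_count = defaultdict(int)      # analogue of count(user) by department
distinct_users = defaultdict(set)   # collects unique users per department

for e in events:
    event_count[e["department"]] += 1
    distinct_users[e["department"]].add(e["user"])

# Analogue of dc(user) by department: size of each set of unique users.
dc_by_department = {dept: len(users) for dept, users in distinct_users.items()}
```

Here `eng` has three events but only two distinct users, which is the gap between the count-based option and the distinct-count option.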

6
MCQ · easy

Which command creates a time-based chart showing a count of events over time?

A.| timecount
B.| timechart count by _time
C.| chart count over _time
D.| timechart count
Answer: D

`timechart count` bins events by `_time` and counts them per bucket.

Why this answer

Option D is correct because `timechart` always uses `_time` as the x-axis: `timechart count` bins events into time buckets and returns the count of events per bucket as a time series, ready to display as a column or line chart. No `by _time` clause is needed; in `timechart` the `by` clause names a split-by field (for example `by sourcetype`), not the time axis.

Exam trap

Splunk often tests the distinction between `chart` and `timechart`. Candidates may assume `timecount` is a real command, or add `by _time` to `timechart` because they expect the time axis to be stated explicitly, when in fact `timechart` handles time-based binning on its own.

How to eliminate wrong answers

Option A is wrong because `timecount` is not a valid Splunk command; it is a confusion with `timechart`. Option B is wrong because `by _time` misuses the split-by clause: `timechart` already places `_time` on the x-axis, so the clause adds nothing and the canonical form is `timechart count`. Option C is the weaker choice because `chart` is the general-purpose charting command; `chart count over _time` names `_time` as the row field, but `timechart` is the command built for time series.

7
Drag & Drop · medium

Arrange the steps to create a knowledge object of type 'Event Type' in Splunk.


Steps
Order

Why this order

Event types are created by defining a search string that matches events, then saving with a name.

8
MCQ · hard

Refer to the exhibit. What does the pct field represent?

A.The running total percentage of events over time.
B.The percentage of each status across the entire time range.
C.The percentage of each status within each one-hour time bucket.
D.The percentage of events for that status compared to the maximum count in that hour.
Answer: C

`eventstats` sums by `_time`, so `pct` is calculated per hour and per status.

Why this answer

The `pct` field in the context of a time-based chart (e.g., `timechart count by status`) represents the percentage of each status value within each one-hour time bucket. This is calculated by dividing the count of a specific status in that bucket by the total count of all statuses in the same bucket, then multiplying by 100. Option C correctly identifies this per-bucket proportional breakdown.

Exam trap

The trap here is that candidates confuse `pct` (per-bucket percentage) with a global percentage across the entire time range (Option B) or with a running total (Option A), because they overlook that the total in the denominator is computed `by _time`, that is, once per one-hour bucket rather than over the whole search span.

How to eliminate wrong answers

Option A is wrong because the `pct` field does not represent a running total percentage over time; that would require a cumulative or moving-window calculation (e.g., `accum` or `streamstats`), not a per-bucket ratio. Option B is wrong because the percentage is not calculated across the entire time range; that would be a single overall percentage per status (e.g., using `top` or `stats` without a time split), not a per-hour breakdown. Option D is wrong because the `pct` field is not relative to the maximum count in that hour; it is relative to the sum of all status counts in that hour, not the peak value.
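
Since the exhibit is not reproduced here, the following Python sketch only illustrates the per-bucket arithmetic the explanation describes. The rows are invented, and the shape (one row per hour and status, then a per-hour total) is an assumption about what the exhibit's search produces.

```python
from collections import defaultdict

# Hypothetical (hour_bucket, status, count) rows, as a count by _time and status might yield.
rows = [
    ("09:00", "200", 80),
    ("09:00", "500", 20),
    ("10:00", "200", 30),
    ("10:00", "500", 10),
]

# Analogue of an eventstats sum grouped by _time: one total per hour bucket.
totals = defaultdict(int)
for hour, _status, count in rows:
    totals[hour] += count

# pct = count / total for the same bucket * 100
with_pct = [
    (hour, status, count, 100 * count / totals[hour])
    for hour, status, count in rows
]
```

Each hour's percentages sum to 100 on their own, which is why the value describes a share within the bucket and not across the whole time range.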

9
Multi-Select · medium

Which TWO of the following commands can be used to create a table of unique values for a field, along with their counts?

Select 2 answers
A.stats count by field_name
B.fields field_name
C.rare field_name
D.top field_name
E.dedup field_name
Answers: A, D

Returns all unique field values with their counts.

Why this answer

The `stats count by field_name` command groups events by the unique values of the specified field and outputs a table with each value and its count. The `top field_name` command also produces a table of the most frequent field values along with their counts, sorted in descending order by count. Both commands generate the required table of unique values with counts.

Exam trap

Splunk often tests the distinction between commands that produce counts of all unique values (`stats count by`, `top` with `limit=0`) versus commands that only show a subset (`top` default, `rare`) or do not count at all (`fields`, `dedup`).

10
MCQ · easy

A user wants to find the top 5 sourcetypes by event count over the last 24 hours. Which search is correct?

A.index=* | eventcount | top sourcetype
B.index=* | stats count by sourcetype | top 5 sourcetype
C.index=* | stats count by sourcetype | sort -count | head 5
D.index=* | top sourcetype
Answer: C

Correctly counts, sorts descending, and limits to 5.

Why this answer

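Option C is correct because it uses `stats` to count events by sourcetype, sorts the counts in descending order, and keeps the first 5 rows. Option D uses `top` without a limit, so it returns up to 10 sourcetypes rather than 5. Option B runs `top` after `stats`, where each sourcetype already occupies exactly one row, so `top` ranks by row count (1 for every sourcetype) instead of by the event count.

Option A is wrong because `eventcount` is a generating command that reports event counts per index; it has to start a search and does not produce a `sourcetype` breakdown.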

11
Multi-Select · hard

Which THREE of the following are valid ways to extract a substring from a field named "full_name" that contains "Firstname Lastname" into separate fields?

Select 3 answers
A.extract field=full_name first=1 last=2
B.eval first=split(full_name," ")[0], last=split(full_name," ")[1]
C.rex field=full_name "^(?<first>\w+)\s+(?<last>\w+)$"
D.makemv delim=" " full_name | eval first=mvindex(full_name,0), last=mvindex(full_name,1)
E.regex field=full_name "(?<first>[^ ]+) (?<last>[^ ]+)"
Answers: B, C, D

Splits the field into an array and indexes elements.

Why this answer

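Option B is correct in intent because the `split` function in Splunk's `eval` command returns a multivalue field from a string based on a delimiter, and the first and second elements become the fields `first` and `last`. The portable way to select those elements is `mvindex`, as in `mvindex(split(full_name," "),0)`, rather than `[0]`-style indexing. Option C uses `rex` with named capture groups to extract both fields in one step, and Option D converts the field to multivalue with `makemv` and then reads each element with `mvindex`.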

Exam trap

Splunk often tests the distinction between commands that extract fields (like `rex` and `eval` with `split`) versus commands that only filter or transform data without creating new fields (like `regex` and `extract` with incorrect syntax).

12
MCQ · medium

A user needs a report showing the number of distinct source IPs per sourcetype over the last hour. They run: `index=* earliest=-1h | stats dc(src_ip) by sourcetype`. The search runs slowly (2 minutes) and they want to speed it up. Which optimization is most effective?

A.Use `| top limit=100 sourcetype` to get top sourcetypes.
B.Use `| stats count by sourcetype, src_ip | stats count by sourcetype`.
C.Use `| chart count over sourcetype by src_ip`.
D.Use `| tstats dc(src_ip) where index=* earliest=-1h by sourcetype`.
Answer: D

`tstats` reads indexed summary data rather than raw events, so results return faster.

Why this answer

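`tstats` works from indexed fields or accelerated data model summaries, so it is much faster than scanning raw data; this assumes `src_ip` is available as an indexed field or through an accelerated data model. Option B uses nested `stats`, which still scans raw data. Option C does not produce the desired result: it builds a sourcetype-by-IP matrix of counts rather than a distinct count.

Option A gives top sourcetypes, not distinct IPs.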

13
MCQ · medium

A search returns raw events with a field 'response_time'. The analyst wants to calculate the average response time excluding any outliers that are more than 3 standard deviations from the mean. Which SPL approach is most efficient?

A.Use | eventstats avg, stdev(response_time) then | where response_time<=avg+3*stdev and response_time>=avg-3*stdev then | stats avg(response_time)
B.Use | top response_time
C.Use | stats avg(response_time) and then filter with where
D.Use | outlier action=remove
Answer: A

Efficient one-pass calculation with filtering

Why this answer

Option A is correct because it uses `eventstats` to compute the global average and standard deviation of `response_time` across all events, then filters out outliers (values more than 3 standard deviations from the mean) with a `where` clause, and finally calculates the clean average with `stats avg(response_time)`. This approach is efficient because `eventstats` adds the aggregate values to each event without reducing the dataset, allowing a single pass through the data for filtering and aggregation.

Exam trap

Splunk often tests the distinction between `eventstats` and `stats`, where candidates mistakenly use `stats` first and then try to filter, not realizing that `stats` collapses events and loses the ability to apply per-event conditions.

How to eliminate wrong answers

Option B is wrong because `top` returns the most frequent values of a field, not statistical measures like average or standard deviation, and does not address outlier removal. Option C is wrong because using `stats avg(response_time)` first collapses the data into a single value, making it impossible to filter individual events by standard deviation; the `where` clause would have no events to filter. Option D is wrong because, although `outlier action=remove` is valid SPL, the `outlier` command identifies outliers using the interquartile range rather than the mean and standard deviation, so it does not match the requirement of excluding values more than 3 standard deviations from the mean.
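
The compute, filter, then average sequence from Option A can be sketched in Python. The data is invented, and using the sample standard deviation is an assumption about what `stdev()` returns; the point is only the order of the three steps.

```python
from statistics import fmean, stdev

# Hypothetical response times: twenty ordinary values and one extreme value.
response_times = [100] * 20 + [1000]

# Step 1 (eventstats analogue): global mean and standard deviation,
# available alongside every individual value.
avg = fmean(response_times)
sd = stdev(response_times)  # sample standard deviation (assumed)

# Step 2 (where analogue): keep values within 3 standard deviations of the mean.
kept = [rt for rt in response_times if avg - 3 * sd <= rt <= avg + 3 * sd]

# Step 3 (stats analogue): average of what is left.
clean_avg = fmean(kept)
```

With this sample the extreme value falls outside the band and the cleaned average returns to 100. Note that very small samples cannot contain a 3-sigma outlier at all, which is why the sketch uses 21 values.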

14
MCQ · hard

An analyst needs to identify events where the field `response_time` is more than 2 standard deviations above the average response_time for the same `host`. Which approach should be used?

A.Use `eventstats avg(response_time) as avg, stdev(response_time) as stdev` then `where response_time > avg+2*stdev`
B.Use `streamstats avg(response_time) as avg, stdev(response_time) as stdev by host` then `where response_time > avg+2*stdev`
C.Use `stats avg(response_time) as avg, stdev(response_time) as stdev by host` then `where response_time > avg+2*stdev`
D.Use `eventstats avg(response_time) as avg, stdev(response_time) as stdev by host` then `where response_time > avg+2*stdev`
Answer: D

eventstats adds per-host avg and stdev to each event, allowing the comparison.

Why this answer

Option D is correct because `eventstats` with a `by host` clause computes the average and standard deviation of `response_time` for each host across the entire result set, then appends those statistics to every event. This allows the subsequent `where` clause to compare each event's `response_time` against the host-specific threshold `avg+2*stdev`, correctly identifying outliers relative to the same host's distribution.

Exam trap

Splunk often tests the distinction between `eventstats` (which adds aggregate values to each event) and `stats` (which collapses events into a summary), and between `eventstats` and `streamstats` (which computes running vs. global statistics), to see if candidates understand which command preserves raw events for per-event comparisons.

How to eliminate wrong answers

Option A is wrong because `eventstats` without a `by` clause computes global statistics across all hosts, not per host, so the threshold would be based on the overall average and standard deviation, not the host-specific values required. Option B is wrong because `streamstats` computes running (cumulative) statistics over the event stream, not over the entire dataset; this would cause the average and standard deviation to change with each event, producing incorrect thresholds that depend on event order. Option C is wrong because `stats` aggregates the data into a summary table with one row per host, discarding the original events; the `where` clause would then have no individual `response_time` to compare against the threshold.
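
Why the `by host` clause matters can be shown with a short Python sketch. The hosts, values, and the `flag` helper are hypothetical; the helper attaches a mean-plus-two-standard-deviations limit per group and keeps the events above it, which mirrors the `eventstats` then `where` pattern.

```python
from collections import defaultdict
from statistics import fmean, stdev

# Hypothetical (host, response_time) events: host "a" is fast with one spike,
# host "b" is uniformly slow.
events = [("a", 10)] * 9 + [("a", 50)] + [("b", 500), ("b", 510)] * 5


def flag(rows, key):
    """Attach avg + 2*stdev per group, then keep events above their group's limit."""
    groups = defaultdict(list)
    for host, rt in rows:
        groups[key(host)].append(rt)
    limits = {k: fmean(v) + 2 * stdev(v) for k, v in groups.items()}
    return [(host, rt) for host, rt in rows if rt > limits[key(host)]]


per_host_flagged = flag(events, key=lambda host: host)   # like "... by host"
global_flagged = flag(events, key=lambda host: "all")    # no by clause
```

The per-host version flags the spike on host `a`, while the global version flags nothing because the slow host inflates the overall mean and standard deviation.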

15
MCQ · medium

A security analyst needs to correlate login events with subsequent actions from the same user within 30 minutes. They need to ensure that only one login per user session is considered, and actions after login are attached. Which command is most appropriate?

A.stats values(user) by _time
B.transaction user maxspan=30m
C.append [search action] | sort user _time
D.join user [search login] | timechart
Answer: B

Groups events by user within a time window.

Why this answer

Option B is correct: `transaction user maxspan=30m` groups all events from the same user within 30 minutes into a single transaction. Option A only lists user values per timestamp and does not group events into sessions; Options C and D are inefficient for this correlation.

16
MCQ · easy

Which command is used to convert a multi-value field into individual events?

A.mvexpand
B.eval split
C.makemv
D.fields
Answer: A

Correctly expands multi-value fields into separate events.

Why this answer

Option A is correct because `mvexpand` expands a multi-value field into separate events, one per value. Option B (`eval` with `split`) creates a multi-value field from a string. Option C (`makemv`) also creates multi-value fields.

Option D (`fields`) only selects which fields to keep.

17
MCQ · hard

A search returns events with fields 'user', 'duration', and 'status'. The analyst wants to find users whose average duration exceeds 100 and who have more than 5 events. Which search is correct?

A.`... | where avg(duration)>100 | stats count by user | where count>5`
B.`... | top user limit=0 | where avg(duration)>100`
C.`... | stats avg(duration) as avg_dur, count as cnt by user | where avg_dur>100 and cnt>5`
D.`... | eventstats avg(duration) as avg_dur, count as cnt by user | where avg_dur>100 and cnt>5`
E.`... | stats avg(duration) as avg_dur, count as cnt by user | having avg_dur>100 and cnt>5`
Answer: C

Correct: stats reduces to one row per user, then where filters.

Why this answer

Option C is correct because it uses the `stats` command to compute the average duration and count per user in a single pass, then filters with `where` to enforce both conditions: average duration > 100 and event count > 5. This is the standard pattern for per-user aggregation followed by post-aggregation filtering in Splunk.

Exam trap

Splunk often tests the difference between `stats` and `eventstats`, where candidates mistakenly choose `eventstats` thinking it filters users, but it actually keeps all events and applies the `where` condition per event, not per user.

How to eliminate wrong answers

Option A is wrong because `where avg(duration)>100` is applied before any grouping, which attempts to filter on an aggregate without a `by` clause, causing an error or incorrect results. Option B is wrong because `top user limit=0` returns the most frequent users but does not compute average duration or allow filtering on it. Option D is wrong because `eventstats` adds aggregate values to each event without collapsing rows, so the `where` clause would evaluate per event, not per user, and the count would be the total events per user repeated on each row, not a distinct user-level filter.

Option E is wrong because `having` is not a valid Splunk command; it is a SQL clause not supported in SPL.
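
The aggregate-then-filter pattern can be sketched in Python as follows. The users and durations are invented; the `summary` dictionary stands in for the one-row-per-user table that `stats ... by user` produces.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical (user, duration) events.
events = [("alice", 150)] * 6 + [("bob", 300)] * 3 + [("carol", 50)] * 8

# stats analogue: collapse events to one summary row per user.
durations = defaultdict(list)
for user, duration in events:
    durations[user].append(duration)

summary = {
    user: {"avg_dur": fmean(values), "cnt": len(values)}
    for user, values in durations.items()
}

# where analogue: filter the summary rows on both conditions.
selected = [
    user for user, row in summary.items()
    if row["avg_dur"] > 100 and row["cnt"] > 5
]
```

Only `alice` passes both conditions: `bob` has a high average but too few events, and `carol` has enough events but a low average.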

18
Multi-Select · medium

A user needs to find events where a user had a failed login followed by a successful login within 10 minutes, and then list the total number of such occurrences per user. Which THREE steps are necessary? (Select three.)

Select 3 answers
A.Use the eval command to set a field for failure status
B.Use the stats command to count by user
C.Use the where command to filter transactions with both failure and success
D.Use the transaction command with maxspan=10m
E.Use the transaction command with startswith and endswith
Answers: B, D, E

Aggregates transaction counts per user.

Why this answer

Options B, D, and E are correct. The `transaction` command with `maxspan=10m` (D) limits each group to a 10-minute window, and `startswith`/`endswith` (E) define the transaction boundaries as a failed login followed by a successful one. Then `stats` (B) counts the transactions per user.

Option C is not needed because `startswith` and `endswith` already enforce the failure-then-success pattern. Option A is not necessary because the status field already exists.

19
Multi-Select · hard

Which THREE of the following are valid ways to count the number of events per minute for a given sourcetype?

Select 3 answers
A.index=main sourcetype=web | stats count by date_minute
B.index=main sourcetype=web | streamstats count window=1m | where count>0
C.index=main sourcetype=web | eval minute = strftime(_time, "%Y-%m-%d %H:%M") | stats count by minute
D.index=main sourcetype=web | bucket _time span=1m | stats count by _time
E.index=main sourcetype=web | timechart count span=1m
Answers: C, D, E

Creates a unique minute string and groups by it.

Why this answer

Options C, D, and E are correct. D uses `bucket` to group `_time` into one-minute spans and then `stats count`. E uses `timechart` with `span=1m`.

C creates a minute-level timestamp string and groups by it. A uses `date_minute`, which holds only the minute of the hour (0 to 59), not the full timestamp, so the same minute from different hours is merged. B uses `streamstats`, which produces a running count per event, not a per-minute count.
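
The difference between grouping by a full minute-level timestamp and grouping by the minute of the hour alone can be seen in a small Python sketch. The timestamps are made up; `t.minute` plays the role of `date_minute`, and the `strftime` format string is the one used in the `eval` option.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event timestamps spanning two different hours.
timestamps = [
    datetime(2024, 1, 1, 9, 15, 5),
    datetime(2024, 1, 1, 9, 15, 40),
    datetime(2024, 1, 1, 10, 15, 10),
    datetime(2024, 1, 1, 10, 16, 0),
]

# Full minute key: date, hour, and minute together.
by_full_minute = Counter(t.strftime("%Y-%m-%d %H:%M") for t in timestamps)

# Minute-of-hour key only: 09:15 and 10:15 collapse into the same group.
by_minute_of_hour = Counter(t.minute for t in timestamps)
```

The full key yields three separate minutes, while the minute-of-hour key merges the 09:15 and 10:15 events into one group of three.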

20
MCQ · hard

A search uses 'transaction' to group events by session, but the results show too many transactions with only one event. What is the best way to filter out single-event transactions?

A.| transaction ... | where eventcount > 1
B.Add maxspan=5m to the transaction command
C.| transaction maxevents=2 ...
D.| transaction ... | where eventcount=2
Answer: A

eventcount is a default field added by transaction; filtering >1 removes single-event transactions.

Why this answer

Option A is correct because the `transaction` command groups events into transactions, and appending `| where eventcount > 1` filters out any transaction that consists of only a single event. This directly addresses the requirement to remove single-event transactions, as `eventcount` is a default field added by `transaction` that counts the number of events in each transaction.

Exam trap

Splunk often tests the distinction between filtering after `transaction` versus using parameters like `maxspan` or `maxevents`, where candidates mistakenly think time or count limits inherently exclude single-event transactions, but those parameters only constrain grouping, not post-group filtering.

How to eliminate wrong answers

Option B is wrong because `maxspan=5m` limits the maximum time span of a transaction but does not filter out single-event transactions; a single event can still occur within a 5-minute window. Option C is wrong because `maxevents=2` caps the maximum number of events in a transaction at 2, but it does not exclude transactions with exactly 1 event; it only prevents more than 2 events. Option D is wrong because `where eventcount=2` would keep only transactions with exactly 2 events, not all transactions with more than 1 event, thus incorrectly discarding transactions with 3 or more events.

21
MCQ · easy

Which command adds the overall average of a field to each event in the results?

A.streamstats avg(latency) as avg_latency
B.timechart avg(latency) as avg_latency
C.stats avg(latency) as avg_latency
D.eventstats avg(latency) as avg_latency
Answer: D

`eventstats` adds the average as a new field to each event.

Why this answer

The `eventstats` command computes aggregate statistics (like `avg(latency)`) over the entire result set and adds the result as a new field to every event, preserving all original events. This matches the requirement to add the overall average to each event. In contrast, `stats` collapses events into a single summary row, `streamstats` computes a running average per event, and `timechart` produces a time-based chart, none of which add the overall average to every original event.

Exam trap

Splunk often tests the distinction between `eventstats` and `stats` — the trap here is that candidates confuse `stats` (which collapses events) with `eventstats` (which adds the aggregate to each event), leading them to incorrectly choose `stats` because they think it computes the average without realizing it removes the original events.

How to eliminate wrong answers

Option A is wrong because `streamstats avg(latency) as avg_latency` computes a running (cumulative) average over the events in order, not the overall average of the entire field, and it adds a per-event running value, not the single global average. Option B is wrong because `timechart avg(latency) as avg_latency` groups events by time buckets and returns a time-series chart with one average per bucket, discarding the original events and not adding a field to each event. Option C is wrong because `stats avg(latency) as avg_latency` aggregates all events into a single summary row containing only the average value, removing all original events and fields.
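
The three output shapes can be compared side by side in Python. The latency values are invented, and the lists are only analogues of what `stats`, `eventstats`, and `streamstats` would return for `avg(latency)`.

```python
from statistics import fmean

# Hypothetical latency values, in event order.
latencies = [10, 20, 30, 40]

# stats analogue: one summary value, original events gone.
stats_result = fmean(latencies)

# eventstats analogue: every event kept, each carrying the same overall average.
eventstats_result = [(x, stats_result) for x in latencies]

# streamstats analogue: every event kept, each carrying the average of the
# events seen so far (including itself).
streamstats_result = [
    (x, fmean(latencies[: i + 1])) for i, x in enumerate(latencies)
]
```

Only the `eventstats` analogue gives every event the same overall average (25.0 here); the running version reaches that value only on the last event.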

22
Multi-Select · hard

Which THREE of the following are correct characteristics of the transaction command? (Choose three.)

Select 3 answers
A.It groups related events based on common field values.
B.It can group events from different indexes.
C.It can use maxspan to set the maximum total duration of a transaction.
D.It can use maxpause to set the maximum time between events in a transaction.
E.By default, it retains all original fields from all events in the transaction.
Answers: A, C, D

Transaction groups events that share a common field, like session ID.

Why this answer

Option A is correct because the transaction command groups related events that share common field values, such as a session ID or user ID, to form a single transaction. This is a core function of the command, allowing you to correlate events across a dataset based on matching field content.

Exam trap

The trap here is that candidates attribute to `transaction` behaviors that belong elsewhere: which indexes are searched is decided by the base search that feeds the command, not by `transaction` itself, and field values from member events are merged into the transaction (by default as deduplicated multivalue fields) rather than preserved event by event.

23
Multi-Select · hard

Which THREE of the following commands can produce a time-based chart (timechart or chart with time buckets)? (Choose three.)

Select 3 answers
A.`chart count over _time bins=24`
B.`stats count by _time span=1h`
C.`timechart span=1h count`
D.`top _time`
E.`chart count by _time span=1d`
Answers: A, C, E

Correct: chart with bins over _time creates a time-based chart.

Why this answer

Option A is correct because the `chart` command with `over _time bins=24` explicitly creates a time-based chart by splitting the time range into 24 equal bins, each representing a time bucket, and then counts events per bucket. This produces a chart that can be visualized over time, similar to a timechart.

Exam trap

Splunk often tests the distinction between `stats` and `chart`/`timechart`, where candidates mistakenly think `stats count by _time span=1h` can produce a time-based chart. The `by` clause of `stats` has no `span` option, so events must first be bucketed with `bin`, or the search must use `chart` or `timechart`.

24
MCQ · easy

An analyst wants to create a time series chart showing the count of errors per hour over the last 24 hours. The errors are logged with sourcetype=error_log. Which search achieves this?

A.index=main sourcetype=error_log | chart count over _time by hour
B.index=main sourcetype=error_log | bin _time span=1h | stats count by _time
C.index=main sourcetype=error_log | chart count by _time
D.index=main sourcetype=error_log | timechart count span=1h
Answer: D

Correctly produces hourly count time chart.

Why this answer

Option D is correct because `timechart count span=1h` automatically creates a time series chart with one-hour buckets over the last 24 hours, grouping events by `_time` and counting them per bucket. The `timechart` command is specifically designed for time-based aggregation and produces a chart with `_time` on the x-axis, which is exactly what the analyst needs.

Exam trap

The trap here is that candidates often confuse `chart` with `timechart`, thinking `chart count by _time` will produce a time series, but `chart` treats `_time` as a categorical field rather than a continuous time axis, leading to incorrect visualizations.

How to eliminate wrong answers

Option A is wrong because `chart count over _time by hour` does not set an hourly span: `chart` reads `by hour` as a split-by field named `hour`, which these events are not stated to have, so the search does not bin events into hourly buckets. Option B is wrong because `bin _time span=1h | stats count by _time` produces a table, not a time series chart, and the `bin` command modifies `_time` but `stats` does not automatically generate a chart. Option C is wrong because `chart count by _time` never specifies a one-hour span, so the bucket size is not guaranteed to be hourly as the analyst requires.

25
Multi-Selecteasy

Which THREE of the following are valid Splunk search commands for determining the number of distinct values of a field?

Select 3 answers
A.| stats count(field)
B.| stats dc(field)
C.| stats distinct_count(field)
D.| stats values(field)
E.| dedup field | stats count
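AnswersB, C, E

dc() and its long form distinct_count() are stats functions that return the distinct count.

Why this answer

Option B is correct because `| stats dc(field)` uses the `dc` (distinct count) function to return the exact number of unique values for the specified field. This is the direct and most efficient command for counting distinct values in Splunk. Option C is correct because `distinct_count(field)` is the long-form name of the same function and returns the same number. Option E is correct because `dedup field` keeps one event per unique value, so the `stats count` that follows counts the unique values.

Exam trap

The trap here is that candidates often confuse `count(field)` with `dc(field)`, not realizing that `count` tallies events while `dc` tallies unique values, and that `values(field)` lists the unique values themselves rather than returning how many there are.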
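
The difference between counting events and counting unique values can be sketched in plain Python. This is only a model of the semantics, not how Splunk implements them; the sample events and function names are hypothetical, and the `dedup` model assumes the default behaviour of dropping events that lack the field.

```python
def count_field(events, field):
    """Model of `stats count(field)`: events in which the field has a value."""
    return sum(1 for e in events if e.get(field) is not None)


def dc_field(events, field):
    """Model of `stats dc(field)`: number of unique values of the field."""
    return len({e[field] for e in events if e.get(field) is not None})


def dedup_then_count(events, field):
    """Model of `dedup field | stats count`: keep the first event per value, then count."""
    seen = set()
    kept = []
    for e in events:
        value = e.get(field)
        if value is None or value in seen:
            continue  # assumption: events without the field are dropped
        seen.add(value)
        kept.append(e)
    return len(kept)


# Hypothetical events: "a" appears twice, one event has no user field.
sample_events = [{"user": "a"}, {"user": "b"}, {"user": "a"}, {}]

event_count = count_field(sample_events, "user")          # 3
distinct_count = dc_field(sample_events, "user")          # 2
dedup_count = dedup_then_count(sample_events, "user")     # 2
```

The two distinct-count approaches agree with each other, while the plain event count is larger because it counts the repeated value twice.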

26
MCQmedium

A Splunk admin notices that a search using the transaction command takes too long. To debug, they want to see how events are grouped into transactions before the transaction command runs. Which command can be added to the search pipeline before transaction to inspect the grouping?

A.map
B.fields
C.transaction
D.streamstats
AnswerD

streamstats can create a session ID or other grouping field, enabling preview of transaction grouping.

Why this answer

Option D is correct because streamstats can be used to compute a transaction ID based on a time window or field change, allowing you to see how events would be grouped. Options A, B, and C do not help inspect grouping.

27
Multi-Selectmedium

Which TWO search commands can be used to calculate a running total (cumulative sum) of a field over time?

Select 2 answers
A.delta
B.accum
C.transaction
D.streamstats
E.eventstats
AnswersB, D

Built-in command for cumulative sum.

Why this answer

B is correct because the `accum` command calculates a cumulative sum of a specified numeric field across all events in the search result order, adding each event's value to the running total. This directly implements a running total without requiring any additional options.

Exam trap

Splunk often tests the distinction between `accum` and `eventstats`, where candidates mistakenly choose `eventstats` thinking it computes running totals, but it actually computes a single aggregate over all events and appends that same value to every event.
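
A minimal Python sketch of the distinction, using hypothetical values: a running total changes from event to event, whereas an `eventstats`-style aggregate attaches the same grand total to every event. This models the semantics only and is not Splunk code.

```python
from itertools import accumulate

# Hypothetical numeric field values, in search result order.
values = [5, 3, 7]

# Running total, as `accum` or `streamstats sum(...)` would produce.
running_total = list(accumulate(values))

# Single aggregate repeated on every event, as `eventstats sum(...)` would produce.
grand_total = [sum(values)] * len(values)

print(running_total)  # [5, 8, 15]
print(grand_total)    # [15, 15, 15]
```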

28
MCQeasy

An analyst wants to find the top 5 users who have the highest total bytes transferred. The data has fields 'user' and 'bytes'. Which search should be used?

A.| stats max(bytes) as max_bytes by user | sort - max_bytes | head 5
B.| stats sum(bytes) as total_bytes by user | sort - total_bytes | head 5
C.| sort - bytes | head 5 | table user, bytes
D.| top limit=5 user
AnswerB

This correctly sums bytes per user, sorts descending, and takes top 5.

Why this answer

Option B is correct because it uses `stats sum(bytes) as total_bytes by user` to aggregate the total bytes transferred per user, then sorts the results in descending order with `sort - total_bytes`, and finally limits the output to the top 5 users with `head 5`. This directly answers the requirement for the highest total bytes transferred.

Exam trap

The trap here is that candidates often confuse `max` with `sum` for total calculations, or mistakenly think sorting raw events and taking the top 5 yields user-level totals, when in fact aggregation by user is required first.

How to eliminate wrong answers

Option A is wrong because `stats max(bytes)` returns the single largest byte value per user, not the sum of all bytes transferred, so it does not calculate total bytes. Option C is wrong because it sorts individual events by bytes and takes the first 5, which gives the 5 events with the highest bytes, not the top 5 users by total bytes; it also does not aggregate per user. Option D is wrong because `top limit=5 user` counts the frequency of user occurrences, not the sum of bytes transferred, so it answers a different question.
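
The aggregate-then-sort-then-limit pattern in option B can be modelled in plain Python. The events and function names below are hypothetical; the second function shows how ranking by `max` instead of `sum` can change the winner.

```python
from collections import defaultdict


def top_users_by_total(events, n=5):
    """Model of `stats sum(bytes) by user | sort - total_bytes | head n`."""
    totals = defaultdict(int)
    for e in events:
        totals[e["user"]] += e["bytes"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def top_users_by_max(events, n=5):
    """Model of option A: rank users by their single largest transfer."""
    largest = {}
    for e in events:
        largest[e["user"]] = max(largest.get(e["user"], 0), e["bytes"])
    return sorted(largest.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical data: alice makes many small transfers, bob one large one.
transfers = [
    {"user": "alice", "bytes": 100},
    {"user": "alice", "bytes": 100},
    {"user": "alice", "bytes": 100},
    {"user": "bob", "bytes": 250},
    {"user": "carol", "bytes": 50},
]

by_total = top_users_by_total(transfers, n=2)  # alice (300) ranks first
by_max = top_users_by_max(transfers, n=2)      # bob (250) ranks first
```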

29
Multi-Selectmedium

Which TWO of the following statements about the 'transaction' command are true? (Choose two.)

Select 2 answers
A.It can add a 'duration' field to the result events.
B.It requires events to be sorted by _time in descending order.
C.It can only be used with 'startswith' and 'endswith' options.
D.It requires at least one field in the 'by' clause.
E.It groups events that are logically related based on common field values and time proximity.
AnswersA, E

Correct: transaction automatically adds a duration field.

Why this answer

Option A is correct because the 'transaction' command automatically adds a 'duration' field to the result events, which represents the time difference between the first and last event in the transaction. This is a built-in behavior of the command, not an optional setting.

Exam trap

Splunk often tests the misconception that the 'transaction' command requires a 'by' clause or that it only works with 'startswith'/'endswith', when in fact it can group events purely by time proximity using 'maxspan' and 'maxpause'.
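
As a rough illustration of grouping by a common field plus time proximity, the sketch below splits events into groups whenever the gap between consecutive events for the same key exceeds a pause threshold, then reports a duration and event count per group. It is a simplified model under stated assumptions (epoch-second `_time`, a hypothetical `session` field, only a `maxpause`-style rule), not Splunk's actual algorithm.

```python
def group_transactions(events, key, maxpause):
    """Group events sharing `key`; start a new group after a gap longer than maxpause."""
    open_groups = {}  # key value -> list of events in the currently open group
    closed = []
    for e in sorted(events, key=lambda ev: ev["_time"]):
        group = open_groups.get(e[key])
        if group is not None and e["_time"] - group[-1]["_time"] > maxpause:
            closed.append(group)  # gap too long: close the old group
            group = None
        if group is None:
            group = []
            open_groups[e[key]] = group
        group.append(e)
    closed.extend(open_groups.values())
    return [
        {
            key: g[0][key],
            "duration": g[-1]["_time"] - g[0]["_time"],  # last event time minus first
            "eventcount": len(g),
        }
        for g in closed
    ]


# Hypothetical events (epoch seconds).
sample = [
    {"_time": 0, "session": "s1"},
    {"_time": 5, "session": "s2"},
    {"_time": 8, "session": "s2"},
    {"_time": 10, "session": "s1"},
    {"_time": 100, "session": "s1"},  # 90 s after the previous s1 event
]

result = group_transactions(sample, "session", maxpause=30)
```

With a 30-second pause limit, session `s1` splits into two groups (durations 10 and 0) and `s2` forms one group with duration 3.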

30
MCQhard

A search is used to calculate the 95th percentile of response times for each application, and then to find applications where the 95th percentile exceeds 5000 ms. The current search is: `index=perf sourcetype=app_response | stats perc95(response_time) by app | where perc95(response_time) > 5000` This search fails with an error. What is the most likely reason?

A.The stats command cannot use perc95 with a by clause.
B.The stats command requires a rename of the output field.
C.The field name in the where clause must match exactly, including parentheses. Use `where 'perc95(response_time)' > 5000`
D.The perc95 function is not a valid stats function.
AnswerC

In Splunk, the resulting field is named with the function and parentheses, so it must be quoted or escaped.

Why this answer

Option C is correct because `where` evaluates an eval-style expression, in which an unquoted `perc95(response_time)` is parsed as a function call rather than a field name, causing an error. Wrapping the field name in single quotes (`'perc95(response_time)'`) tells Splunk to treat it as a field reference.

Exam trap

Splunk often tests the nuance that field names generated by `stats` with functions like `perc95()` must be quoted in subsequent commands like `where` or `eval` to avoid being misinterpreted as function calls.

How to eliminate wrong answers

Option A is wrong because `perc95` is a valid percentile function in the `stats` command and can be used with a `by` clause to compute percentiles per group. Option B is wrong because the `stats` command does not require renaming the output field; the field is automatically named `perc95(response_time)` and can still be referenced as long as the name is wrapped in single quotes, although renaming it with `as` is a common way to avoid the quoting. Option D is wrong because `perc95` is a valid stats function in Splunk, used to calculate the 95th percentile of a numeric field.

31
MCQhard

An administrator notices that a search using the timechart command returns data for every 15-minute bucket even when no events exist, creating many null values. How can this behavior be suppressed?

A.Use timechart limit=0
B.Use timechart usenull=f
C.Use timechart partial=false
D.Use timechart cont=false
AnswerD

Suppresses continuous time bins with no data.

Why this answer

Option D is correct: cont=false suppresses continuous time bins, showing only buckets with data. Option A (limit=0) affects the number of series, not gaps. Option B (usenull=f) controls whether a NULL series is shown for events that lack the split-by field, not whether empty time buckets appear. Option C (partial=false) controls partial buckets at the edges of the time range.
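
The effect of continuous versus non-continuous bucketing can be sketched in Python. This is a simplified model with hypothetical data: it fills gaps only between the first and last populated bucket, whereas a real `timechart` works over the search time range.

```python
from collections import Counter


def timechart_count(epoch_times, span, cont=True):
    """Count events per `span`-second bucket; optionally fill empty buckets with 0."""
    counts = Counter((t // span) * span for t in epoch_times)
    if not counts:
        return []
    if not cont:
        # Only buckets that actually contain events.
        return sorted(counts.items())
    start, end = min(counts), max(counts)
    return [(bucket, counts.get(bucket, 0)) for bucket in range(start, end + span, span)]


# Hypothetical event times (epoch seconds) with a 30-minute quiet period.
times = [0, 60, 2700]

with_gaps = timechart_count(times, span=900, cont=True)
without_gaps = timechart_count(times, span=900, cont=False)

print(with_gaps)     # [(0, 2), (900, 0), (1800, 0), (2700, 1)]
print(without_gaps)  # [(0, 2), (2700, 1)]
```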

32
Matchingmedium

Match each Splunk report type to its description.

Concepts
Matches

Displays results in a tabular format

Visualizes data as a chart (e.g., bar, line, pie)

Shows statistical summaries like count, avg, sum

A collection of panels with visualizations

Triggers actions based on search results

Why these pairings

Reports help present and monitor data in Splunk.

33
MCQmedium

An analyst runs a search that includes a subsearch: `index=web [search index=web status=500 | fields url | dedup url | limit 5]`. The main search returns no results even though the subsearch returns 5 URLs. What is the most likely issue?

A.The `limit 5` should be inside the subsearch command, not after dedup.
B.The main search index should be different from the subsearch index.
C.The subsearch returns multiple values but the main search does not properly use them as a filter.
D.The subsearch should not use `fields url` because that causes loss of the search command.
AnswerC

The subsearch returns a list of URLs, but the main search must use the result in a way that matches the field; typically one would use `[search ... | table url | format]` to ensure correct formatting.

Why this answer

The subsearch returns a list of 5 URLs, but the main search `index=web [subsearch]` uses these results as a filter. By default, a subsearch returns its results as a single string (or multiple values) that are passed to the main search's `search` command. If the subsearch returns multiple values (e.g., `/page1 /page2 ...`), the main search interprets them as a single search string unless properly formatted with quotes or an `OR` operator.

Since the subsearch does not use `format` or `return` to structure the output, the main search likely receives an invalid or empty filter, yielding no results.

Exam trap

The trap here is that candidates assume the subsearch results are automatically used as a proper filter, but Splunk requires explicit formatting (e.g., `return` or `format`) to ensure multiple values are correctly combined with OR logic in the main search.

How to eliminate wrong answers

Option A is wrong because `limit 5` after `dedup` is valid; it limits the deduplicated results to 5, and the subsearch correctly returns 5 URLs. Option B is wrong because the main search and subsearch can use the same index; there is no requirement for different indexes, and using the same index is common for filtering. Option D is wrong because `fields url` is a valid command that retains only the `url` field, which is necessary for the subsearch to return URL values; it does not cause loss of the search command.

34
Multi-Selectmedium

Which TWO of the following statements about the 'stats' command are true?

Select 2 answers
A.It can be used with a BY clause to group results.
B.The count() function must always include a field argument.
C.It creates one event per input event by default.
D.It can be used to modify individual field values in raw events.
E.It can produce multiple output columns by using multiple stats functions.
AnswersA, E

The BY clause allows grouping by one or more fields.

Why this answer

Option A is correct because the 'stats' command in Splunk supports a BY clause that allows you to group results by one or more fields, similar to a SQL GROUP BY. This enables you to compute aggregate statistics (e.g., count, sum, avg) for each distinct value of the specified field(s), which is a core feature for summarizing event data.

Exam trap

Splunk often tests the misconception that 'stats' works like 'eval' or 'rex' to modify raw events, or that count() always requires a field, when in fact count() without a field is a valid and common usage.

35
MCQmedium

A security analyst needs to find all events where the field 'user' has a value that is either 'admin' or 'root', but the search is returning too many results from a noisy source. Which search best filters the events to only include those where the 'user' field exactly matches 'admin' or 'root'?

A.user="admin" OR user="root"
B.user=*admin* OR user=*root*
C.user IN ("admin", "root")
D.user=admin OR user=root
AnswerC

The IN operator matches fields exactly against the listed values, avoiding substring issues.

Why this answer

Option C is correct because the `IN` operator in Splunk's Search Processing Language (SPL) performs an exact match against a list of values, ensuring that only events where the `user` field is exactly 'admin' or 'root' are returned. It is the most concise way to filter for multiple exact values without introducing wildcard behavior.

Exam trap

Splunk often tests the distinction between exact-match comparisons (`=`, `IN`) and wildcard patterns (`*`), trapping candidates who reach for `*admin*`-style patterns that also match longer values, or who do not recognize `IN` as shorthand for a chain of `OR`ed `field=value` comparisons.

How to eliminate wrong answers

Option A is not the best choice because `user="admin" OR user="root"` matches the same events as option C but repeats the field name for every value, which becomes verbose and error-prone as the list grows. Option B is wrong because `user=*admin* OR user=*root*` uses wildcards (`*`), which match any value containing 'admin' or 'root' anywhere in the field, returning many irrelevant events (e.g., 'superadmin', 'rootuser'). Option D is not the best choice because `user=admin OR user=root` is the unquoted form of option A; for simple values like these it behaves the same way, so it offers nothing over the `IN` list.

36
MCQhard

A large e-commerce company runs Splunk Enterprise on a single indexer cluster with four indexers. They have been experiencing slow search performance during peak hours, especially for searches that cover the last 24 hours. The environment uses a default search time range of 'Last 30 days'. The team has noticed that searches often time out or return partial results. They have also observed high CPU usage on the search head during peak times. The company's data volume is approximately 500 GB per day across various sources. They have implemented some search acceleration for data models, but the issue persists. The security team needs to run ad-hoc searches for threat hunting that cover multiple sourcetypes over the last 7 days. Additionally, the search head has a memory limit that is sometimes reached. The security team's searches are complex and involve joins and subsearches. The existing acceleration only covers a few data models. The team is looking for a quick win that does not require significant infrastructure changes. Which course of action would most effectively improve search performance without compromising data completeness?

A.Implement a data model for all sourcetypes and enforce using tstats for all searches
B.Reduce the default time range to 'Last 7 days' and encourage users to specify shorter time ranges
C.Increase the number of indexers to distribute the load
D.Use the search head clustering feature to distribute search load across multiple search heads
AnswerB

Immediately reduces data scanned for most searches, a quick win.

Why this answer

Option B is correct because reducing the default time range from 'Last 30 days' to 'Last 7 days' directly reduces the data scanned by searches, which is the most effective quick win without infrastructure changes. Since the environment has high CPU usage on the search head and searches often time out, limiting the default time range reduces the load on the indexers and search head, improving performance for the majority of searches. This change does not compromise data completeness because users can still specify longer time ranges when needed.

Exam trap

Splunk often tests the misconception that adding more hardware (indexers or search heads) is the only way to improve performance, when in fact optimizing search time ranges and using acceleration appropriately can provide a quicker and more cost-effective solution.

How to eliminate wrong answers

Option A is wrong because implementing a data model for all sourcetypes and enforcing tstats for all searches would require significant upfront effort and may not cover ad-hoc threat hunting searches that use joins and subsearches, which tstats cannot directly support. Option C is wrong because increasing the number of indexers requires significant infrastructure changes and is not a quick win; it also does not address the search head memory limit or the default time range issue. Option D is wrong because search head clustering distributes search load across multiple search heads, but it requires additional hardware and configuration, and does not reduce the amount of data scanned per search; it also does not address the root cause of high CPU usage on the existing search head.

37
MCQeasy

Refer to the exhibit. The search returns results quickly but shows zero events for some src values. What does `summariesonly=t` imply?

A.It restricts tstats to only use data from accelerated data model summaries.
B.It causes tstats to search the index directly.
C.It makes the search run faster but less accurate.
D.It forces tstats to return all events, including those not in summaries.
AnswerA

This is the purpose of summariesonly=t; if acceleration is not complete, some data may be missing.

Why this answer

Option A is correct because `summariesonly=t` in a `tstats` command restricts the search to only use data from accelerated data model summaries, ignoring raw event data. This is why the search returns quickly but shows zero events for some `src` values—those values are not present in the accelerated summaries, so they are excluded from the results.

Exam trap

Splunk often tests the misconception that `summariesonly=t` makes searches 'faster but less accurate,' when in reality it strictly limits the data source to summaries, and accuracy depends on summary completeness, not the option itself.

How to eliminate wrong answers

Option B is wrong because `summariesonly=t` does not cause `tstats` to search the index directly; instead, it explicitly avoids the index and relies solely on summary data. Option C is wrong because while `summariesonly=t` can make the search run faster, it does not inherently make it 'less accurate'—accuracy depends on whether the summaries are complete and up-to-date; the option itself does not introduce inaccuracy. Option D is wrong because `summariesonly=t` does the opposite—it forces `tstats` to return only events present in summaries, not all events including those not in summaries.

38
MCQhard

A search uses the transaction command with maxevents=1000 and maxspan=1h. The search is slow and memory-intensive. Which modification can reduce resource usage while still grouping related events?

A.Use eventstats with a time window.
B.Reduce maxevents to 100.
C.Use tstats instead of transaction.
D.Increase maxspan to 2h.
AnswerB

Reducing the maximum events per transaction lowers memory consumption.

Why this answer

Reducing maxevents to 100 limits the number of events that the transaction command groups into a single transaction, which directly reduces memory consumption and processing overhead. This modification still allows related events to be grouped together, but with a smaller batch size, making the search less resource-intensive while preserving the core grouping logic.

Exam trap

Splunk often tests the misconception that reducing maxevents will break event grouping, when in fact it only limits the number of events per transaction, still allowing related events to be grouped as long as they fall within the maxspan and other criteria.

How to eliminate wrong answers

Option A is wrong because eventstats does not group events into transactions; it only computes aggregate statistics and appends them to each event, so it cannot correlate events into a single logical group. Option C is wrong because tstats is designed for statistical queries on indexed fields and cannot perform event grouping or transaction-style correlation across raw events. Option D is wrong because increasing maxspan to 2h would allow the transaction to span a longer time window, potentially including more events and increasing memory usage, which is the opposite of reducing resource usage.

39
MCQeasy

An analyst runs `index=web status=500 | top 10 uri` and gets results. Which statement is true about the 'top' command's behavior?

A.It returns the 10 URIs that appeared most recently.
B.It computes the average latency per URI and shows the top 10.
C.It sums a numeric field per URI and shows the top 10 sums.
D.It requires a 'by' clause to specify the field to group by.
E.It counts the number of events per URI and displays the 10 with the highest count.
AnswerE

Correct: top counts events per field value.

Why this answer

The `top` command in Splunk counts the occurrences of each distinct value of a specified field (here, `uri`) and returns the values with the highest counts. By default, it returns the top 10 results, so option E correctly describes that it counts events per URI and displays the 10 with the highest count.

Exam trap

The trap here is that candidates often confuse `top` with commands like `sort` or `stats`, thinking it sorts by time or computes averages, when in fact it purely counts event frequencies per field value.

How to eliminate wrong answers

Option A is wrong because `top` does not consider recency; it counts total occurrences, not the most recent events. Option B is wrong because `top` does not compute averages; it counts events, and latency is not involved. Option C is wrong because `top` does not sum numeric fields; it counts events per distinct value. Option D is wrong because `top` does not require a `by` clause; it automatically groups by the field specified as its argument (e.g., `uri`).

40
MCQmedium

An analyst needs to calculate the average response time for each web server, but only for requests that returned status code 200. The field 'response_time' is numeric. Which search correctly achieves this?

A.index=web | transaction server maxspan=1m | stats avg(response_time) by server
B.index=web status=200 | eventstats avg(response_time) as avg_time by server
C.index=web | eval avg_time = avg(response_time) by server | search status=200
D.index=web status=200 | stats avg(response_time) by server
AnswerD

Correct: filter first, then aggregate.

Why this answer

Option D is correct because it first filters the data to only include events with status=200 using a search-time field filter, then uses the `stats` command with `avg(response_time) by server` to compute the average response time per server. This ensures that only successful requests are included in the aggregation, and the `by server` clause correctly groups the results by each web server.

Exam trap

The trap here is that candidates often confuse `eventstats` with `stats`, thinking `eventstats` can produce a summary table, or they incorrectly use `eval` with aggregation functions, not realizing that `eval` operates on individual events and cannot perform group-by calculations.

How to eliminate wrong answers

Option A is wrong because it uses the `transaction` command with a `maxspan=1m`, which groups events into transactions based on time proximity rather than filtering by status=200, and the `avg(response_time)` would be computed over all events in the transaction, not just successful ones. Option B is wrong because `eventstats` adds the average response time as a new field to each event but does not aggregate the results into a single row per server; it retains all raw events, which does not produce the desired summary table. Option C is wrong because `eval` cannot perform an aggregation like `avg()` with a `by` clause; `eval` works on a per-event basis, so the expression is invalid, and the `search status=200` filter would in any case be applied only after the attempted calculation rather than before it.
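
The filter-first, aggregate-second order of option D can be modelled in a few lines of Python. The events and function name are hypothetical, and the sketch only mirrors the logic of `status=200 | stats avg(response_time) by server`.

```python
from collections import defaultdict


def avg_response_by_server(events):
    """Keep only status 200 events, then average response_time per server."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for e in events:
        if e["status"] != 200:
            continue  # filter before aggregating
        sums[e["server"]] += e["response_time"]
        counts[e["server"]] += 1
    return {server: sums[server] / counts[server] for server in sums}


# Hypothetical events: the slow 500 response must not affect web1's average.
requests = [
    {"server": "web1", "status": 200, "response_time": 100},
    {"server": "web1", "status": 200, "response_time": 300},
    {"server": "web1", "status": 500, "response_time": 9000},
    {"server": "web2", "status": 200, "response_time": 50},
]

averages = avg_response_by_server(requests)
print(averages)  # {'web1': 200.0, 'web2': 50.0}
```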

41
MCQhard

A search uses a subsearch to filter events, but the subsearch returns more than 50,000 results, causing the search to fail. Which approach can avoid this limit while still achieving the goal?

A.Increase the limit in limits.conf.
B.Use a join command instead.
C.Use a nested subsearch with stats to aggregate results first.
D.Use the format command with AND.
AnswerC

Aggregating reduces the number of results returned by the subsearch.

Why this answer

Option C is correct because using a nested subsearch with `stats` to aggregate results first reduces the number of events returned by the subsearch, allowing it to stay under the default 50,000-result limit. This approach pre-processes the subsearch output (e.g., by counting or grouping) so that the outer search receives a manageable set of values, effectively bypassing the limit without altering system configuration.

Exam trap

Splunk often tests the misconception that increasing configuration limits or using commands like `join` or `format` can solve subsearch limit issues, when the correct approach is to reduce the subsearch output size through aggregation.

How to eliminate wrong answers

Option A is wrong because increasing the limit in `limits.conf` is a global configuration change that can degrade search performance and is not a best practice for handling subsearch result limits; it also requires a restart and affects all searches. Option B is wrong because the `join` command itself has a subsearch limit (default 50,000 results) and is generally inefficient, often leading to the same failure or poor performance; it does not inherently avoid the limit. Option D is wrong because the `format` command is used to format subsearch results into a boolean expression for the outer search, but it does not reduce the number of results returned by the subsearch; the subsearch still hits the 50,000-result limit before `format` is applied.

42
MCQeasy

A search returns many events, and the analyst wants to see a summary table of the top 5 values of the field `src_ip` along with the count of events for each. Which command should be used?

A.eventstats
B.top
C.sort
D.rare
AnswerB

top returns the most frequent values with count and percent.

Why this answer

The `top` command in Splunk is specifically designed to find the most common values of a field and display them in a summary table with counts and percentages. By default, `top` returns the top 10 values, but you can use the `limit=5` parameter to restrict the output to the top 5 values of `src_ip` along with their event counts. This directly meets the analyst's requirement.

Exam trap

The trap here is that candidates may confuse `top` with `sort` or `eventstats`, thinking that sorting or adding statistics to events is sufficient, when in fact only `top` provides the aggregated summary table of the most frequent values with counts.

How to eliminate wrong answers

Option A is wrong because `eventstats` computes aggregate statistics (like count, sum, avg) and adds them as new fields to each event, but it does not produce a summary table of top values. Option C is wrong because `sort` reorders events based on a field but does not aggregate counts or produce a summary table of top values. Option D is wrong because `rare` finds the least common values of a field, which is the opposite of what the analyst needs (top 5).
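
What `top` reports (the most frequent values with a count and a percentage) can be approximated with Python's `collections.Counter`. The output field names and sample addresses below are illustrative, not Splunk's exact output format.

```python
from collections import Counter


def top(values, limit=10):
    """Return the `limit` most frequent values with their count and share of all events."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        {"value": value, "count": count, "percent": 100.0 * count / total}
        for value, count in counts.most_common(limit)
    ]


# Hypothetical src_ip values from four events.
src_ips = ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"]

top_one = top(src_ips, limit=1)
print(top_one)  # [{'value': '10.0.0.1', 'count': 3, 'percent': 75.0}]
```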

43
Multi-Selectmedium

Which TWO of the following are valid methods to join two sets of search results?

Select 2 answers
A.join
B.append
C.union
D.stats values(*)
E.addcoltotals
AnswersA, B

`join` merges results from two searches based on a field.

Why this answer

Option A is correct because the `join` command in Splunk merges two result sets based on a common field, similar to a SQL JOIN. It requires both datasets to have a matching field and supports inner (the default) and left (outer) join types. Option B is correct because the `append` command simply adds the results of a subsearch to the main search results, effectively concatenating the two sets without requiring a common field.

Exam trap

Splunk often tests the distinction between `join` and `append` versus other commands offered as distractors, such as `union` or `addcoltotals`, and candidates may confuse `stats values(*)` as a join method because it can combine values, but it does not join separate result sets.

44
Multi-Selecteasy

Which TWO commands can be used to perform statistical aggregations on streaming events without creating a separate search results set?

Select 2 answers
A.timechart
B.stats
C.streamstats
D.eventstats
E.chart
AnswersC, D

streamstats adds stats to each event as the results stream, preserving all events.

Why this answer

C is correct because streamstats computes statistics cumulatively as each event passes through the search pipeline, without creating a separate search results set. It computes running or windowed statistics (e.g., running sum, moving average) over the events seen so far, appending the result to each event. D is correct because eventstats also operates on the current event set, computing aggregations and adding the results as new fields to each event without generating a separate output set.

Exam trap

Splunk often tests the distinction between commands that produce a new results set (stats, chart, timechart) versus those that augment the existing event stream (streamstats, eventstats), and the trap here is that candidates confuse eventstats with stats, thinking both create separate outputs.

45
Multi-Selectmedium

Which TWO of the following searches are syntactically valid uses of the eventstats command? (Assume all referenced fields exist.)

Select 2 answers
A.| eventstats perc95(response_time) by host | eval p95 = perc95
B.| eventstats sum(bytes) as total_bytes by src_ip | eval pct = bytes/total_bytes*100
C.| eventstats avg(response_time) as avg | eval diff = response_time - avg
D.| eventstats values(user) as users by session_id | eval num_users = len(users)
E.| eventstats max(_time)last_time by user | table last_time
AnswersB, C

Valid: eventstats adds the sum per src_ip, then eval calculates percentage.

Why this answer

Option B is correct because the `eventstats` command calculates aggregate statistics (here, `sum(bytes)`) over the entire result set or by group (here, `by src_ip`), and it adds the result as a new field (`total_bytes`) to every event. This allows the subsequent `eval` to compute a per-event percentage using that new field. Option C is correct because `eventstats` without a `by` clause computes the aggregate over all events and adds the result (here, `avg`) to each event, enabling the `eval` to calculate the difference from the global average.

Exam trap

Splunk often tests the distinction between `eventstats` and `stats`, and the trap here is that candidates confuse the syntax for aliasing (missing `as`) or use invalid eval functions like `len()` instead of `mvcount()`, leading them to select options that look plausible but are syntactically incorrect.
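
Option B's pattern, where a per-group total is attached to every event and then used in a per-event calculation, can be sketched in Python as follows. The data is hypothetical and the function only models the `eventstats ... by src_ip | eval pct = ...` logic; note that every input event survives, which is what separates this from a `stats`-style summary.

```python
from collections import defaultdict


def add_pct_of_group_total(events):
    """Attach each src_ip's total bytes to its events, then compute each event's share."""
    totals = defaultdict(int)
    for e in events:
        totals[e["src_ip"]] += e["bytes"]
    return [
        {
            **e,
            "total_bytes": totals[e["src_ip"]],
            "pct": e["bytes"] / totals[e["src_ip"]] * 100,
        }
        for e in events
    ]


# Hypothetical events with positive byte counts.
flows = [
    {"src_ip": "a", "bytes": 25},
    {"src_ip": "a", "bytes": 75},
    {"src_ip": "b", "bytes": 10},
]

annotated = add_pct_of_group_total(flows)
```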

46
MCQeasy

A user wants to create a timechart showing the number of distinct users per hour over the past week. Which search is correct?

A.index=web | bucket _time span=1h | stats dc(user) by _time
B.index=web | timechart dc(user) span=1h
C.index=web | timechart dc(user) by _time span=1h
D.index=web | timechart count by user span=1h
AnswerB

Correctly uses timechart with distinct count function.

Why this answer

Option B is correct because the `timechart` command automatically creates a time-based chart with a default or specified `span`, and `dc(user)` calculates the distinct count of users per time bucket. The `span=1h` argument sets the bucket size to one hour, and the `by` clause is not needed because `timechart` implicitly groups by `_time`. This produces the desired output of distinct users per hour over the past week.
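The bucketing and distinct count that `timechart dc(user) span=1h` performs can be sketched in Python. This is a simplified model under stated assumptions: the function name and sample events are invented, and unlike `timechart` it does not emit rows for hours that have no events.

```python
from collections import defaultdict
from datetime import datetime

def distinct_users_per_hour(events):
    """Group (timestamp, user) pairs into hourly buckets and count unique users."""
    buckets = defaultdict(set)
    for ts, user in events:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # span=1h
        buckets[hour].add(user)                                # dc(user)
    return {hour: len(users) for hour, users in sorted(buckets.items())}

# Made-up sample events
sample = [
    (datetime(2024, 1, 1, 9, 5), "alice"),
    (datetime(2024, 1, 1, 9, 40), "bob"),
    (datetime(2024, 1, 1, 9, 55), "alice"),
    (datetime(2024, 1, 1, 10, 10), "alice"),
]

per_hour = distinct_users_per_hour(sample)
```

Three events fall in the 09:00 bucket but only two distinct users, which is why `dc(user)` and `count` give different answers.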

Exam trap

Splunk often tests the distinction between `stats` and `timechart` — the trap here is that candidates mistakenly use `stats ... by _time` (Option A) thinking it produces a timechart, or misuse the `by` clause in `timechart` (Option C) by including `_time` as a split-by field, which breaks the time-based grouping.

How to eliminate wrong answers

Option A is wrong because `bucket _time span=1h` followed by `stats dc(user) by _time` produces a plain table rather than a timechart, and it does not fill in hours that have no events. Option C is wrong because `timechart` already groups by `_time`; adding `by _time` misuses the split-by clause, which expects a field other than the time field. Option D is wrong because `count by user` counts events for each user, producing one series per user in every hourly bucket, instead of a single distinct-user count per hour.

47
MCQmedium

An analyst wants to create a timechart of the count of events per hour, but only for events where the field `status` contains the word "fail" (case-insensitive). Which search is correct?

A.index=main | timechart count | search status=*fail*
B.index=main | regex status="fail" | timechart count
C.index=main | where status="*fail*" | timechart count
D.index=main | timechart count by eval(case(match(status,"(?i)fail"),1))
AnswerD

match with (?i) does case-insensitive regex and eval creates a field for timechart.

Why this answer

Option D is correct because it uses `eval` with `case` and `match` to create a field that is 1 when `status` contains 'fail' (case-insensitive via `(?i)`), then uses `timechart count by` that field to count only matching events per hour. This approach correctly filters within the timechart aggregation, ensuring only events where status matches the pattern are counted.

Exam trap

Splunk often tests the distinction between `search` (which supports wildcards) and `where` (which does not), and the requirement to use `match` with regex flags for case-insensitive substring matching in aggregation commands.

How to eliminate wrong answers

Option A is wrong because it runs `timechart count` on all events first and only then applies `search status=*fail*`; by that point the results contain only `_time` and `count`, so there is no `status` field left to filter on. Option B is wrong because `regex status="fail"` is a case-sensitive regex match: it matches 'failed' but not 'Failed' or 'FAIL'. Option C is wrong because `where status="*fail*"` treats the asterisks as literal characters, not wildcards; `where` does not support wildcard patterns the way `search` does.
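The difference between a case-sensitive pattern and one prefixed with `(?i)` can be illustrated outside Splunk. This is a minimal Python sketch, assuming Python's `re` module behaves like Splunk's regex engine for this simple pattern; the sample status values are made up.

```python
import re

# Made-up status values
statuses = ["login failed", "Failed", "FAILURE", "ok"]

# Plain pattern: unanchored substring match, but case-sensitive
case_sensitive = [s for s in statuses if re.search(r"fail", s)]

# (?i) at the start of the pattern turns on case-insensitive matching
case_insensitive = [s for s in statuses if re.search(r"(?i)fail", s)]
```

Only the lowercase value survives the plain pattern, while the `(?i)` version also keeps 'Failed' and 'FAILURE'.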

48
Multi-Selectmedium

Which TWO of the following commands are useful for reducing the number of events before a `stats` command to improve performance? (Choose 2)

Select 2 answers
A.head
B.transaction
C.sort
D.eval
E.fields
AnswersA, E

Limiting events with head reduces the number of events processed.

Why this answer

The `head` command limits the number of events processed by returning only the first N events from the search results. By reducing the event volume early in the pipeline, it significantly decreases the workload on the subsequent `stats` command, improving performance when only a sample or the most recent events are needed. The `fields` command complements this by dropping fields that `stats` does not need, so less data is carried through the pipeline for each event.

Exam trap

Splunk often tests the misconception that `sort` or `transaction` can reduce event volume, but candidates must remember that these commands either require full event sets or increase complexity, making `head` and `fields` the correct choices for performance optimization before aggregation.

49
MCQhard

Refer to the exhibit. The search returns 50 results after the `where` command. What is the purpose of the `eval` command?

A.To filter out results with count ≤ 100.
B.To modify the 'count' field.
C.To rename the 'count' field to 'severity'.
D.To create a new field 'severity' based on a condition.
AnswerD

`eval` with `if()` creates a new field 'severity' that is 'high' if count > 100, else 'low'.

Why this answer

The `eval` command in Splunk is used to create new fields or evaluate expressions. In this context, the `eval` command creates a new field called 'severity' by evaluating a conditional expression that assigns a value based on the 'count' field. This is confirmed by the search returning 50 results after the `where` command, meaning the `eval` command does not filter results but instead adds a computed field.

Exam trap

The trap here is that candidates often confuse `eval` with `where` or `rename`, thinking `eval` can filter or rename fields, when in fact `eval` only creates or modifies fields without affecting the result set or field names directly.

How to eliminate wrong answers

Option A is wrong because the `eval` command does not filter results; filtering is done by the `where` command, which already returned 50 results. Option B is wrong because the `eval` command does not modify the existing 'count' field; it creates a new field 'severity' without altering 'count'. Option C is wrong because the `eval` command does not rename fields; renaming is done using the `rename` command, and the syntax shown creates a new field, not a rename.

50
MCQeasy

Which command extracts a field named 'ip' from the raw event using a regex pattern?

A.rex "ip=(?\d+\.\d+\.\d+\.\d+)"
B.rex "ip=(?<ip>\d+\.\d+\.\d+\.\d+)"
C.rex field=_raw "ip=(?P<ip>\d+\.\d+\.\d+\.\d+)"
D.rex field=_raw "ip=(?<ip>\d+\.\d+\.\d+\.\d+)"
AnswerB

This uses a valid named group and defaults to _raw, correctly extracting the IP field.

Why this answer

Option B is correct because the `rex` command uses the named capture group syntax `(?<ip>...)` to extract a field named 'ip' from the raw event. The pattern `(?<ip>\d+\.\d+\.\d+\.\d+)` matches an IPv4 address and assigns it to the field 'ip'. By default, `rex` operates on the `_raw` field, so no explicit `field=_raw` is needed, and the syntax `(?<name>...)` is the correct Splunk named capture group syntax.

Exam trap

Splunk often tests the distinction between Splunk's `(?<name>...)` syntax and other regex flavors like Python's `(?P<name>...)` or invalid syntax like `(?name...)`, leading candidates to choose options with `?P` or missing angle brackets.

How to eliminate wrong answers

Option A is wrong because the capture group syntax `(?\d+...)` is invalid; Splunk requires a named capture group with angle brackets, like `(?<ip>...)`, to extract a field. Option C uses `(?P<ip>...)`, the Python-style named group form, rather than the `(?<ip>...)` form that the exam expects. Option D correctly uses `(?<ip>...)`; its explicit `field=_raw` is redundant because `rex` defaults to `_raw`, so it is technically valid but not the concise form the exam expects.

51
MCQmedium

Refer to the exhibit. What is the purpose of this search?

A.To compare two datasets and show only matching server names.
B.To update the lookup file with current status.
C.To find servers that are missing from the lookup.
D.To enrich the server list with current status from the main index.
AnswerD

The `append` and `stats` steps merge the current status from the main index into each server from the lookup.

Why this answer

The search uses `inputlookup` to load a lookup file (server_list), then pipes it into `eval` to create a `status` field set to 'missing'. The `append` command adds all events from the main index (source=main sourcetype=access_combined) that match the server names in the lookup. The `stats values(*) as * by server` merges the two datasets per server, so if a server from the lookup has matching events in the main index, its `status` field will be overwritten with the actual status from the main index (e.g., '200').

Servers with no matching events retain 'missing'. This enriches the lookup data with current status from the main index.

Exam trap

The trap here is that candidates may think `append` is used for comparison or filtering (like `join`), but it simply adds events, and the `stats` command with `values()` is what merges and enriches the data, not a direct comparison or update operation.

How to eliminate wrong answers

Option A is wrong because the search does not compare two datasets for matching server names; it uses `append` and `stats` to merge data, not to filter only matching names. Option B is wrong because the search does not include an `outputlookup` command to write results back to the lookup file; it only displays the enriched results. Option C is wrong because the search starts with all servers from the lookup and then adds events from the main index; it does not identify servers missing from the lookup—instead, it marks servers missing from the main index with 'missing' status.

52
MCQmedium

A search uses eventstats to add the average response time per server to each event. Which of the following correctly describes the output?

A.Each event retains its original fields and gains the average response time for its server.
B.Only one event per server is returned, showing the average response time.
C.A running average is calculated across all events.
D.Events are grouped by server and the top values are listed.
AnswerA

eventstats adds aggregate statistics without reducing the number of events.

Why this answer

The `eventstats` command in Splunk computes aggregate statistics (like average) over a specified field grouping and then appends the result to every original event, not just one per group. In this case, it calculates the average response time per server and adds that value as a new field to each event that belongs to that server, preserving all original fields and events.

Exam trap

Splunk often tests the distinction between `eventstats` and `stats` — the trap here is that candidates confuse `eventstats` with `stats` and assume it collapses events, or they confuse it with `streamstats` and think it calculates a running average.

How to eliminate wrong answers

Option B is wrong because `eventstats` does not reduce the number of events; it returns all original events, each enriched with the aggregate value, unlike `stats` which collapses events into one per group. Option C is wrong because `eventstats` with a `BY server` clause calculates the average per server group, not a running average across all events (which would require `streamstats`). Option D is wrong because `eventstats` does not sort or list top values; it simply adds the computed aggregate to each event without reordering or filtering.

53
MCQmedium

A security analyst wants to find all events where the field 'src_ip' matches any IP address in a lookup table named 'malicious_ips.csv'. The lookup has fields 'ip' and 'threat'. Which search correctly enriches events with the threat info and filters to only malicious IPs?

A.`index=security | lookup malicious_ips.csv src_ip | search threat=*`
B.`index=security | lookup malicious_ips.csv src_ip OUTPUT threat | where threat!=""`
C.`index=security [| inputlookup malicious_ips.csv | fields ip | rename ip as src_ip]`
D.`index=security | lookup malicious_ips.csv src_ip AS ip | where isnotnull(threat)`
E.`index=security | lookup malicious_ips.csv ip AS src_ip OUTPUTNEW threat | where isnotnull(threat)`
AnswerE

Correct: uses lookup with outputnew to add threat field, then filters where threat is not null.

Why this answer

Option E is correct because it uses the `lookup` command with `OUTPUTNEW threat` to add the `threat` field to events whose `src_ip` matches a row in the lookup, and then `where isnotnull(threat)` keeps only those matched events. `OUTPUTNEW` writes `threat` only when the event does not already have that field, whereas `OUTPUT` would overwrite an existing value.
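The enrich-then-filter pattern can be sketched in Python as a dictionary lookup followed by a null check. This is a simplified model, not Splunk's lookup implementation: the function name and sample rows are invented, and it does not model the `OUTPUT` versus `OUTPUTNEW` overwrite behaviour.

```python
def enrich_and_filter(events, lookup_rows):
    """Add `threat` to events whose src_ip appears in the lookup; drop the rest."""
    # The lookup's key field is `ip`; the event's field is `src_ip`.
    threat_by_ip = {row["ip"]: row["threat"] for row in lookup_rows}
    matched = []
    for e in events:
        threat = threat_by_ip.get(e.get("src_ip"))
        if threat is not None:  # equivalent of `where isnotnull(threat)`
            matched.append({**e, "threat": threat})
    return matched

# Made-up lookup rows and events
lookup_rows = [{"ip": "203.0.113.7", "threat": "botnet"}]
events = [
    {"src_ip": "203.0.113.7", "action": "blocked"},
    {"src_ip": "198.51.100.2", "action": "allowed"},
]

malicious = enrich_and_filter(events, lookup_rows)
```

Events with no match never receive a `threat` value, so the null check alone is enough to remove them.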

Exam trap

The trap here is that candidates often confuse `OUTPUT` with `OUTPUTNEW`, or overlook that the lookup field (`ip`) and the event field (`src_ip`) have different names and must be mapped with `AS`, leading them to pick options that either fail to enrich or fail to filter correctly.

How to eliminate wrong answers

Option A is wrong because `lookup malicious_ips.csv src_ip` matches on a lookup field named `src_ip`, but the lookup's key field is `ip`; the lookup cannot match any rows, so `threat` is never added and `search threat=*` has nothing to keep. Option B is wrong for the same reason: without mapping `ip` to `src_ip`, the `OUTPUT threat` clause has no matching rows to draw from. Option C is wrong because it uses a subsearch with `inputlookup` to generate a list of IPs, which only filters events where `src_ip` is in the list; it does not enrich events with the `threat` field, which the question requires.

Option D is wrong because `src_ip AS ip` maps a lookup field named `src_ip` to an event field named `ip`, which is the reverse of what is needed: the lookup's field is `ip` and the event's field is `src_ip`. The lookup therefore cannot match, `threat` is never added, and `where isnotnull(threat)` removes every event.

54
MCQeasy

A user wants to create a chart showing the count of errors per hour for the last 24 hours, with time bucketed hourly. Which search is correct?

A.index=main error | timechart count span=1h
B.index=main error | bucket _time span=1h | stats count by _time
C.index=main error | chart count by _time span=1h
D.index=main error | timechart count by _time span=1h
AnswerA

Correctly uses timechart with span.

Why this answer

Option A is correct because the `timechart` command groups events into buckets by `_time` and applies the `count` aggregation to each bucket. The `span=1h` argument explicitly sets the bucket size to one hour, which groups events into hourly intervals over the last 24 hours. This produces the exact output the user needs: a count of errors per hour.

Exam trap

Splunk often tests the distinction between `timechart` and `chart` with `_time`, where candidates mistakenly think `chart count by _time span=1h` works like `timechart`, but `chart` does not support the `span` argument and treats `_time` as a categorical field.

How to eliminate wrong answers

Option B is wrong because `bucket _time span=1h` creates a new field `_time` with rounded timestamps, but the subsequent `stats count by _time` produces a table, not a chart, and does not automatically fill in empty time buckets. Option C is wrong because `chart` does not inherently treat `_time` as a time-based axis; it would treat `_time` as a categorical field, potentially creating a column for each unique timestamp rather than hourly buckets. Option D is wrong because `timechart count by _time` is redundant — `timechart` already uses `_time` as its implicit x-axis, and specifying `by _time` can cause unexpected behavior or errors, as `timechart` expects a field to split by, not the time field itself.

55
MCQhard

A team uses a large index with many sourcetypes. They want to find events where the field "status" contains either "error" or "failure" (case-insensitive), and also ensure that "response_time" > 1000. Which search best optimizes performance?

A.index=main | eventstats avg(response_time) as avg by category | stats count as cnt by category | where cnt>=100 | sort -avg | head 5
B.index=main | top category | eval avg=avg(response_time) | where count>=100
C.index=main | stats avg(response_time) as avg by category | where cnt>=100 | sort -avg | head 5
D.index=main | stats avg(response_time) as avg, count as cnt by category | where cnt>=100 | sort -avg | head 5
AnswerD

Correctly computes both statistics and filters.

Why this answer

Option D is correct because it efficiently computes both the average response_time and the count of events per category in a single stats command, then filters by count >=100, sorts by average descending, and returns the top 5 categories. This minimizes data movement and processing by performing all aggregations in one pass, which is optimal for large indexes with many sourcetypes.
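The single-pass aggregation followed by filter, sort, and head can be sketched in Python. This is a minimal model with invented names and sample data; the `min_count` and `n` parameters stand in for `cnt>=100` and `head 5`.

```python
from collections import defaultdict

def top_categories(events, min_count=100, n=5):
    """One pass collects sum and count per category, then filter, sort, truncate."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for e in events:  # stats avg(response_time) as avg, count as cnt by category
        sums[e["category"]] += e["response_time"]
        counts[e["category"]] += 1
    rows = [
        {"category": c, "avg": sums[c] / counts[c], "cnt": counts[c]}
        for c in counts
        if counts[c] >= min_count      # where cnt>=100
    ]
    rows.sort(key=lambda r: r["avg"], reverse=True)  # sort -avg
    return rows[:n]                                   # head 5

# Made-up sample events; a small min_count keeps the example short
events = [
    {"category": "a", "response_time": 100},
    {"category": "a", "response_time": 300},
    {"category": "b", "response_time": 500},
    {"category": "c", "response_time": 50},
    {"category": "c", "response_time": 50},
]

result = top_categories(events, min_count=2, n=5)
```

Category `b` has the highest average but only one event, so the count filter removes it before sorting. Both `avg` and `cnt` must come out of the same aggregation for that filter to work.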

Exam trap

Splunk often tests the misconception that you need separate commands for each aggregation (like eventstats then stats) or that you can reference a field in a where clause before it is defined, leading candidates to choose options that either fail syntactically or perform unnecessary intermediate operations.

How to eliminate wrong answers

Option A is wrong because the `stats count as cnt by category` step keeps only `category` and `cnt`, discarding the `avg` field that `eventstats` added, so `sort -avg` has no field to sort on and the `eventstats` pass is wasted work. Option B is wrong because `top category` returns only `category`, `count`, and `percent` for the most common categories, so `response_time` is no longer available to the following `eval` and no per-category average can be computed. Option C is wrong because it computes `avg(response_time)` but omits the count, so `where cnt>=100` references a field that was never defined and removes every row.

56
MCQhard

A search uses the map command to run a search for each value of a field. The search is taking a very long time. Which alternative approach is recommended for better performance?

A.Use the sort command
B.Use a subsearch with the IN operator instead
C.Use the transaction command
D.Use the foreach command to loop over fields
AnswerB

Subsearch performs a single lookup instead of per-event search

Why this answer

Option B is correct because replacing a `map` command with a subsearch using the `IN` operator allows Splunk to retrieve all matching field values in a single search pass, rather than executing a separate search for each value. The `map` command runs one search per input row, which can cause significant overhead and slow performance, especially with large result sets. Using `IN` in a subsearch collects the values first and then applies them as a filter in the outer search, reducing the number of search operations to one.

Exam trap

Splunk often tests the misconception that `map` is the only way to run a search for each value of a field, when in fact a subsearch with `IN` achieves the same result more efficiently by avoiding iterative search execution.

How to eliminate wrong answers

Option A is wrong because the `sort` command only reorders results and does not reduce the number of searches or improve the performance of a `map`-based workflow. Option C is wrong because the `transaction` command groups events into transactions based on fields or time, but it does not replace the iterative search behavior of `map` and can itself be resource-intensive. Option D is wrong because the `foreach` command iterates over fields within a single result row, not over multiple search executions, so it cannot replace the per-value search logic of `map`.

57
MCQmedium

A company uses a large Splunk environment with many users creating dashboards. They notice that some searches are slow and consume excessive resources. What is the best practice to optimize search performance?

A.Use the tstats command with summariesonly=t
B.Use the search command with a large time range
C.Use the eval command to create new fields
D.Use the stats command with by clause on high cardinality fields
AnswerA

Uses pre-summarized accelerated data, significantly faster.

Why this answer

The `tstats` command with `summariesonly=t` is the best practice because it restricts the search to the pre-built summaries of an accelerated data model rather than raw event data, drastically reducing the amount of data scanned. This command leverages pre-computed statistics, which is the most efficient way to perform searches over large datasets, especially when users are building dashboards that run repeatedly.

Exam trap

Splunk often tests the misconception that `tstats` is only for advanced users or that it requires a data model, but the trap here is that candidates confuse `tstats` with `stats` and think any aggregation command is equally efficient, ignoring the critical role of summary acceleration.

How to eliminate wrong answers

Option B is wrong because using the `search` command with a large time range forces Splunk to scan all raw events across that entire period, which is resource-intensive and slow, the opposite of optimization. Option C is wrong because the `eval` command creates new fields at search time, adding computational overhead and not reducing the data volume; it does not leverage any pre-computed summaries. Option D is wrong because using the `stats` command with a `by` clause on high cardinality fields (e.g., user IDs or IP addresses) creates many distinct groups, consuming significant memory and CPU, and can even cause search failures due to memory limits.

58
MCQmedium

An analyst wants to create a running total of sales per day over a week. The data has fields: date, sales. Which search would produce a cumulative sum for each day?

A.... | eval running_total = running_sum(sales)
B.... | sort date | streamstats sum(sales) as running_total
C.... | eventstats sum(sales) as running_total
D.... | stats sum(sales) by date
AnswerB

streamstats with sum calculates cumulative sum over sorted events.

Why this answer

Option B is correct because it first sorts the events by date to ensure chronological order, then uses `streamstats` to compute a running (cumulative) sum of sales across each event in that order. `streamstats` processes events sequentially and adds the current value to the accumulated total, producing a cumulative sum per day.
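The sort-then-accumulate behaviour can be sketched in Python with `itertools.accumulate`. This is a minimal model with an invented function name and made-up rows, not Splunk's implementation.

```python
from itertools import accumulate

def running_total(rows):
    """Model of `sort date | streamstats sum(sales) as running_total`."""
    ordered = sorted(rows, key=lambda r: r["date"])  # ISO dates sort correctly as text
    totals = accumulate(r["sales"] for r in ordered)
    return [{**r, "running_total": t} for r, t in zip(ordered, totals)]

# Made-up rows, deliberately out of order
rows = [
    {"date": "2024-01-03", "sales": 30},
    {"date": "2024-01-01", "sales": 10},
    {"date": "2024-01-02", "sales": 20},
]

cumulative = running_total(rows)
```

Without the sort step the same code would still produce a running sum, but in arrival order rather than date order, which is why the `sort date` in Option B matters.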

Exam trap

Splunk often tests the distinction between `streamstats` (sequential, cumulative) and `eventstats` (non-sequential, global aggregate), and candidates mistakenly choose `eventstats` thinking it computes a running total because it adds a field to each event.

How to eliminate wrong answers

Option A is wrong because `running_sum()` is not a valid eval function; cumulative sums are computed with the `streamstats` command, as in `streamstats sum(sales)`. Option C is wrong because `eventstats` computes an aggregate statistic (e.g., total sum) over the entire result set and adds it to each event, not a running total per day. Option D is wrong because `stats sum(sales) by date` returns a single total per day, not a cumulative sum that grows across days.

59
MCQeasy

Which of the following is true about the sort command?

A.All of the above
B.It only sorts in ascending order by default
C.It can sort by multiple fields
D.It can use the limit parameter to limit results
AnswerA

All statements are correct.

Why this answer

Option A is correct because the sort command in Splunk can sort in ascending order by default, can sort by multiple fields, and can use the limit parameter to restrict the number of results. All three statements (B, C, and D) are true, making 'All of the above' the correct choice.

Exam trap

The trap here is that candidates may assume only one of B, C, or D is true, but the question is designed to test whether you recognize that all three statements are accurate, leading to 'All of the above' as the correct answer.

How to eliminate wrong answers

Option B is true: `sort` orders results in ascending order unless the `-` prefix is used to request descending order. Option C is true: you can sort by multiple fields by listing them separated by commas, e.g., `sort field1, field2`. Option D is true: the limit parameter (e.g., `sort limit=10 field`) restricts the output to the first N results in sort order.

60
MCQmedium

A search returns events with a field 'response_time' in milliseconds. The analyst wants to categorize response times into three buckets: 'fast' (< 100), 'medium' (100-500), 'slow' (> 500). Which search correctly creates this categorization?

A.| eval bucket=case(response_time<100,"fast", response_time>=100 AND response_time<=500,"medium", response_time>500,"slow")
B.| eval bucket=if(response_time<100,"fast",response_time<500,"medium","slow")
C.| eval bucket=if(response_time<100,"fast",if(response_time<=500,"medium","slow"))
D.| where response_time<100 | eval bucket="fast" | append [search where response_time>=100 AND response_time<=500 | eval bucket="medium"]
AnswerA

case evaluates conditions in order and returns the first true match.

Why this answer

Option A is correct because it uses the `case` function to evaluate multiple conditions in order, assigning 'fast' for response_time < 100, 'medium' for values between 100 and 500 inclusive, and 'slow' for values > 500. The `case` function returns the result of the first true condition, making it ideal for mutually exclusive buckets without overlapping logic.
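The first-match behaviour of `case` can be modelled in Python. This is a minimal sketch: the helper `case` below is a stand-in written for this example, not a Splunk API, and unlike SPL it evaluates every condition before picking the first true one.

```python
def case(*pairs):
    """Return the value paired with the first true condition, else None.

    Arguments alternate condition, value, condition, value, ...
    None plays the role of SPL's null when nothing matches.
    """
    for condition, value in zip(pairs[0::2], pairs[1::2]):
        if condition:
            return value
    return None

def bucket(response_time):
    return case(
        response_time < 100, "fast",
        100 <= response_time <= 500, "medium",
        response_time > 500, "slow",
    )
```

The boundary values are the ones worth checking: 100 and 500 both land in 'medium' because the middle condition is inclusive on both ends.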

Exam trap

Splunk often tests the difference between the `if` and `case` functions. Candidates mistakenly think nested `if` is the only way to handle multiple conditions, overlooking that `case` is the idiomatic eval function for multi-bucket categorization and that a single `if` cannot produce more than two outcomes without nesting.

How to eliminate wrong answers

Option B is wrong because the `if` function takes exactly three arguments (condition, value if true, value if false); passing five arguments is invalid, so the expression cannot express three buckets. Option C uses nested `if` functions and is logically equivalent for this case; it is not the expected answer because `case` expresses the three buckets more directly and is easier to maintain. Option D is wrong because the first `where` discards every event with a `response_time` of 100 or more, the appended subsearch has to run a separate search to bring 'medium' events back, and nothing ever assigns 'slow', so it is both inefficient and incomplete.

61
MCQeasy

Which command creates a new field that contains the string 'high' if a numeric field exceeds 100, otherwise 'low'?

A.eval status=if(value>100,"high","low")
B.eval status=case(value>100,"high",true(),"low")
C.eval status=if(value>100,high,low)
D.None of the above
AnswerA

Correct syntax with quoted strings.

Why this answer

Option A is correct because the `eval` command with the `if` function correctly checks if the numeric field `value` exceeds 100 and returns the string 'high' or 'low'. In Splunk's `eval`, the `if` function requires the true and false results to be quoted strings when they are literal text, as shown in option A.

Exam trap

Splunk often tests the requirement to quote string literals in `eval` expressions, and the trap here is that candidates may forget to quote the string values 'high' and 'low', treating them as field names instead of literal strings.

How to eliminate wrong answers

Option B is not the expected answer, although `case(value>100,"high",true(),"low")` is valid: `true()` serves as a catch-all condition and "low" is its result, so it yields the same output as Option A in a less direct form. Option C is wrong because the `if` function's true and false results are unquoted (`high` and `low`), which Splunk interprets as field names rather than literal strings, leading to null or unexpected values. Option D is wrong because option A is correct.

62
MCQhard

A web application log contains fields: user, timestamp, response_time. You need to compute the average response time per user, excluding outliers where response_time > 10000ms. Which search produces the correct result?

A.index=web | stats avg(response_time) as avg by user | eval avg = if(avg > 10000, null, avg)
B.index=web | stats avg(response_time) by user | where response_time < 10000
C.index=web | eventstats avg(response_time) as overall_avg | where response_time < 10000 | stats avg(response_time) by user
D.index=web | where response_time < 10000 | stats avg(response_time) by user
AnswerD

Filters outliers first, then computes average per user.

Why this answer

Option D is correct because it filters out outliers before computing the average per user. Option A computes the average over all events, outliers included, and then only nulls any per-user average that exceeds 10000. Option B filters after `stats`, when `response_time` no longer exists in the results, so the `where` clause removes every row.

Option C adds an `eventstats` pass that computes `overall_avg` across all events, outliers included, and never uses it; the extra pass is wasted work compared with Option D's direct filter followed by `stats`.
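The effect of filtering before aggregating can be sketched in Python. This is a minimal model with an invented function name and made-up events; `max_ms` stands in for the 10000 ms threshold.

```python
from collections import defaultdict
from statistics import mean

def avg_by_user(events, max_ms=10000):
    """Model of `where response_time < 10000 | stats avg(response_time) by user`."""
    per_user = defaultdict(list)
    for e in events:
        if e["response_time"] < max_ms:  # filter first
            per_user[e["user"]].append(e["response_time"])
    return {user: mean(values) for user, values in per_user.items()}

# Made-up events; alice has one outlier
events = [
    {"user": "alice", "response_time": 100},
    {"user": "alice", "response_time": 300},
    {"user": "alice", "response_time": 20000},
    {"user": "bob", "response_time": 200},
]

filtered = avg_by_user(events)
unfiltered = avg_by_user(events, max_ms=float("inf"))
```

With the outlier included, alice's average jumps from 200 to 6800. No later step can recover the filtered value once `stats` has collapsed the events.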

63
MCQhard

A developer needs to calculate the 95th percentile of response times for each service over the past hour. The data has fields: service, response_time. Which search achieves this correctly and efficiently?

A.`index=main | stats perc95(response_time) by service`
B.`index=main | timechart perc95(response_time) by service`
C.`index=main | eventstats perc95(response_time) as p95 by service | stats values(p95) as p95 by service`
D.`index=main | streamstats perc95(response_time) as p95 by service | stats latest(p95) as p95 by service`
AnswerA

A single `stats` pass computes the 95th percentile per service and returns one row per service.

Why this answer

Option A is correct because `perc95()` is a standard `stats` aggregation function, and `stats perc95(response_time) by service` computes the 95th percentile for each service over the whole search window in one pass, returning exactly one row per service. No intermediate step is needed, which makes it both accurate and efficient.

Exam trap

Splunk often tests the distinction between `stats` (one row per group), `eventstats` (aggregate appended to every event), and `streamstats` (running aggregate per event). Candidates mistakenly choose the `eventstats` or `streamstats` variants thinking the extra step is required, when `streamstats` actually produces a cumulative value that changes with each event.

How to eliminate wrong answers

Option B is wrong because `timechart` automatically splits the data into time buckets (e.g., 1-minute spans), which would calculate the 95th percentile per time slice rather than over the entire past hour. Option C is wrong because, although it arrives at the same values, `eventstats` first writes the percentile onto every event and a second `stats` then collapses the identical values, doing in two passes what Option A does in one. Option D is wrong because `streamstats` computes a running (cumulative) percentile as each event is processed; the value attached to any single event reflects only the events processed up to that point, so `latest(p95)` is not a reliable substitute for the percentile over the whole hour.

64
MCQmedium

A dashboard is slow to load because it runs a search that uses `transaction` to group events into sessions. The search is `index=main source=web | transaction clientip maxspan=30m maxpause=5m`. What is the most effective way to improve performance?

A.Add `| head 1000` before the `transaction` command
B.Replace `transaction` with `stats dc(_time) as session_duration by clientip` and use `bin`
C.Set `maxspan=1h` and `maxpause=1m`
D.Add `| eval session_id=random()` before transaction
AnswerB

`stats` is more efficient and can approximate sessions.

Why this answer

Option B is correct because replacing `transaction` with `stats` and `bin` avoids the expensive event grouping and stateful processing that `transaction` requires. The `transaction` command must hold events in memory to correlate them by `clientip` within time windows, which is slow on large datasets. Using `stats dc(_time)` with `bin` computes session metrics more efficiently by aggregating over time buckets without tracking individual event sequences.

Exam trap

Splunk often tests the misconception that `transaction` is the only way to group events into sessions, when in fact `stats` with `bin` or `eventstats` can achieve similar results with far better performance.

How to eliminate wrong answers

Option A is wrong because adding `| head 1000` before `transaction` would discard most events, producing incomplete and misleading session data, and does not address the root cause of slow performance. Option C is wrong because tightening `maxspan` and `maxpause` may reduce the number of events grouped per session but does not eliminate the fundamental overhead of the `transaction` command's stateful processing. Option D is wrong because `| eval session_id=random()` before `transaction` adds a random field that has no correlation with actual sessions, and `transaction` would still need to process all events with the same overhead.

65
Multi-Selecteasy

A search is running slowly due to a large data volume. Which TWO modifications are likely to improve search performance? (Select two.)

Select 2 answers
A.Use wildcard characters at the beginning of search terms.
B.Use the transaction command to group events.
C.Reduce the time range of the search.
D.Use the dedup command as early as possible.
E.Use indexed fields instead of search-time extracted fields.
AnswersC, E

Limits data volume scanned

Why this answer

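Reducing the time range limits the volume of data the indexers have to scan, directly reducing I/O and processing overhead. This is one of the most effective ways to improve search performance because Splunk must read and filter every candidate event in the specified time window. Indexed fields help for a related reason: their values are stored in the index itself, so matching events can be located without first retrieving raw events and running search-time field extraction on them.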

Exam trap

Splunk often tests the misconception that using the transaction or dedup command early in a search improves performance, when in fact these commands are memory-intensive and should be deferred until after data volume is reduced.

66
MCQmedium

Refer to the exhibit. The search is intended to display users who logged in from IP addresses starting with 10.0, but returns no results. What is the most likely cause?

A.The regex pattern is incorrect.
B.The field 'ip' is not extracted properly.
C.The `search` command should be `where` to use wildcard on extracted fields.
D.The index should be specified at the beginning of the search.
AnswerC

For extracted (non-indexed) fields, `search` may not support wildcards efficiently; `where` with `like` is appropriate.

Why this answer

The search uses `search ip=10.0*` which attempts to apply a wildcard pattern to an extracted field. However, the `search` command does not support wildcards for field-value comparisons; it treats `10.0*` as a literal string. To use wildcards on extracted fields, the `where` command with a `like` operator (e.g., `where ip like "10.0%"`) or a regex match is required.

This is why option C is correct.

Exam trap

Splunk often tests the misconception that the `search` command supports wildcards for extracted fields, leading candidates to overlook the need for `where` or `regex` commands for pattern matching on field values.

How to eliminate wrong answers

Option A is wrong because the regex pattern is not the issue; the search does not use a regex command at all, and the problem lies in the `search` command's inability to interpret wildcards on field values. Option B is wrong because the field 'ip' is likely extracted properly (otherwise the search would not run without errors), but the wildcard matching fails due to command semantics. Option D is wrong because specifying the index at the beginning is a best practice for performance but is not required for the search to return results; the absence of an index does not cause zero results when the data is already in the default index.

67
MCQhard

Refer to the exhibit. The search is taking very long and returning few results. Which change would most improve performance?

A.Change maxpause to 30s.
B.Remove the eval command.
C.Replace transaction with stats and use values() for fields.
D.Add a time range to the main search.
AnswerD

Limiting the time range reduces the amount of data processed, improving performance.

Why this answer

The exhibit shows a transaction command that groups events by a session field, but without a time range, the search must scan all indexed data, which is extremely slow. Adding a time range (e.g., earliest=-1h) limits the data scanned, drastically improving performance while leaving the transaction logic and its existing maxpause setting unchanged.

Exam trap

The trap here is that candidates focus on tuning the transaction parameters (maxpause) or replacing the command, rather than recognizing that the fundamental performance bottleneck is the absence of a time range filter in the base search.

How to eliminate wrong answers

Option A is wrong because increasing maxpause to 30s would make the transaction wait longer for late events, potentially increasing search time and resource usage, not improving performance. Option B is wrong because removing the eval command (which likely creates the session field used by transaction) would break the grouping logic, making the search return incorrect or no results. Option C is wrong because replacing transaction with stats and values() might reduce memory overhead but would not address the root cause of scanning all time; without a time range, stats would still scan the entire index, and the search would remain slow.

68
MCQhard

A search returns 50,000 events. The analyst wants to sample 1% evenly across time. Which sampling command should be used?

A.sample 0.01
B.sample method=random ratio=0.01
C.sample method=block ratio=0.01
D.sample ratio=0.01
AnswerB

This performs random sampling with a 1% ratio, distributing events evenly across time.

Why this answer

Option B is correct because the `sample` command with `method=random` and `ratio=0.01` performs a random sampling of exactly 1% of events, and when used without a `by` clause, it distributes the sampling evenly across time by default. This ensures a statistically representative subset of the 50,000 events, preserving temporal distribution.

Exam trap

The trap here is that candidates often assume `sample` defaults to random sampling, but it actually defaults to `method=block`, so omitting the `method=random` parameter (as in option D) would not achieve the required even distribution across time.

How to eliminate wrong answers

Option A is wrong because `sample 0.01` is invalid syntax; the `sample` command requires the `ratio` argument to be explicitly named (e.g., `ratio=0.01`) and does not accept a bare number. Option C is wrong because `method=block` samples contiguous blocks of events, which would not distribute evenly across time and could cluster events from a specific time period, violating the requirement for even temporal distribution. Option D is wrong because `sample ratio=0.01` defaults to `method=block`, not `method=random`, so it would produce block sampling rather than the random sampling needed for even distribution across time.

69
MCQhard

A large e-commerce platform uses Splunk to monitor user sessions. Each session is composed of multiple events with a common 'session_id' field. The current search to compute average session duration is: 'index=web | transaction session_id maxspan=30m | eval duration=_time_last - _time | stats avg(duration)'. This search runs for over an hour on a 6-hour time window. The environment has 20 indexers and data volume is 2 TB/day. The admin suspects that the transaction command is the bottleneck. Which optimization should be applied?

A.Reduce the time range to 1 hour.
B.Add 'eventstats earliest(_time) as start latest(_time) as end by session_id' before transaction.
C.Replace transaction with 'stats earliest(_time) as start latest(_time) as end by session_id | eval duration=end-start | stats avg(duration)'.
D.Remove the maxspan parameter from the transaction command to allow longer sessions.
AnswerC

Much more efficient because stats uses less memory than transaction.

Why this answer

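Option C is correct because 'stats earliest(_time) as start latest(_time) as end by session_id' finds each session's first and last timestamp in a single aggregation, and 'eval duration=end-start' then yields the duration. This is much more efficient than transaction, which has to hold open sessions in memory while it correlates events.

Option A reduces the time range but does not address the inefficiency of transaction. Option B adds an eventstats in front of transaction, so transaction still runs. Option D removes maxspan, which may leave sessions open-ended and consume more memory.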
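The aggregation that replaces transaction here can be sketched in Python. This is a model of the idea only: the event tuples are invented, and the function name is hypothetical.

```python
# Hypothetical events: (session_id, epoch seconds)
events = [("s1", 100), ("s2", 105), ("s1", 160),
          ("s2", 405), ("s1", 220), ("s3", 500)]

def average_session_duration(rows):
    """Track only (earliest, latest) per session, then average end - start."""
    bounds = {}
    for sid, ts in rows:
        start, end = bounds.get(sid, (ts, ts))
        bounds[sid] = (min(start, ts), max(end, ts))
    durations = [end - start for start, end in bounds.values()]
    return sum(durations) / len(durations)

print(average_session_duration(events))
```

Only two numbers are kept per session regardless of how many events the session contains, which is the reason a stats-style aggregation needs far less memory than grouping whole events.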

70
MCQhard

A Splunk administrator is troubleshooting a search that uses the transaction command to group login and logout events. The search runs but returns no results even though both types of events exist. The events are separated by at most 5 minutes. The current transaction command is: `index=auth (action=login OR action=logout) | transaction action maxspan=10m maxpause=2s` What is the most likely cause?

A.The maxspan value is too large, causing events to be grouped incorrectly.
B.The transaction command requires the connected=true argument to group events.
C.The transaction command requires keepevents=true to retain all events.
D.The maxpause value is too small; events may be more than 2 seconds apart.
AnswerD

maxpause sets the maximum time between events in a transaction; 2 seconds may be too restrictive.

Why this answer

The maxpause=2s parameter defines the maximum allowed gap between consecutive events in a transaction. If the actual time between a login and its corresponding logout event exceeds 2 seconds, the transaction command will close the transaction prematurely, treating the logout as the start of a new transaction. Since the events are separated by at most 5 minutes but could be more than 2 seconds apart, the maxpause value is too restrictive, causing the transaction to never complete with both events.

Exam trap

Splunk often tests the distinction between maxspan (total transaction duration) and maxpause (gap between events), leading candidates to incorrectly assume that a large maxspan is the problem when the real issue is an overly restrictive maxpause.

How to eliminate wrong answers

Option A is wrong because a maxspan of 10 minutes is appropriate for events separated by at most 5 minutes; a larger maxspan does not cause grouping errors, it simply allows a wider window. Option B is wrong because `connected` only matters when the transaction is defined over a field list and it already defaults to true, so adding connected=true changes nothing. Option C is wrong because keepevents is not a `transaction` argument (it is an option of `dedup`), so it has no bearing on whether the login and logout events are grouped.
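The interaction of the two limits can be sketched in Python. This is a simplified model, assuming events are visited in ascending time order and ignoring everything else `transaction` does; `group_by_gap` is a hypothetical helper, not a Splunk API.

```python
def group_by_gap(timestamps, maxpause, maxspan):
    """Split timestamps into groups.

    An event joins the current group only if it is within `maxpause` seconds
    of the previous event AND within `maxspan` seconds of the group's first
    event; otherwise it starts a new group.
    """
    groups = []
    for ts in sorted(timestamps):
        current = groups[-1] if groups else None
        if (current is not None
                and ts - current[-1] <= maxpause
                and ts - current[0] <= maxspan):
            current.append(ts)
        else:
            groups.append([ts])
    return groups

# Hypothetical login at t=0 s and logout at t=240 s (4 minutes later)
login_logout = [0, 240]
print(group_by_gap(login_logout, maxpause=2, maxspan=600))
print(group_by_gap(login_logout, maxpause=300, maxspan=600))
```

With a 2-second pause limit the two events land in separate groups even though the 10-minute span would allow them together; raising the pause limit above the real gap lets them pair up.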

71
MCQmedium

The search above is executed but returns unexpected results: the count for 'API' is much lower than expected. What is the most likely cause?

A.The stats command should use 'count by category' but category is not a field until after eval.
B.The regex does not account for the HTTP version string after the URL, causing the URL field to include extra characters like 'HTTP/1.1'.
C.The case function has a default condition '1==1' that overrides all other conditions.

Why this answer

The regex uses named groups such as (?<method>...); group names are case-sensitive, but 'method' and 'url' are extracted correctly, so naming is not the problem. The regex does expect exactly one space between method and URL, and for any event where the extraction fails the URL field is missing, so the 'category' eval (which relies on match functions) evaluates to null.

The most likely cause, however, is that the regex does not account for query strings or fragments in the URL: when the URL contains '?' or '#' the match fails, and those requests are never counted under 'API'.

72
Multi-Selecthard

A search administrator wants to ensure that a scheduled search runs efficiently and does not impact other users. Which TWO practices should be implemented? (Select two.)

Select 2 answers
A.Set the 'dispatch.earliest_time' and 'dispatch.latest_time' to a specific time range
B.Use the 'max_time' setting in the search command
C.Use 'collect' to index summary results
D.Enable 'auto_summarize' on the search
E.Use the 'priority' setting in savedsearches.conf
AnswersA, D

Reduces data scanned, making search faster.

Why this answer

Option A is correct because setting 'dispatch.earliest_time' and 'dispatch.latest_time' to a specific time range limits the data scanned by the scheduled search, reducing resource consumption and preventing it from impacting other users. Option D is correct because enabling 'auto_summarize' on the search creates pre-computed summary tables that allow the scheduled search to run against summarized data rather than raw events, drastically improving efficiency and reducing system load.

Exam trap

The trap here is that candidates often confuse 'max_time' (a command-level timeout) with controlling the search time window, or they think 'collect' improves search efficiency when it actually adds indexing overhead after the search completes.

73
MCQmedium

Refer to the exhibit. The search results show a large number of hosts, but the `limit=5` only shows the top 5. The eval statement fails with an error. Why?

A.The timechart span should be smaller to avoid too many fields.
B.Eval cannot be used after timechart.
C.The eval statement must use aggregation functions.
D.The field names created by timechart are based on the host names, not `count_1`, etc.
AnswerD

timechart with limit=5 creates one field per host value (plus `OTHER` for the remainder), not generic fields such as `count_1`.

Why this answer

Option D is correct because the `timechart` command in Splunk dynamically creates field names based on the values of the split-by field (in this case, `host`). When you use `timechart count by host limit=5`, the resulting fields are named after the actual host names (e.g., `host1`, `host2`), not generic names like `count_1`. The subsequent `eval` statement fails because it references `count_1`, which does not exist as a field in the results.

Exam trap

Splunk often tests the misconception that `timechart` with a `limit` option creates generic field names like `count_1`, `count_2`, etc., when in reality it uses the actual values from the split-by field as field names.

How to eliminate wrong answers

Option A is wrong because the `span` of the timechart does not affect the number of fields created; it only controls the time bucket size. Option B is wrong because `eval` can be used after `timechart`; the error is not due to a restriction on command order but because the field name referenced in `eval` does not exist. Option C is wrong because `eval` does not require aggregation functions after `timechart`; it can perform row-by-row calculations on existing fields, but the field must exist.
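The column naming described above can be sketched in Python. This models only the shape of the output, under the assumption that the kept series are the hosts with the highest total count; host names and timestamps are invented, and `timechart_count_by` is a hypothetical helper.

```python
from collections import Counter, defaultdict

def timechart_count_by(rows, span, limit):
    """rows: (epoch_seconds, host). Returns {bucket_start: {column: count}}.

    Columns are named after host values; hosts outside the top `limit`
    (by total count) are folded into a single OTHER column.
    """
    totals = Counter(host for _, host in rows)
    keep = {host for host, _ in totals.most_common(limit)}
    table = defaultdict(Counter)
    for ts, host in rows:
        column = host if host in keep else "OTHER"
        table[ts - ts % span][column] += 1
    return {bucket: dict(cols) for bucket, cols in table.items()}

rows = [(0, "web01"), (10, "web01"), (20, "db01"),
        (30, "db01"), (70, "web01"), (80, "cache01")]
table = timechart_count_by(rows, span=60, limit=2)
print(table)
```

No column called `count_1` ever appears in the result, so any later calculation has to refer to the host-named columns themselves.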

74
MCQhard

A security analyst needs to find all login events where the user 'jsmith' attempted to authenticate from an IP address outside the corporate subnet (10.0.0.0/8) after business hours (after 18:00). Which search correctly filters for these events?

A.index=main sourcetype=login user=jsmith | where 'date_hour' > 18 | where NOT cidrmatch("10.0.0.0/8", src_ip)
B.index=main sourcetype=login user=jsmith date_hour>18 | search NOT src_ip=10.0.0.0/8
C.index=main sourcetype=login user=jsmith date_hour>18 | where not src_ip like "10.%"
D.index=main sourcetype=login user=jsmith date_hour>18 | where src_ip!=10.0.0.0/8
AnswerA

Correctly uses `where` with `cidrmatch` and filters by hour.

Why this answer

Option A is correct because it uses the `cidrmatch` function to properly evaluate whether the source IP falls within the 10.0.0.0/8 subnet. The `where` clause with `date_hour > 18` correctly filters for events after business hours, and the `NOT cidrmatch` ensures only IPs outside the corporate subnet are included. This approach handles CIDR notation accurately, unlike simple string or inequality comparisons.

Exam trap

The trap here is that candidates often assume simple string or inequality operators (like `!=` or `like`) can handle CIDR subnet matching, but Splunk requires the `cidrmatch` function for accurate network range evaluation.

How to eliminate wrong answers

Option B is wrong because `search NOT src_ip=10.0.0.0/8` treats the CIDR notation as a literal string, not a subnet match, so it will not correctly exclude all IPs in the 10.0.0.0/8 range. Option C is wrong because `like "10.%"` is a string prefix match rather than a subnet test; it happens to line up with a /8 boundary for well-formed dotted-quad addresses, but it does not generalize to other prefix lengths and is not the CIDR evaluation the question calls for. Option D is wrong because `src_ip!=10.0.0.0/8` uses an inequality operator, which does not perform CIDR matching, so it will not exclude the addresses inside the subnet.
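The subnet test that `cidrmatch` performs can be sketched with Python's standard `ipaddress` module. This illustrates the concept only; `outside_corporate` is a hypothetical helper and the sample addresses are invented.

```python
import ipaddress

CORPORATE = ipaddress.ip_network("10.0.0.0/8")

def outside_corporate(src_ip):
    """True when src_ip is NOT inside 10.0.0.0/8 (the NOT cidrmatch case)."""
    return ipaddress.ip_address(src_ip) not in CORPORATE

# A plain string inequality against the CIDR text is true for every address,
# including ones inside the subnet, so it filters nothing.
string_inequality = "10.1.2.3" != "10.0.0.0/8"

for ip in ["10.255.3.4", "100.1.2.3", "192.168.1.5"]:
    print(ip, outside_corporate(ip))
```

The membership check parses the address and compares it against the network's prefix bits, which is why it keeps working for prefix lengths that do not fall on an octet boundary.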

75
MCQmedium

A network operations team uses Splunk to analyze firewall logs. They need to identify top talkers (source IPs with highest total bytes) over the last hour. The current search: 'index=firewall | stats sum(bytes) as totalBytes by src_ip | sort -totalBytes | head 10' takes 5 minutes to complete. They want to make it faster. The environment has 5 indexers with default configurations. The data volume is 100 GB/day. Which action will most improve search performance?

A.Add 'earliest=-1h' to the search to restrict the time range explicitly.
B.Replace head 10 with limit 10 at the end of the pipeline.
C.Use map to run the search per indexer.
D.Set the search's parallelism to 'auto' in the commands.
AnswerA

Limits the data scanned by the indexers from the start.

Why this answer

Option A is correct because explicitly adding 'earliest=-1h' restricts the search to the last hour, allowing the indexers to use each bucket's time range to skip irrelevant buckets entirely. Without an explicit time range, Splunk may scan all available data, dramatically increasing I/O and search time. This is the most impactful optimization for time-bound searches over large datasets.

Exam trap

The trap here is that candidates may overlook the most fundamental Splunk optimization—explicit time range—and instead focus on command-level tweaks like 'limit' or parallelism, which have negligible or negative impact on performance.

How to eliminate wrong answers

Option B is wrong because 'limit' is not a standalone SPL command (it is an option of commands such as 'head' and 'top'), and in any case truncating the final sorted output does nothing to reduce the data the indexers scan. Option C is wrong because the 'map' command runs a separate search for each input result, which would multiply the workload and degrade performance, not improve it. Option D is wrong because parallelism in Splunk is controlled by the search head and indexers automatically; setting it to 'auto' is the default and does not override the need for a time range restriction.
