CCNA Advanced Searching and Statistics Questions — Page 2 of 2

Drag & Drop (medium)

Arrange the steps to configure role-based access control in Splunk.


Steps

Order

Why this order

Roles are configured by setting capabilities and resource access restrictions.


MCQ (hard)

A security analyst wants to find IP addresses that have been involved in both login failures and successful logins within a 5-minute window. Which approach is most efficient?

A.Using the transaction command

B.Using the appendcols command

C.Using a subsearch

D.Using the stats command with values

Answer: A

Groups events by IP within a time span, ideal for this scenario.

Why this answer

Option A is correct because the transaction command groups events from the same IP within a time window, which suits correlating a failure with a success. Option C is wrong because subsearches are resource-intensive and not ideal for this correlation. Option D is wrong because stats with values collects values per IP but does not guarantee temporal proximity.

Option B is wrong because appendcols pastes result columns together by row position and does not handle time windows.


MCQ (medium)

Refer to the exhibit. This search is intended to find users with average duration above overall average. However, it returns no results. Why?

A.eventstats should be after stats

B.The where clause should use the 'search' command

C.overall_avg is not available in the where clause because it is created in eventstats

D.The search requires a subquery to compute overall_avg

Answer: C

Stats output does not include fields from prior commands unless preserved.

Why this answer

Option C is correct: eventstats adds overall_avg to each event, but stats by user outputs only user and user_avg, dropping overall_avg, so where compares user_avg to a field that no longer exists. Option A is wrong because computing the overall average with eventstats before stats is conceptually sound; the problem is that stats discards it. Option B is wrong because where works fine when the fields it references are present.

Option D is wrong because a subsearch is not needed.
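
The field-dropping behavior described above can be sketched in Python. This is a simplified model, not Splunk's implementation: the events, the `duration` field, and the helper names are hypothetical, and the `keep_overall` switch only illustrates the idea of carrying the value through the aggregation (for example with an extra aggregate such as `first(overall_avg)`), which is an assumption about how the unseen exhibit could be repaired.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical events; the field names follow the explanation above.
events = [
    {"user": "alice", "duration": 10},
    {"user": "alice", "duration": 30},
    {"user": "bob", "duration": 5},
]


def eventstats_overall_avg(rows):
    """Model of `eventstats avg(duration) as overall_avg`: annotate every row."""
    overall = mean(r["duration"] for r in rows)
    return [{**r, "overall_avg": overall} for r in rows]


def stats_by_user(rows, keep_overall=False):
    """Model of `stats avg(duration) as user_avg by user`.

    Only the by-field and the aggregates survive. With keep_overall=True the
    model also carries overall_avg through, as an extra aggregate would.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[r["user"]].append(r)
    out = []
    for user, members in groups.items():
        row = {"user": user, "user_avg": mean(m["duration"] for m in members)}
        if keep_overall:
            row["overall_avg"] = members[0]["overall_avg"]
        out.append(row)
    return out


def where_above_overall(rows):
    """Model of `where user_avg > overall_avg`: rows missing a field never match."""
    return [r for r in rows if "overall_avg" in r and r["user_avg"] > r["overall_avg"]]


annotated = eventstats_overall_avg(events)
dropped = where_above_overall(stats_by_user(annotated))
kept = where_above_overall(stats_by_user(annotated, keep_overall=True))
print(dropped)
print([r["user"] for r in kept])
```

In the first pipeline the comparison never succeeds because the aggregated rows carry no `overall_avg`; once the value survives the aggregation, the filter has something to compare against.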


MCQ (hard)

Refer to the exhibit. This search returns an error. What is the most likely cause?

A.The timechart command requires a _time field which is not present after stats

B.The eval command cannot be used before timechart

C.The status_group field is not available after timechart because it was created in eval

D.The stats command aggregates data, so timechart cannot use the aggregated count field

Answer: A

Stats does not preserve _time unless explicitly used in a by clause.

Why this answer

The error occurs because the `stats` command removes the `_time` field from the events, and `timechart` requires a valid `_time` field to create time-based buckets. Without `_time`, `timechart` cannot generate the time axis, resulting in a search error. This is a common pitfall when chaining `stats` before `timechart` without preserving the time field.

Exam trap

Splunk often tests the misconception that `stats` preserves `_time` by default, leading candidates to overlook that `timechart` requires an explicit time field in the result set.

How to eliminate wrong answers

Option B is wrong because `eval` can be used before `timechart` without issue, as long as the required `_time` field is present. Option C is wrong because `status_group` is created by `eval` and is available to `timechart`; the error is not about field availability but the missing `_time` field. Option D is wrong because `timechart` can use aggregated count fields from `stats`; the real problem is that `stats` removes `_time`, not that the count field is incompatible.


MCQ (hard)

An administrator wants to correlate events from the same session but the events span up to 30 minutes apart. The transaction command is being considered. Which transaction option is most appropriate to ensure sessions are correctly grouped without artificially high memory usage?

A.| transaction sessionid maxspan=30m

B.| transaction sessionid maxspan=30m maxpause=5m

C.| transaction sessionid maxevents=100

D.| transaction sessionid maxspan=30m keepevicted=true

Answer: B

Correctly defines time window and pause to group sessions

Why this answer

Option B is correct because the `maxspan=30m` ensures events spanning up to 30 minutes are grouped into the same transaction, while `maxpause=5m` prevents the transaction from remaining open indefinitely by closing it after 5 minutes of inactivity. This combination correctly groups sessions without keeping the transaction open for the full 30 minutes, which would artificially increase memory usage by holding events in the buffer.

Exam trap

Splunk often tests the misconception that `maxspan` alone is sufficient to control memory usage, when in fact `maxpause` is critical to close transactions during idle periods and prevent excessive memory consumption.

How to eliminate wrong answers

Option A is wrong because using only `maxspan=30m` without `maxpause` means the transaction will remain open for the entire 30-minute span even if there are long gaps between events, causing high memory usage as events are held in the buffer. Option C is wrong because `maxevents=100` limits the number of events per transaction but does not address the time span or pause requirements, so sessions spanning 30 minutes may be split or incomplete. Option D is wrong because `keepevicted=true` retains evicted (incomplete) transactions in the output, which does not help control memory usage and may actually increase it by including partial groups.


MCQ (easy)

A security analyst is investigating a potential breach. They have a search that uses the transaction command to group events by session_id and calculates the total bytes transferred per session. However, the search takes over 30 minutes to complete on a 24-hour time range. The environment has 10 indexers with default settings. The analyst needs to reduce search time while preserving the ability to group by session_id. Which course of action should they take?

A.Pre-aggregate events by session_id using 'stats values(*) as * sum(bytes) as total_bytes by session_id' before the transaction command.

B.Use an append command to add a subsearch that pre-filters events.

C.Replace transaction with the 'streamstats' command to compute running totals.

D.Add the 'local' keyword to the transaction command to force it to run on a single indexer.

Answer: A

Reduces the number of events per session, making transaction faster.

Why this answer

Option A is correct because summarizing events by session_id with stats (values and sum) before the transaction command reduces the number of events that transaction must process. Option D is wrong because forcing the work onto a single indexer would give up parallel processing and make the search slower. Option B is wrong because it adds subsearch overhead.

Option C is wrong because streamstats changes the grouping logic and does not reduce the workload.


MCQ (hard)

A Splunk administrator runs the following search and notices that the results include events where the 'status' field is 200 or 404, but also includes events where the 'status' field is missing. What is the most efficient way to modify the search to exclude events where the 'status' field does not exist?

A.status=200 OR status=404 | search status!=null

B.NOT ISNULL(status) (status=200 OR status=404)

C.status=200 OR status=404 | where isnotnull(status)

D.status=200 OR status=404

Answer: B

ISNULL(status) returns true if field does not exist; NOT ISNULL ensures only events with a status field are considered.

Why this answer

Option B is correct because it uses the `NOT ISNULL(status)` filter before the OR conditions, which efficiently excludes events where the `status` field does not exist. In Splunk, `ISNULL()` returns true if a field is missing or null, so `NOT ISNULL(status)` ensures only events with a defined `status` field are considered, and then the parentheses group the OR conditions correctly. This approach is more efficient than post-filtering because it reduces the result set early in the search pipeline.

Exam trap

The trap here is that candidates often confuse `ISNULL()` with checking for empty strings or use `!=null` as if it were SQL, failing to recognize that Splunk requires explicit `ISNULL()` or `isnull()` functions for field existence checks.

How to eliminate wrong answers

Option A is wrong because `status!=null` is not a valid Splunk syntax for checking field existence; it compares the field value to the literal string 'null' rather than checking for absence. Option C is wrong because while `where isnotnull(status)` works, it is less efficient than using `NOT ISNULL(status)` in the base search, as `where` processes all results after the initial search, whereas the base search filter can leverage index-time optimizations. Option D is wrong because it simply searches for status=200 OR status=404 without any filter to exclude events where the status field is missing, so it will still include those events.


Multi-Select (medium)

A user needs to identify the top 3 error types by count, but only for the current month, and exclude results with fewer than 100 occurrences. Which TWO steps are necessary? (Select two.)

Select 2 answers

A.Use the time range picker to set 'Current Month'

B.Use the where command to filter count>=100

C.Use the search command with earliest and latest

D.Use the top command with limit=3

E.Use the time command with relative time modifiers

Answers: B, D

Excludes error types with count less than 100.

Why this answer

Option B is correct because the `where` command in Splunk is used to filter results based on a condition, and here it is needed to exclude error types with fewer than 100 occurrences after counting. Option D is correct because the `top` command with `limit=3` returns the top 3 values of a field by count, which directly satisfies the requirement to identify the top 3 error types.

Exam trap

The exam often tests the distinction between using the time range picker and putting explicit time modifiers in the search itself. Candidates may incorrectly assume that the time range picker is a necessary step when the search can restrict itself to the current month with relative time modifiers such as `earliest=@mon`.


MCQ (easy)

A user runs a search on web access logs: `index=web | eventstats sum(bytes) as total_bytes by host`. The search returns the correct total bytes per host, but now the user needs to calculate the average bytes per host for each event. Which command should be added to the base search to achieve this?

A.Add `| eventstats avg(bytes) as avg_bytes by host` after the first eventstats.

B.Replace eventstats with `| streamstats avg(bytes) as avg_bytes by host`.

C.Add `| eval avg_bytes = total_bytes / count` after the eventstats.

D.Use `| stats avg(bytes) by host` then `| join host [search index=web]`.

Answer: A

eventstats can compute average directly and add it to each event.

Why this answer

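Option A is correct because eventstats can compute the average directly with `avg(bytes)` and attach it to every event. Option C is wrong because it divides by a `count` field that the search never creates, so the average would have to be assembled by hand. Option B is wrong because streamstats computes a running average over the events seen so far, not the overall average per host.

Option D is wrong because stats followed by join is slower and may not work well.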


Multi-Select (medium)

Which TWO of the following are valid uses of the stats command in Splunk? (Choose two.)

Select 2 answers

A.stats mode(score) by group

B.stats values(ip) by user

C.stats count by host

D.stats median(response_time) by server

E.stats first(error_code) by session

Answers: B, C

Valid: returns list of distinct IPs per user.

Why this answer

The `stats` command in Splunk can compute aggregate statistics over fields. `values(ip) by user` is valid because `values()` returns a multivalue list of all distinct `ip` values for each `user`, which is a standard aggregation function. `count by host` is valid because `count` is a default aggregation that counts events per `host`.

Exam trap

The trap here is that candidates may choose by familiarity with functions from other contexts rather than by the forms the question is keyed to. Note that `mode()`, `median()`, and `first()` are themselves documented `stats` functions in Splunk, so options A, D, and E are not invalid syntax; `values()` and `count` are the keyed choices.


MCQ (easy)

Refer to the exhibit. What is the result of this search?

A.A list of all users sorted by count ascending.

B.The first 5 events with failed password.

C.A table of users and their total counts, sorted by count descending, limited to 5 rows.

D.The top 5 users by username alphabetically.

Answer: C

This accurately describes the output of the search.

Why this answer

The search uses the `top` command, which by default returns the most common values of a field sorted by count in descending order, limited to 10 results. The `limit=5` parameter overrides the default to return only the top 5 users. The `countfield` option renames the count column to 'total', and `showperc=f` hides the percent column (`showcount=f` would hide the count column instead), producing a table of users and their total counts sorted by count descending, limited to 5 rows.

Exam trap

Splunk often tests the default behavior of the `top` command—specifically that it sorts by count descending and limits results to 10—and candidates mistakenly think it returns all values or sorts alphabetically, or they overlook the `limit=5` override.

How to eliminate wrong answers

Option A is wrong because the `top` command sorts by count descending, not ascending, and it does not return all users—it limits results to the top 5. Option B is wrong because the search does not filter for 'failed password' events; it operates on all events in the index and uses the `top` command to find the most common users, not the first 5 events. Option D is wrong because the `top` command sorts by count, not alphabetically by username, and it returns the most frequent users, not a simple alphabetical list.


MCQ (hard)

A search uses `transaction maxspan=30s maxpause=5s`. Events are sorted by _time. If there is a gap of 10 seconds between two events, what happens?

A.They are merged because maxpause is 5s but maxspan is 30s, so the 10s gap is within maxspan.

B.They are considered part of the same transaction as long as total span ≤ 30s.

C.They are split only if the total span exceeds maxspan.

D.They are split into separate transactions because the gap exceeds maxpause.

Answer: D

A gap of 10s exceeds the 5s maxpause, so a new transaction begins.

Why this answer

The `maxpause` parameter in the `transaction` command defines the maximum allowed gap between consecutive events within the same transaction. Since the gap of 10 seconds exceeds the `maxpause=5s`, the events are split into separate transactions, regardless of the `maxspan=30s` limit. The `maxspan` only sets an upper bound on the total duration of the transaction from the first to the last event, but it does not override the pause-based splitting logic.

Exam trap

The trap here is that candidates often confuse `maxpause` with `maxspan`, mistakenly thinking that as long as the total duration is under `maxspan`, any gap is acceptable, when in fact `maxpause` enforces a strict per-gap limit that can split transactions independently.

How to eliminate wrong answers

Option A is wrong because it incorrectly assumes that a gap within `maxspan` overrides `maxpause`; in reality both constraints apply, and any gap exceeding `maxpause` forces a split. Option B is wrong because it ignores the `maxpause` constraint entirely, suggesting that only the total span matters, which is false. Option C is wrong because it claims splitting only occurs when total span exceeds `maxspan`, but the `maxpause` parameter independently triggers splits on inter-event gaps.
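
The interaction of the two limits can be sketched with a small Python model. It is a deliberately simplified illustration that works on bare timestamps in seconds; the function name and sample timestamps are hypothetical, and it does not reproduce how Splunk actually processes events internally.

```python
def group_transactions(timestamps, maxspan, maxpause):
    """Split timestamps (in seconds) into transaction-like groups.

    A new group starts when the gap to the previous event exceeds maxpause,
    or when adding the event would stretch the group beyond maxspan.
    """
    groups = []
    for t in sorted(timestamps):
        if groups:
            current = groups[-1]
            within_pause = t - current[-1] <= maxpause
            within_span = t - current[0] <= maxspan
            if within_pause and within_span:
                current.append(t)
                continue
        groups.append([t])
    return groups


# A 10-second gap between the events at 3s and 13s, total span only 15s.
strict = group_transactions([0, 3, 13, 15], maxspan=30, maxpause=5)
# Same events with a pause limit loose enough to tolerate the gap.
loose = group_transactions([0, 3, 13, 15], maxspan=30, maxpause=30)
print(strict)
print(loose)
```

With `maxpause=5` the 10-second gap starts a second group even though the whole sequence fits comfortably inside `maxspan=30`; relaxing the pause limit merges everything into one group.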


MCQ (hard)

Refer to the exhibit. The search above returns no results for api_version. What is the most likely cause?

A.The stats command cannot be used after rex.

B.The field `uri_path` does not exist or contains data that does not match the pattern.

C.The search time range is too short to include any events.

D.The regex pattern is incorrectly written.

Answer: B

If `uri_path` is not a field in the sourcetype, the rex will not extract anything.

Why this answer

The `rex` command extracts fields based on a regex pattern applied to a specific source field. If `uri_path` does not exist in the events or its values do not match the pattern `(?<api_version>/v[0-9]+)`, then no `api_version` field will be created. This is the most likely cause because the search returns no results for `api_version`, indicating the extraction failed at the source field level.

Exam trap

Splunk often tests the misconception that a regex pattern is incorrect when the real issue is that the source field is missing or contains non-matching data, leading candidates to focus on syntax rather than data validation.

How to eliminate wrong answers

Option A is wrong because `stats` can be used after `rex` without issue; `rex` extracts fields, and `stats` can then aggregate them. Option C is wrong because if the time range were too short, the search would return no events at all, not just no results for `api_version` while other fields might exist. Option D is wrong because the regex pattern `(?<api_version>/v[0-9]+)` is syntactically correct for capturing a version string like `/v1` or `/v2`; the issue is that it is applied to a field that may not contain matching data.


MCQ (medium)

You need to find the percentage of total events contributed by each sourcetype. Which command should follow `index=* | stats count by sourcetype`?

A.addtotals

B.eventstats sum(count) as total | eval percent = count/total*100

C.eval percent = count / sum(count) * 100

D.appendpipe [stats sum(count) as total] | eval percent = count/total*100

Answer: B

eventstats adds total column, then eval computes percentage per row.

Why this answer

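Option B is correct because eventstats adds a total field holding the sum of count across all result rows, and eval then computes each row's percentage. Option A is wrong because addtotals by default adds a per-row total of the numeric fields, not a total down the count column. Option C is wrong because sum() is a stats aggregation, not an eval function that can total a column.

Option D is wrong because appendpipe appends one extra row holding the total rather than adding a total field to every row, so the eval has no total to divide by on the original rows.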
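
The difference between a total added to every row and a total appended as an extra row can be sketched in Python. This is a simplified model with hypothetical sourcetype names and counts; the helper names are illustrative and not Splunk APIs.

```python
# Hypothetical output of `stats count by sourcetype`.
counts = [
    {"sourcetype": "access_combined", "count": 500},
    {"sourcetype": "syslog", "count": 250},
    {"sourcetype": "cisco:asa", "count": 250},
]


def eventstats_total(rows):
    """Model of `eventstats sum(count) as total`: every row gains the total."""
    total = sum(r["count"] for r in rows)
    return [{**r, "total": total} for r in rows]


def appendpipe_total(rows):
    """Model of `appendpipe [stats sum(count) as total]`: one extra row is appended."""
    return rows + [{"total": sum(r["count"] for r in rows)}]


def eval_percent(rows):
    """Model of `eval percent = count/total*100`: a missing operand yields no value."""
    out = []
    for r in rows:
        if "count" in r and "total" in r:
            out.append({**r, "percent": r["count"] / r["total"] * 100})
        else:
            out.append({**r, "percent": None})
    return out


with_eventstats = eval_percent(eventstats_total(counts))
with_appendpipe = eval_percent(appendpipe_total(counts))
print([r["percent"] for r in with_eventstats])
print([r["percent"] for r in with_appendpipe])
```

In the eventstats variant every row has both operands, so each percentage is computed; in the appendpipe variant no single row holds both `count` and `total`, so no percentage can be evaluated.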


Multi-Select (hard)

Which TWO of the following eval functions can be used to convert a string to a numeric value?

Select 2 answers

A.tostring()

B.number()

C.int()

D.str()

E.tonumber()

Answers: C, E

`int()` converts a value to an integer, working on strings as well.

Why this answer

The `int()` function (option C) converts a string representation of an integer into a numeric integer value, and `tonumber()` (option E) converts a string to a floating-point or integer number, making both valid for converting strings to numeric values in Splunk's eval command.

Exam trap

Splunk often tests candidates' familiarity with Splunk's specific eval function names, and the trap here is that `number()` and `str()` sound plausible but are not valid Splunk functions, leading candidates to select them based on general programming knowledge rather than Splunk's actual syntax.


MCQ (medium)

A security analyst wants to find IP addresses that have attempted to access a specific URL more than 5 times in the last hour and also have a user agent string containing "curl". They need to use a subsearch to pre-filter IPs. Which search is correct?

A.[search index=web sourcetype=access useragent=*curl* | stats count by src_ip | where count>5] | fields src_ip

C.index=web sourcetype=access | search useragent=*curl* | stats count by src_ip | where count>5

D.index=web sourcetype=access ( useragent=*curl* ) | stats count by src_ip | where count>5

Answer: B

Correctly uses subsearch to filter IPs, then counts and filters.

Why this answer

Option B is correct because it uses a subsearch to first find IPs that have accessed the URL more than 5 times with a user agent containing 'curl', then passes those IPs to the outer search to filter the original data. The subsearch returns a list of src_ip values, which the outer search uses as a filter, ensuring only IPs meeting both conditions are counted again. This matches the requirement to pre-filter IPs using a subsearch.

Exam trap

The trap here is that candidates often confuse a subsearch with a simple filter or stats command, leading them to choose options that either omit the subsearch syntax or place it incorrectly, such as at the start without proper piping.

How to eliminate wrong answers

Option A is wrong because the subsearch is placed at the beginning without a leading pipe, making it a standalone search that does not feed into the outer search; it also lacks the outer search's index and sourcetype, so it returns no results. Option C is wrong because it does not use a subsearch at all; it simply filters and counts in a single search, which does not pre-filter IPs as required. Option D is wrong because it uses parentheses incorrectly and does not include a subsearch; it performs a single-pass filter and count, failing to pre-filter IPs.


MCQ (medium)

A security analyst wants to calculate the average latency for each web server over the past hour, but only for requests where the status code is 200. The search result includes fields: server, latency, status. Which search correctly accomplishes this?

A.index=web sourcetype=access | eval good_latency=if(status=200, latency, null) | stats avg(good_latency) by server

B.index=web sourcetype=access | eventstats avg(latency) by server | where status=200

C.index=web sourcetype=access | stats avg(latency) by server | where status=200

D.index=web sourcetype=access status=200 | stats avg(latency) by server

Answer: D

Correctly filters only status=200 events before statistical aggregation.

Why this answer

Option D is correct because it filters events to only those with status=200 before the stats command, ensuring the average latency is calculated exclusively over successful requests. The stats command then computes the average latency grouped by server, which directly answers the requirement without needing conditional logic or post-filtering.

Exam trap

The trap here is that candidates often think they can filter after stats using where, but stats collapses events into summary statistics, so a subsequent where cannot filter the original events used in the aggregation.

How to eliminate wrong answers

Option A is not the best answer: it uses eval to set good_latency to null for non-200 statuses, and because stats avg() ignores null values it effectively averages only the status=200 events. However, it retrieves and evaluates every event first, so it is less efficient and less idiomatic than filtering in the base search, which is why D is the standard best practice. Option B is wrong because eventstats computes each server's average over all events (including non-200) and attaches it to every event before the status=200 filter runs; the result is raw events carrying a per-server average over all status codes, not a per-server average for status=200 requests only. Option C is wrong because it applies `where status=200` after stats, whose output contains only server and avg(latency); the status field no longer exists, so the filter removes every row, and the averages were computed across all status codes in any case.
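
The claims about option A and option D can be checked with a small Python model. It is a sketch under stated assumptions: the events are hypothetical, and `avg_by_server` only imitates the documented behavior that an average skips null values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical access events.
events = [
    {"server": "web1", "latency": 100, "status": 200},
    {"server": "web1", "latency": 900, "status": 500},
    {"server": "web1", "latency": 200, "status": 200},
    {"server": "web2", "latency": 50, "status": 200},
]


def avg_by_server(rows, field):
    """Model of `stats avg(<field>) by server`; null values are skipped."""
    groups = defaultdict(list)
    for r in rows:
        value = r.get(field)
        if value is not None:
            groups[r["server"]].append(value)
    return {server: mean(values) for server, values in groups.items()}


# Option D: filter first, then aggregate.
filtered_first = avg_by_server([r for r in events if r["status"] == 200], "latency")

# Option A: null out non-200 latencies, then aggregate every event.
with_nulls = [
    {**r, "good_latency": r["latency"] if r["status"] == 200 else None}
    for r in events
]
null_ignored = avg_by_server(with_nulls, "good_latency")

# No filter at all: the 500 response skews web1.
all_statuses = avg_by_server(events, "latency")
print(filtered_first, null_ignored, all_statuses)
```

Both approaches yield the same per-server averages; the difference is that the null-based variant still has to process every event, which is the efficiency argument for filtering in the base search.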


MCQ (easy)

A user wants to calculate the average response time per user, but only for users who have more than 10 events. Which search approach is efficient?

A.index=web | eventstats avg(response_time) as avg by user | where count>10

B.index=web | stats avg(response_time) as avg, count as cnt by user | where cnt>10

C.index=web | where count>10 | stats avg(response_time) by user

D.index=web | stats avg(response_time) as avg by user | where count>10

Answer: B

Computes both statistics and filters correctly.

Why this answer

Option B is correct because it first uses `stats` to compute both the average response time and the event count per user, then filters with `where cnt>10` to keep only users who have more than 10 events. This ensures the average is calculated only after grouping, and the count condition is applied on the aggregated result, which is efficient and accurate.

Exam trap

The trap here is that candidates often confuse `eventstats` with `stats` and think they can filter on an aggregated field like `count` without first computing it in the same `stats` command, leading them to choose Option A or D.

How to eliminate wrong answers

Option A is wrong because `eventstats` adds only the per-user average to each raw event; it never creates a `count` field, so `where count>10` has nothing to compare and no events pass. Option C is wrong because `where count>10` is applied before any aggregation, and `count` is not a field in raw events, so the search returns no results. Option D is wrong because `stats avg(response_time) as avg by user` outputs only the average per user; with no `count` field in the results, `where count>10` removes every row.
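
The point that the count must be produced by the same aggregation that the filter later reads can be sketched in Python. The events and helper name below are hypothetical and the model is simplified.

```python
from collections import defaultdict
from statistics import mean


def stats_avg_count_by_user(rows):
    """Model of `stats avg(response_time) as avg, count as cnt by user`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["user"]].append(r["response_time"])
    return [
        {"user": user, "avg": mean(times), "cnt": len(times)}
        for user, times in groups.items()
    ]


# Hypothetical events: alice has 12 events, bob has 3.
events = [{"user": "alice", "response_time": 100 + i} for i in range(12)]
events += [{"user": "bob", "response_time": 50} for _ in range(3)]

summary = stats_avg_count_by_user(events)
# Model of `where cnt>10`: the filter reads a field the aggregation produced.
busy_users = [row for row in summary if row["cnt"] > 10]
print(busy_users)
```

Only the user with more than 10 events survives, and the average reported for that user is still computed over all of that user's events.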


MCQ (medium)

A security analyst runs `index=network sourcetype=firewall | stats count by src_ip | sort - count | head 10` to find the top 10 source IPs by event count. The search returns only 5 results. Which of the following is the most likely reason?

A.The search time range is too short, so only a few events are counted.

B.The sort command should be `sort - count` without space.

C.The stats command should include a by clause with count in the field list.

D.There are fewer than 10 unique source IPs in the results.

Answer: D

If the number of distinct src_ip values is less than 10, head 10 returns all of them, resulting in fewer than 10 rows.

Why this answer

Option D is correct because the `stats count by src_ip` command groups events by each unique source IP address and counts them. If the search returns only 5 results, it means there are only 5 unique source IPs in the dataset matching the time range and filters. The `head 10` command then limits output to 10 rows, but since only 5 groups exist, only 5 rows are returned.

Exam trap

The trap here is that candidates assume `head 10` always returns 10 results, forgetting that `head` limits the number of output rows from the preceding command, which may already have fewer rows than the limit.

How to eliminate wrong answers

Option A is wrong because a short time range would reduce the total event count, but the `stats count by src_ip` command still groups by unique IPs; if there are more than 10 unique IPs, the search would return 10 results regardless of total event count. Option B is wrong because `sort - count` with a space is valid syntax in SPL; the space between the dash and the field name is optional and does not cause the command to fail. Option C is wrong because the `stats count by src_ip` command already includes `count` as the aggregation function and `src_ip` as the grouping field; there is no requirement to list `count` in the `by` clause.
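
The behavior of a row limit applied to a short result set can be sketched in Python, with `Counter` standing in for `stats count by src_ip`, a descending sort for `sort - count`, and a slice for `head 10`. The IP addresses are hypothetical.

```python
from collections import Counter

# Hypothetical firewall events: only five distinct source IPs appear.
src_ips = (
    ["10.0.0.1"] * 4
    + ["10.0.0.2"] * 3
    + ["10.0.0.3"] * 2
    + ["10.0.0.4", "10.0.0.5"]
)

counts = Counter(src_ips)  # model of `stats count by src_ip`
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)  # `sort - count`
top_rows = ranked[:10]  # model of `head 10`: an upper bound, not a guarantee
print(len(top_rows), top_rows[0])
```

The slice asks for up to 10 rows, but only as many rows as there are groups can come back.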


MCQ (easy)

An analyst wants to identify the top 5 user agents that generated the most 404 errors in the last 24 hours. Which search accomplishes this correctly and efficiently?

A.index=web status=404 | top limit=5 user_agent

B.index=web | top limit=5 user_agent status=404

C.index=web | top limit=5 user_agent

D.index=web | stats count by user_agent | where status=404 | top 5 user_agent

Answer: A

Correctly filters for 404 errors and efficiently returns top 5 user agents using the top command.

Why this answer

Option A is correct because it first filters events to only those with status=404, then uses the `top` command with `limit=5` to efficiently count and rank user_agent values. This ensures the search only processes relevant events, minimizing resource usage and returning the correct top 5 user agents for 404 errors.

Exam trap

Splunk often tests the order of operations in Splunk SPL, specifically that filtering commands like `status=404` must precede statistical commands like `top` or `stats` to ensure the aggregation is performed only on the subset of interest, not on the entire dataset.

How to eliminate wrong answers

Option B is wrong because `status=404` is placed after the pipe as an argument to `top`, where it does not act as a filter; the events reaching `top` are never restricted to 404 errors. Option C is wrong because it omits the status=404 filter entirely, returning the top 5 user agents across all HTTP status codes, not just 404 errors. Option D is wrong because the `where status=404` clause is placed after `stats count by user_agent`, which has already aggregated the data without the status filter; at that point the `status` field no longer exists in the results, so `where` removes every row.


Matching (medium)

Match each Splunk search operator to its behavior.


Concepts

Matches

| (pipe): Pipes output of one command to the next

NOT: Excludes events that match the following term

OR: Matches events that contain either term

AND: Matches events that contain both terms (default)

( ) (parentheses): Groups terms to control evaluation order

Why these pairings

Operators control how search terms are combined and piped.


MCQ (medium)

A security analyst needs to find the top 10 users with the most failed login attempts from the linux_secure sourcetype. Which SPL command is most efficient for this task?

A.index=main sourcetype=linux_secure "Failed password" | top limit=10 user

B.index=main sourcetype=linux_secure "Failed password" | stats count by user | sort 10 -count

C.index=main sourcetype=linux_secure "Failed password" | stats count by user | sort -count | head 10

D.index=main sourcetype=linux_secure | regex _raw="Failed password" | stats count by user | top limit=10

Answer: A

The `top` command is optimized for finding top values and is efficient for this scenario.

Why this answer

Option A is correct because the `top` command in SPL is specifically designed to return the most frequent values of a field, and the `limit=10` parameter directly restricts the output to the top 10 results. This approach is more efficient than using `stats count` followed by `sort` and `head` because `top` performs the aggregation and ranking in a single operation, reducing processing overhead. The search also correctly filters for 'Failed password' events within the `linux_secure` sourcetype, ensuring only failed login attempts are considered.

Exam trap

Splunk often tests the misconception that `stats count by user | sort -count | head 10` is functionally equivalent to `top limit=10 user`, but the trap is that `top` is more efficient and is the idiomatic Splunk command for this task, while the multi-command approach is less optimal and may be penalized in performance-sensitive scenarios.

How to eliminate wrong answers

Option B is not the best answer, although not because of syntax: `sort 10 -count` is valid SPL that sorts descending and keeps the first 10 rows, so it returns the same users, but it needs `stats` plus `sort` where option A needs only `top`. Option C is wrong because, while it produces the correct result, it chains three commands (`stats`, `sort`, `head`) to do what the single `top` command does. Option D is wrong because it uses `regex _raw="Failed password"` instead of a simple search term, which is less efficient; Splunk's indexed search for a literal string is faster than applying a regex to the raw event data. In addition, the trailing `top limit=10` names no field, which `top` requires, and `stats count by user` has already aggregated the data.

MCQhard

A Splunk admin wants to track the number of unique users who accessed a system each hour over the past 24 hours. Which search provides the correct result?

A.index=main earliest=-24h | timechart span=1h dc(user) as unique_users

B.index=main earliest=-24h | timechart span=1h values(user)

C.index=main earliest=-24h | stats dc(user) by _time | timechart span=1h dc(user)

D.index=main earliest=-24h | timechart span=1h count by user

AnswerA

dc(user) gives distinct count of users per hour with timechart.

Why this answer

Option A is correct because it uses `timechart span=1h dc(user)` to count distinct users per hour over the last 24 hours. The `dc()` function calculates distinct counts, and `span=1h` sets the time bucket to one hour, exactly matching the requirement.

Exam trap

The trap here is confusing `count` (total events) with `dc()` (distinct values), and assuming `values()` or `count by user` can produce a unique user count per time period.

How to eliminate wrong answers

Option B is wrong because `values(user)` returns a multivalue list of users per hour, not a count of unique users. Option C is wrong because `stats dc(user) by _time` groups by raw event timestamps rather than hourly buckets, and its output no longer contains the `user` field, so the following `timechart span=1h dc(user)` has nothing left to count. Option D is wrong because `count by user` counts events per user per hour; it produces a separate series for each user rather than a single count of distinct users.
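The difference between a distinct count and an event count per time bucket can be sketched in a few lines of Python. The event tuples and function names below are made up for illustration; the bucketing mirrors what `span=1h` does conceptually.

```python
from collections import defaultdict

# Hypothetical events: (epoch_seconds, user).
events = [
    (0, "alice"), (600, "alice"), (1200, "bob"),          # first hour
    (3600, "alice"), (4000, "alice"), (5000, "alice"),    # second hour
]

def distinct_users_per_hour(events, span=3600):
    """Model of `timechart span=1h dc(user)`: one set of users per bucket."""
    buckets = defaultdict(set)
    for ts, user in events:
        buckets[ts - ts % span].add(user)   # snap the timestamp to its bucket start
    return {start: len(users) for start, users in sorted(buckets.items())}

def events_per_hour(events, span=3600):
    """Model of `timechart span=1h count`: every event is counted."""
    buckets = defaultdict(int)
    for ts, _user in events:
        buckets[ts - ts % span] += 1
    return dict(sorted(buckets.items()))

print(distinct_users_per_hour(events))
print(events_per_hour(events))
```

In the second hour a single user generates three events, which is exactly the case where `count` and `dc()` diverge.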

Multi-Selecthard

Which THREE of the following are valid uses of the stats command? (Select three.)

Select 3 answers

A.Calculating the average of a field across all events.

B.Finding the earliest timestamp for each category.

C.Grouping events by a categorical field and counting them.

D.Creating a time-based chart with multiple series.

E.Enriching events with fields from an external lookup.

AnswersA, B, C

`stats avg()` computes an average; `earliest()` and `count ... by` are also standard `stats` aggregations.

Why this answer

The `stats` command in Splunk is used to perform statistical aggregations on search results. Option A is correct because `stats avg(field)` calculates the arithmetic mean of a specified field across all events in the result set. Option B is correct because `stats earliest(_time) by category` returns the minimum timestamp for each distinct value of the category field, which is a standard use of the `earliest()` function.

Option C is correct because `stats count by category` groups events by the categorical field and returns the number of events in each group, a fundamental aggregation pattern.

Exam trap

The exam often tests the distinction between `stats` and `timechart`. Candidates see 'time-based chart' and assume `stats` can produce it, but that is the job of `timechart`, which automatically bins events into time buckets and supports multiple series via the `by` clause. Enriching events from an external lookup (option E) is done with the `lookup` command, not `stats`.

100

Multi-Selecthard

Which TWO of the following statements about the `transaction` command are true? (Choose two.)

Select 2 answers

A.The transaction command can only be used on events that have a timestamp.

B.The transaction command uses a sliding time window to detect transaction boundaries.

C.The transaction command can group events that share a common field value, such as a session ID.

D.The transaction command adds fields such as `duration` and `eventcount` to each transaction.

E.The transaction command removes all fields except those specified in the `fields` argument.

AnswersC, D

Transaction can group by shared field values.

Why this answer

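Option C is correct because the `transaction` command is designed to group events that share a common field value, such as a session ID, so that related events are correlated into a single transaction. This is a core use case for tracking user sessions or multi-step processes where events are linked by a shared identifier. Option D is correct because each resulting transaction carries a `duration` field (the time between its first and last event) and an `eventcount` field (the number of events it contains).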

Exam trap

The exam often tests the misconception that the `transaction` command uses a sliding time window. In reality, transaction boundaries are set by constraints such as `maxspan`, `maxpause`, `startswith`, and `endswith`, and candidates confuse this with the sliding-window behavior of a command like `streamstats`.

101

MCQhard

A search uses a subsearch to retrieve a list of user IDs, and then the main search uses IN operator to filter events. The subsearch is expected to return up to 10,000 values. What is a potential limitation and how can it be addressed?

A.The subsearch returns only 10,000 results by default; use | head 50000 in subsearch.

B.The subsearch default limit is 50,000; no change needed.

C.The subsearch default limit is 10,000; to include more, use the | fields values command in the subsearch to return all values.

D.The subsearch default limit is 100,000; no change needed.

AnswerC

Fields values collapses duplicates and can exceed row limit

Why this answer

Option C is the keyed answer: a subsearch returns at most 10,000 results by default. When the main search filters with the `IN` operator, any value cut off by that cap is silently missing from the filter, so the cap matters as soon as the subsearch approaches 10,000 values. The answer key's remedy is to use `| fields values` in the subsearch. Note that the cap itself is governed by the `maxout` setting in the `[subsearch]` stanza of `limits.conf`, so confirm the behavior in your own environment before relying on it.

Exam trap

The trap here is that candidates often confuse the default subsearch result limit (10,000) with the main search result limit (50,000) or assume that increasing the limit with `| head` is the correct solution, when in fact the `| fields values` command is the proper method to return all values from a subsearch without hitting the row limit.

How to eliminate wrong answers

Option A is wrong because, although it states the 10,000-result default correctly, `| head 50000` cannot raise that limit; `head` only truncates results, so it can never return more rows than the subsearch already produces. Option B is wrong because the default subsearch limit is 10,000, not 50,000, and "no change needed" ignores that a subsearch expected to return up to 10,000 values sits exactly at the cap. Option D is wrong for the same reason: the default is 10,000, not 100,000.

102

MCQmedium

A large e-commerce company uses Splunk to monitor their web application. They have a query that uses the transaction command to group related events into transactions based on session ID and a 30-minute max pause. The query runs slowly and often times out. The environment has 10 indexers with 4 CPU cores each. The search is run over the last 7 days. Which of the following is the best course of action to improve performance?

A.Use the eval command to create a transaction ID field and then use stats to group events.

B.Reduce the max pause to 15 minutes to limit the number of events in each transaction.

C.Replace the transaction command with a combination of stats and streamstats commands.

D.Increase the number of indexers to 20 to distribute the load.

AnswerC

Using stats and streamstats is more efficient than transaction and can achieve similar grouping results.

Why this answer

The `transaction` command is resource-intensive because it groups events by a field (session ID) and a max pause, which requires significant memory and processing to correlate events across the entire search time range. Replacing it with `stats` and `streamstats` is more efficient: `streamstats` can mark where a new session starts by comparing each event with the previous one for the same session ID, and `stats` can then aggregate each session without the overhead of tracking open transactions. This approach reduces memory pressure and makes timeouts less likely on large datasets.

Exam trap

The exam often tests the misconception that reducing the max pause or adding hardware (more indexers) is the best fix, when the real issue is the inefficient `transaction` command, which should be replaced with more scalable commands such as `stats` and `streamstats`.

How to eliminate wrong answers

Option A is wrong because using `eval` to create a transaction ID field and then `stats` to group events does not inherently improve performance; it still requires a similar grouping operation and does not address the core inefficiency of the `transaction` command's memory overhead. Option B is wrong because reducing the max pause to 15 minutes may limit transaction size but does not fundamentally reduce the computational cost of the `transaction` command, which still must evaluate event boundaries and maintain state for each session across the entire search window. Option D is wrong because increasing the number of indexers to 20 distributes the search load but does not optimize the query itself; the `transaction` command's performance bottleneck is often in the search head's memory and processing, not just indexing capacity, and adding indexers may not resolve timeouts if the command is inherently inefficient.
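The stats-plus-streamstats idea can be sketched in Python: walk each session's events in time order, start a new group whenever the gap exceeds the max pause, then aggregate. The event tuples, the `sessionize` name, and the output field layout are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter

MAX_PAUSE = 30 * 60  # 30 minutes, in seconds

# Hypothetical events: (session_id, epoch_seconds).
events = [
    ("s1", 0), ("s1", 600), ("s1", 2401),   # 2401 - 600 = 1801 s, just over the pause
    ("s2", 100), ("s2", 200),
]

def sessionize(events, max_pause=MAX_PAUSE):
    """Group events per session ID, splitting when the gap exceeds max_pause."""
    results = []
    ordered = sorted(events, key=itemgetter(0, 1))
    for session_id, group in groupby(ordered, key=itemgetter(0)):
        current = None
        for _sid, ts in group:
            # streamstats-like step: compare with the previous event of this session
            if current is None or ts - current["end"] > max_pause:
                current = {"session_id": session_id, "start": ts, "end": ts, "eventcount": 0}
                results.append(current)
            current["end"] = ts
            current["eventcount"] += 1
    for row in results:
        # stats-like step: derive the per-group summary field
        row["duration"] = row["end"] - row["start"]
    return results

for row in sessionize(events):
    print(row)
```

Only one open group per session is held at a time, which is the property that makes this pattern lighter than keeping many candidate transactions in memory.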

103

MCQmedium

A security analyst needs to find all events where the field `status` has a value of either "error" or "critical" and the field `bytes` is greater than 1000. Which search correctly accomplishes this?

A.(status=error OR status=critical) bytes>1000

B.status=error OR status=critical AND bytes>1000

C.status IN (error, critical) AND bytes>1000

D.status="error" OR status="critical" bytes>1000

AnswerA

Parentheses ensure the OR is evaluated first, and then the AND with bytes>1000.

Why this answer

Option A is correct because in Splunk's Search Processing Language (SPL), parentheses group the OR conditions to ensure they are evaluated together, and the space between the grouped condition and `bytes>1000` acts as an implicit AND. This correctly retrieves events where `status` is either "error" or "critical" AND `bytes` is greater than 1000.

Exam trap

The trap here is that Splunk's implicit AND (space) combined with operator precedence causes candidates to forget that OR conditions must be grouped with parentheses to avoid unintended logic, leading them to choose Option B or D.

How to eliminate wrong answers

Option B omits the parentheses, so the result depends on precedence rules the reader has to remember. The `search` command evaluates OR before AND, which means B is typically parsed like option A; in `where` and `eval` expressions, however, AND binds more tightly, giving `status=error OR (status=critical AND bytes>1000)`, which returns every `status=error` event regardless of `bytes`. Explicit parentheses avoid the ambiguity, which is why option A is the keyed answer. Option C is also not the keyed answer, although `IN` does accept unquoted values such as `status IN (error, critical)`; quotes are only needed when a value contains spaces or special characters. Option D has the same weakness as option B: the OR conditions are not grouped, so the intended logic is left to implicit precedence.
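To see why the grouping matters, the sketch below evaluates the two possible readings of the unparenthesized condition against a few hypothetical events. The event dictionaries and function names are illustrative only.

```python
# Hypothetical events with the two fields used in the question.
events = [
    {"status": "error", "bytes": 500},
    {"status": "error", "bytes": 2000},
    {"status": "critical", "bytes": 1500},
    {"status": "info", "bytes": 5000},
]

def or_grouped_first(e):
    """(status=error OR status=critical) AND bytes>1000"""
    return (e["status"] == "error" or e["status"] == "critical") and e["bytes"] > 1000

def and_grouped_first(e):
    """status=error OR (status=critical AND bytes>1000)"""
    return e["status"] == "error" or (e["status"] == "critical" and e["bytes"] > 1000)

intended = [e for e in events if or_grouped_first(e)]
too_broad = [e for e in events if and_grouped_first(e)]

print(len(intended), len(too_broad))
```

The second reading lets the small `error` event through, which is the unintended result that explicit parentheses rule out.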

104

MCQhard

Refer to the exhibit. What does the final result represent?

A.Users who log on more than twice on average.

B.Hours where the total logon count is more than double the average.

C.Hours where any user's logon count is more than double the average for that hour.

D.Users who have a logon count greater than twice their personal average.

AnswerC

Correct: per hour, per user comparison to hour average

Why this answer

The `eventstats` command calculates a per-hour average logon count across all users. The `where` clause then filters for events where a specific user's logon count for that hour is more than double that hourly average. This directly matches option C: hours where any user's logon count exceeds twice the average for that hour.

Exam trap

The trap here is that candidates confuse `eventstats ... by hour` (which computes a global average per hour) with a per-user average, leading them to incorrectly select option D or A.

How to eliminate wrong answers

Option A is wrong because the query does not compute a per-user average across all hours; it compares each user's hourly count to the hourly average, not a user's average. Option B is wrong because the comparison is against the average logon count for that specific hour, not the total logon count for the hour; the `where` clause checks `logon_count > 2 * avg_logons`, which is a per-user value, not a total. Option D is wrong because the average used is the hourly average across all users, not the user's own personal average; `eventstats` with `by hour` computes a global average per hour, not per user.
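The exhibit is not reproduced here, so the following Python sketch only models the logic described above: attach a per-hour average to every row (as `eventstats ... by hour` would), then keep rows where `logon_count > 2 * avg_logons`. The sample rows are hypothetical; the field names follow the explanation.

```python
from collections import defaultdict

# Hypothetical per-hour, per-user logon counts.
rows = [
    {"hour": 9, "user": "alice", "logon_count": 10},
    {"hour": 9, "user": "bob", "logon_count": 1},
    {"hour": 9, "user": "carol", "logon_count": 1},
    {"hour": 10, "user": "alice", "logon_count": 3},
    {"hour": 10, "user": "bob", "logon_count": 3},
]

def add_hourly_average(rows):
    """Model of eventstats by hour: every row keeps its fields and gains avg_logons."""
    by_hour = defaultdict(list)
    for row in rows:
        by_hour[row["hour"]].append(row["logon_count"])
    return [
        {**row, "avg_logons": sum(by_hour[row["hour"]]) / len(by_hour[row["hour"]])}
        for row in rows
    ]

def outliers(rows):
    """Model of the where clause: keep rows above twice the hourly average."""
    return [r for r in add_hourly_average(rows) if r["logon_count"] > 2 * r["avg_logons"]]

print(outliers(rows))
```

The average is shared by everyone in the same hour, so a user is compared with peers in that hour rather than with their own history.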

105

MCQhard

The search returns zero results, but the lookup file contains users with names like 'admin1', 'admin2'. What is the most likely reason?

A.The lookup file is not in CSV format.

B.The 'like' function requires a wildcard pattern with '%' but the field value may have leading/trailing spaces or the pattern is case-sensitive.

C.The stats command only counts events where role=admin, but the role field is already filtered.

D.The search command runs before the eval command.

AnswerB

like() is case-sensitive; also if user has spaces, pattern may not match.

Why this answer

The 'like' function in Splunk uses SQL-style pattern matching where '%' matches any sequence of characters. If the lookup file contains 'admin1' and 'admin2', but the search uses 'like(role, "admin%")', leading/trailing spaces in the field values or case sensitivity (e.g., 'Admin1' vs 'admin1') would cause the pattern to fail, returning zero results. Option B correctly identifies this as the most likely reason because Splunk's 'like' is case-sensitive by default and does not trim spaces.

Exam trap

Splunk often tests the misconception that 'like' is case-insensitive or automatically handles spaces, leading candidates to overlook the need for explicit trimming or case normalization.

How to eliminate wrong answers

Option A is wrong because Splunk lookups can be in CSV format or other formats like KV store; a non-CSV format would cause a different error (e.g., 'Error opening lookup file'), not silently return zero results. Option C is wrong because the stats command counts events based on the filtered results; if the role field is already filtered to only admin values, stats would still count those events, not return zero. Option D is wrong because commands run in the order in which they appear in the pipeline; even if `search` precedes `eval`, that ordering alone does not produce zero results.
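A small Python model of SQL-style `like` matching shows how case and stray whitespace defeat an otherwise correct pattern. The `like` helper below is a hypothetical stand-in written for this example, not Splunk's implementation.

```python
import re

def like(value, pattern):
    """Case-sensitive SQL-style match: % is any run of characters, _ is one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

print(like("admin1", "admin%"))    # plain lowercase value
print(like("Admin1", "admin%"))    # differs only in case
print(like(" admin1", "admin%"))   # leading space

# Normalizing first (trim, then lowercase) makes the pattern match again.
print(like(" Admin1 ".strip().lower(), "admin%"))
```

In SPL the equivalent normalization would typically be done with `trim()` and `lower()` before the comparison.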

106

MCQhard

A search `index=main | eval weekday=strftime(_time,"%A") | stats count by weekday | sort - count` shows that Monday has the highest count. However, the user suspects that Monday data is double-counted due to timezone offset. What should be done to investigate?

A.Use `date_wday` field which is based on the local time by default if configured.

B.Use `strftime(_time,"%w")` instead of %A to avoid string comparison issues.

C.Apply `| convert timeformat="%A" tz=US/Mountain _time as weekday` to adjust timezone.

D.Use `eval weekday=strftime(_time + timezone_offset, "%A")` with a fixed offset.

AnswerA

`date_wday` is extracted from the timestamp text in the raw event, so it reflects the day of the week as written in the event itself.

Why this answer

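Option A is correct because `strftime(_time, "%A")` renders the weekday in the timezone of the user running the search, whereas `date_wday` is taken directly from the timestamp text in the raw event and is not adjusted for timezone. Comparing the two shows whether events near midnight are being assigned to Monday because of an offset. Option B is wrong because `%w` only changes the output from a day name to a number; the timezone used for the conversion is unchanged. Option C is wrong because `tz` is not an argument of the `convert` command, whose `timeformat` option only controls output formatting. Option D is wrong because a single fixed offset has to be maintained by hand and does not account for events from several timezones or for daylight saving time.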
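The effect being investigated is easy to reproduce outside Splunk: the same instant falls on different weekdays depending on the timezone used to render it. The sketch below uses a fixed UTC-7 offset as a stand-in for a Mountain-time source; that offset and the chosen timestamp are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

utc = timezone.utc
# Assumption: a fixed UTC-7 offset; real zones also shift with daylight saving time.
mountain_standard = timezone(timedelta(hours=-7))

# An event that occurred at 03:00 UTC on Monday, 1 January 2024.
event_time = datetime(2024, 1, 1, 3, 0, tzinfo=utc)

weekday_utc = event_time.strftime("%A")
weekday_local = event_time.astimezone(mountain_standard).strftime("%A")

print(weekday_utc, weekday_local)
```

Events in the hours around midnight are the ones that move between days, so a weekday total can shift noticeably when the rendering timezone differs from the source's.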

107

Multi-Selecteasy

Which THREE of the following are valid Splunk search commands?

Select 3 answers

A.regex

B.dedup

C.sort

D.filter

E.parse

AnswersA, B, C

`regex` is a valid command to filter events using a regular expression.

Why this answer

The `regex` command is a valid Splunk search command that filters search results by applying a Perl-compatible regular expression (PCRE) to raw events or specific fields. It is commonly used to keep or discard events matching a pattern, such as an IP address or error code, and is distinct from the `rex` command, which extracts fields. `dedup` is valid because it removes events that duplicate the values of the specified fields, and `sort` is valid because it orders results by one or more fields.

Exam trap

Splunk often tests the distinction between real Splunk commands and plausible-sounding but non-existent commands like `filter` or `parse`, which candidates might confuse with similar functions in other tools or programming languages.

108

MCQeasy

A security analyst needs to find all events where the field 'status' is either 'error' or 'critical', and then count the number of events per source IP. Which search is correct?

A.index=security (status=error OR status=critical) | stats count by src_ip

B.index=security status=error AND status=critical | stats count by src_ip

C.index=security | where status=error OR status=critical | stats count by src_ip

D.index=security status=error OR status=critical | stats count by src_ip

AnswerA

Correct syntax: parentheses group OR conditions, then stats count.

Why this answer

Option A is correct because it uses the proper syntax to filter events where the 'status' field is either 'error' or 'critical' within the index, and then pipes the results into the stats command to count events by 'src_ip'. The parentheses around the OR condition ensure correct evaluation order, and the stats count by src_ip accurately aggregates the count per source IP.

Exam trap

The exam often tests whether candidates group OR conditions explicitly. In the `search` command OR is evaluated before the implicit AND, but the rule differs in `where` and `eval` expressions, so parentheses are the reliable way to state the intended logic.

How to eliminate wrong answers

Option B is wrong because it uses 'AND' between the two status conditions, which would require a single-valued field to equal both 'error' and 'critical' at once, so it returns zero results. Option C is wrong because `where` evaluates eval expressions, in which the unquoted `error` and `critical` are treated as field names rather than string literals; the comparison would need `status="error"`. Filtering in the base search is also more efficient than retrieving every event in the index and filtering afterward. Option D is the weaker choice because it leaves the grouping implicit: the `search` command evaluates OR before the implicit AND, so the result depends on the reader knowing that rule, whereas option A states the grouping explicitly.

109

MCQeasy

Which SPL command can be used to create a new field based on a conditional evaluation, such as setting a status field to 'critical' if a numeric threshold is exceeded?

A.| makemv

B.| rex field=_raw

C.| eval status=if(value>100,"critical","normal")

D.| convert status=if(value>100,"critical","normal")

AnswerC

Eval with if performs conditional assignment

Why this answer

The `eval` command in SPL is used to create new fields or modify existing ones by evaluating expressions. The `if()` function within `eval` allows conditional logic, making `| eval status=if(value>100,"critical","normal")` the correct syntax to create a new field 'status' that is set to 'critical' when the numeric field 'value' exceeds 100, and 'normal' otherwise.

Exam trap

Splunk often tests the distinction between `eval` (for field creation and computation) and `convert` (for data type conversion), leading candidates to mistakenly choose `convert` for conditional logic due to its similar syntax.

How to eliminate wrong answers

Option A is wrong because `makemv` converts a single-value field into a multivalue field by splitting it on a delimiter; it does not create a field based on conditional evaluation. Option B is wrong because `rex field=_raw` is used for extracting fields using regular expressions from the `_raw` event data, not for conditional field creation. Option D is wrong because `convert` is used for type conversion (e.g., converting strings to numbers or timestamps), not for conditional logic; the syntax `convert status=if(...)` is invalid and would produce an error.

110

Drag & Dropmedium

Arrange the steps to create a new index in Splunk in the correct order.

Why this order

Creating an index involves navigating to the indexes page, adding a new index with appropriate settings, and saving.

111

MCQhard

A Splunk administrator runs the following search to identify the top 5 users by total bytes transferred: index=proxy sourcetype=webproxy | stats sum(bytes) as total_bytes by user | sort - total_bytes | head 5 The search returns results, but the numbers seem inflated. On closer inspection, the 'bytes' field is a string type. What must be done to correct the search?

A.Use 'convert num(bytes)' before stats.

B.Use 'eval bytes_numeric = tonumber(bytes)' then 'stats sum(bytes_numeric) as total_bytes by user'.

C.Use 'where isnum(bytes)' to filter out non-numeric values before stats.

D.Use 'eval bytes = string(bytes)' before stats.

AnswerB

This explicitly converts the string to numeric, ensuring correct summation.

Why this answer

Option B is correct because the `bytes` field is stored as a string, and `stats sum()` cannot be relied on to total string values correctly, which the question presents as the cause of the inflated results. The `tonumber()` function explicitly converts the string to a numeric type, enabling accurate summation. Using `eval` to create a new numeric field before `stats` is the standard approach in Splunk for this scenario.

Exam trap

The trap here is that candidates assume `stats sum()` automatically converts strings to numbers, or they reach for `convert` instead of the `eval tonumber()` pattern that the question expects and that Splunk explicitly tests in the Advanced Searching domain.

How to eliminate wrong answers

Option A is not the keyed answer. `convert num(bytes)` is a real command that converts the field in place and removes values it cannot convert, but the question is looking for the explicit `eval` with `tonumber()` pattern that writes the result to a new field. Option C is wrong because `where isnum(bytes)` only filters events; it does not convert the string to a number, so the summation problem remains. Option D is wrong because `string()` is not an eval function (the string conversion function is `tostring()`), and converting to a string is the opposite of what is needed.
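The convert-then-sum pattern can be modeled in Python as follows. The `tonumber` helper here is a hypothetical approximation that returns `None` for values it cannot parse, and the sample rows are invented.

```python
from collections import defaultdict

def tonumber(value):
    """Rough model: parse a numeric string, or return None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

# Hypothetical proxy events where `bytes` arrived as text.
rows = [
    {"user": "alice", "bytes": "100"},
    {"user": "alice", "bytes": "250"},
    {"user": "bob", "bytes": "n/a"},
    {"user": "bob", "bytes": "40"},
]

def total_bytes_by_user(rows):
    """Model of eval bytes_numeric=tonumber(bytes) | stats sum(bytes_numeric) by user."""
    totals = defaultdict(float)
    for row in rows:
        bytes_numeric = tonumber(row["bytes"])
        if bytes_numeric is not None:       # unparseable values contribute nothing
            totals[row["user"]] += bytes_numeric
    return dict(totals)

print(total_bytes_by_user(rows))
```

Writing the converted value to a separate name keeps the original field intact, which makes it easy to inspect the rows that failed to convert.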

112

MCQmedium

Which command returns the list of all sourcetypes in a specific index?

A.| sourcetype count index=main

B.| eventtype count index=main

C.| metasearch index=main sourcetype=*

D.| metadata type=sourcetypes index=main

AnswerD

`metadata` with `type=sourcetypes` lists all sourcetypes in the index.

Why this answer

Option D is correct because the `| metadata` command with `type=sourcetypes` retrieves a list of all sourcetypes present in a specified index, along with their earliest and latest timestamps. This command queries the index metadata directly, making it the appropriate tool for listing sourcetypes within a given index.

Exam trap

Splunk often tests the distinction between commands that return metadata summaries (`| metadata`) versus commands that return raw events or statistical aggregations, leading candidates to choose `| metasearch` or malformed `| sourcetype count` commands instead of the correct metadata approach.

How to eliminate wrong answers

Option A is wrong because `| sourcetype count` is not a valid SPL command; it appears to be a malformed attempt at `| stats count by sourcetype`. Option B is wrong because `| eventtype count` is also not a valid command; event types are saved categorizations of events, not a way to list sourcetypes. Option C is wrong because `| metasearch index=main sourcetype=*` returns one result per matching event containing only metadata fields such as `host`, `source`, `sourcetype`, and `_time`; it does not return a list of distinct sourcetypes, and further aggregation would be needed to produce one.

113

MCQeasy

An analyst wants to remove events that contain the string 'debug' from a log. Which command should be used?

A.| where NOT match(_raw,"debug")

B.| search debug | reverse

C.| search "debug" NOT

D.| search NOT debug

AnswerD

Negates the search term to exclude events

Why this answer

Option D is correct because the `| search NOT debug` command filters out all events containing the string 'debug' from the result set. In Splunk, the `NOT` operator before a search term excludes events that match that term, effectively removing them from the output. This is the standard way to exclude a specific string from search results.

Exam trap

The trap here is that candidates often confuse the placement of `NOT` in Splunk syntax, thinking it can be placed after the term like in natural language, or they mistakenly use `where` with regex functions when a simple `NOT` suffices.

How to eliminate wrong answers

Option A is not the best choice because `match()` treats its second argument as a regular expression and is case-sensitive, and `where` evaluates every event after retrieval, so it is heavier than a plain `NOT` term for excluding a literal string. Option B is wrong because `| search debug | reverse` first keeps only events with 'debug', then reverses their order; it retains the very events that should be removed. Option C is wrong because `| search "debug" NOT` has incorrect syntax; the `NOT` operator must be placed before the term it negates, not after.

114

MCQeasy

A security analyst notices that a timechart command is returning too many data points on the x-axis, making the chart unreadable. Which command modification should be used to reduce the number of data points?

A.| timechart partial=f count by host

B.| timechart useother=f count by host

C.| timechart span=1h count by host

D.| timechart limit=5 count by host

AnswerC

Span reduces data point granularity

Why this answer

The `timechart` command automatically bins events into time buckets based on the time range. By default, Splunk chooses a span that can result in many data points. Adding `span=1h` explicitly sets the bucket size to one hour, reducing the number of data points on the x-axis and making the chart readable.

Exam trap

The trap here is that candidates confuse options that control the number of series (like `limit` or `useother`) with options that control the number of time buckets (like `span`), leading them to pick a wrong answer that does not affect the x-axis density.

How to eliminate wrong answers

Option A is wrong because `partial=f` controls whether partial time buckets at the edges of the time range are displayed, not the number of data points. Option B is wrong because `useother=f` prevents grouping of low-count values into an 'Other' category, which affects the y-axis series, not the x-axis data points. Option D is wrong because `limit=5` restricts the number of series (e.g., top 5 hosts) shown, not the number of time buckets on the x-axis.

115

MCQmedium

The exhibit shows a search that categorizes HTTP status codes and counts them. If the search returns only three categories, what is the most likely reason?

A.The stats command is filtering out events with null category.

B.The case function has a syntax error that truncates results.

C.The case statement does not cover status codes above 599.

D.Some categories have zero events and are not displayed by default.

AnswerD

`stats count by category` only returns rows for categories that occur in the results; it never emits a zero-count row.

Why this answer

Option D is correct because the `stats` command only returns results for categories that have at least one event. If a category (e.g., a specific HTTP status code range) has zero matching events, it will not appear in the output. Aggregation commands emit only the groups that exist in the data; to display empty categories you have to add them yourself, for example by appending the expected category names and filling the missing counts with `fillnull`.

Exam trap

Splunk often tests the default behavior of `stats` to omit zero-count groups, leading candidates to incorrectly assume that the `case` function is incomplete or that events are being filtered out, rather than recognizing that empty categories are simply not displayed.

How to eliminate wrong answers

Option A is wrong because a null category affects only events that match none of the `case` conditions; `stats` leaves those events out of its BY groups, but that does not remove categories that do have matching events. Option B is wrong because a syntax error in the `case` function would cause the search to fail entirely or return an error, not truncate results to exactly three categories.

Option C is wrong because HTTP status codes above 599 are not valid per RFC 7231, and the `case` statement is not required to cover them; the question states the search returns only three categories, implying the `case` statement covers all valid codes, but zero events exist for some ranges.
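The exhibit is not shown, so the sketch below uses hypothetical category names to model the behavior: grouping produces rows only for categories that occur, and empty categories appear only if they are filled in afterward.

```python
from collections import Counter

# Hypothetical category labels; the real exhibit may use different names.
EXPECTED = ["informational", "success", "redirect", "client_error", "server_error"]

def categorize(status):
    """Model of a case() expression with no default branch (returns None if unmatched)."""
    if 100 <= status < 200:
        return "informational"
    if 200 <= status < 300:
        return "success"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "client_error"
    if 500 <= status < 600:
        return "server_error"
    return None

statuses = [200, 200, 301, 404, 404, 404]

# Model of stats count by category: only observed, non-null categories get a row.
counts = Counter(c for c in map(categorize, statuses) if c is not None)

# Filling in the expected categories afterward makes the zero rows visible.
filled = {category: counts.get(category, 0) for category in EXPECTED}

print(dict(counts))
print(filled)
```

With this sample only three categories occur, so the grouped output has three rows even though five categories are defined.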

116

MCQhard

A security analyst wants to create a comparison report showing the count of login failures by user for today versus yesterday. They run: `index=security action=failure | timechart count by user`. This produces a chart of counts over time, but they want separate columns for today and yesterday. How can they achieve this comparison efficiently?

A.Use `| append [search index=security action=failure earliest=-2d@d latest=-1d@d | eval period="yesterday"] | timechart count by user by period`.

B.Use `| eval day=if(_time>=relative_time(now(),"@d"),"today","yesterday") | timechart count by user by day`.

C.Use `| stats count by user _time | xyseries _time user count`.

D.Use `| timechart count by user useother=t` with the time range set to 'Yesterday' and 'Today' in the time picker.

AnswerB

Correctly categorizes events by day and creates separate columns.

Why this answer

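Using `eval` to label each event as "today" or "yesterday" lets a single pass over the data feed both columns, which is why option B is the keyed answer. Note that `timechart` accepts only one split-by field, so in practice the labelled events are usually charted with something like `chart count over user by day` to get one row per user with a column for each day. Option A is wrong because `append` runs an additional subsearch, which repeats the base search and is subject to subsearch limits. Option C is wrong because `stats count by user _time` followed by `xyseries` yields one row per timestamp, not a today column and a yesterday column. Option D is wrong because `useother` only controls the OTHER series, and the time picker selects one range rather than two periods to compare.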
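The label-then-pivot idea can be sketched in Python. The sample events, the fixed `now`, and the `compare_days` name are assumptions; like the `eval` in option B, the sketch assumes the input is already limited to yesterday and today.

```python
from collections import defaultdict
from datetime import datetime, timezone

def compare_days(events, now):
    """Label each event 'today' or 'yesterday', then count per user and label."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)  # like "@d"
    table = defaultdict(lambda: {"today": 0, "yesterday": 0})
    for ts, user in events:
        day = "today" if ts >= midnight else "yesterday"
        table[user][day] += 1
    return dict(table)

utc = timezone.utc
now = datetime(2024, 5, 2, 15, 0, tzinfo=utc)  # fixed so the example is repeatable

# Hypothetical login-failure events: (timestamp, user).
events = [
    (datetime(2024, 5, 2, 9, 0, tzinfo=utc), "alice"),
    (datetime(2024, 5, 1, 22, 0, tzinfo=utc), "alice"),
    (datetime(2024, 5, 1, 10, 0, tzinfo=utc), "bob"),
]

print(compare_days(events, now))
```

Each user ends up with one row and two columns, which is the side-by-side layout the analyst asked for.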

117

MCQmedium

When using the stats command with multiple BY fields, the results show many rows with null values. What is the most likely cause and how can it be reduced?

A.Use | where command to filter out null values

B.Use | stats ... by ... usenull=f

C.Use | eval to replace nulls before stats

D.Use | fillnull value=0 outputfield=count after stats

AnswerB

Prevents null groups from appearing

Why this answer

Option B is the keyed answer because `usenull=f` suppresses the group created for events whose split-by field is null, which removes those rows at the aggregation stage rather than afterward. Be aware that `usenull` is documented as an option of `chart` and `timechart`, which otherwise emit a NULL series; `stats` itself leaves events with a null BY field out of its groups.

Exam trap

The trap here is that candidates often confuse `usenull=f` with post-processing filters like `where` or `fillnull`, not realizing that the null rows are generated during the `stats` aggregation itself and must be prevented at that stage.

How to eliminate wrong answers

Option A is wrong because the `where` command filters results after `stats` has already processed nulls, which does not reduce the number of rows generated by `stats`; it only hides them from the output. Option C is wrong because using `eval` to replace nulls before `stats` changes the data (e.g., replacing null with a placeholder like 'N/A'), which can alter statistical results and is not the intended way to handle nulls in BY fields. Option D is wrong because `fillnull` is used after `stats` to replace null values in output fields, not to prevent null rows from being created by the BY clause.

118

MCQeasy

An analyst wants to calculate the average response time for each web server, but only for requests that returned status code 200. Which search accomplishes this?

A.index=web sourcetype=access status=200 | sort host | stats avg(response_time)

B.index=web sourcetype=access | eval avg_time=avg(response_time) by host | where status=200

C.index=web sourcetype=access status=200 | stats avg(response_time) by host

D.index=web sourcetype=access | stats avg(response_time) by host | search status=200

AnswerC

Correct order: filter, then stats.

Why this answer

Option C is correct because it first filters events with `status=200` (only successful requests), then uses `stats avg(response_time) by host` to compute the average response time per web server. This ensures the aggregation is performed only on the relevant subset of data, matching the requirement precisely.

Exam trap

Splunk often tests the order of operations in Splunk searches, specifically that filtering (with `where` or search terms) must occur before aggregation (`stats`) to affect the computed values, and that `eval` cannot perform aggregate functions like `avg()`.

How to eliminate wrong answers

Option A is wrong because `sort host` before `stats` is unnecessary and does not affect the aggregation; more critically, `stats avg(response_time)` without a `by` clause computes a single overall average, not per host. Option B is wrong because `eval` cannot compute an aggregate function like `avg()` with a `by` clause; `eval` is for per-event calculations, not statistical aggregations, and the `where` clause is placed after the invalid `eval`. Option D is wrong because `stats avg(response_time) by host` is computed on all events (including non-200 status codes), and then `search status=200` attempts to filter after aggregation, but the `status` field is no longer present in the aggregated results, so the filter will return no results or be meaningless.
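
The ordering point can be sketched outside Splunk. The following Python is only an illustration, not Splunk code; the sample events and the helper name `avg_by_host` are hypothetical. It mirrors Option C (filter, then aggregate) and shows why Option D cannot work: once rows are aggregated, they no longer carry a `status` value to filter on.

```python
from collections import defaultdict

# Hypothetical sample events; field names follow the question.
events = [
    {"host": "web1", "status": 200, "response_time": 100},
    {"host": "web1", "status": 500, "response_time": 900},
    {"host": "web2", "status": 200, "response_time": 200},
    {"host": "web2", "status": 200, "response_time": 400},
]

def avg_by_host(rows):
    """Rough analogue of `stats avg(response_time) by host`."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["host"]].append(row["response_time"])
    return {host: sum(v) / len(v) for host, v in buckets.items()}

# Option C: filter first, then aggregate.
filtered_first = avg_by_host([e for e in events if e["status"] == 200])

# Option D: aggregate first; the result has only host and average,
# so there is no status field left to filter on afterwards.
aggregated_first = avg_by_host(events)
```

In this sample, `web1` averages 100 when only status 200 events are included, but 500 when the failed request is mixed in before aggregation.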

119

MCQeasy

To count events by host for the last hour, which search is most efficient?

A.index=* earliest=-1h | stats count by host

B.index=* | stats count by host | where _time > relative_time(now(), "-1h")

C.search index=* | head 1000 | stats count by host

D.sourcetype=access_combined | timechart count by host

AnswerA

Applies time range early, minimizing data scanned.

Why this answer

Option A is correct because it uses `index=*` to search all indexes and `earliest=-1h` to restrict the search to the last hour at the index level, which is the most efficient way to filter time. The `stats count by host` then aggregates counts per host without needing to process events outside the time range. This approach leverages Splunk's time-based index pruning, minimizing data scanned.

Exam trap

Splunk often tests the misconception that you can filter time after aggregation (as in Option B) or that limiting results with `head` is equivalent to time-based filtering, when in fact time filters must be applied at search time via `earliest`/`latest` for efficiency and correctness.

How to eliminate wrong answers

Option B is wrong because it retrieves all events (no time filter) and only then tries to filter on `_time` after the `stats` command; this is inefficient and also incorrect, since `stats count by host` outputs only `host` and `count`, leaving the `where` clause no `_time` to compare. Option C is wrong because `head 1000` arbitrarily limits results to the first 1000 events returned, which may not correspond to the last hour and can miss relevant data, making it both inefficient and inaccurate. Option D is wrong because `sourcetype=access_combined` restricts the search to a specific sourcetype rather than all events, and `timechart count by host` builds time buckets that a simple count by host does not need.

120

MCQeasy

A user wants to see the top 5 most common HTTP methods (field "method") from web access logs, along with their percentage of total. Which search is best?

A.index=web | top method countfield=percent

B.index=web | eventstats count | top method

C.index=web | top method limit=5 showperc=t

D.index=web | stats count by method | sort - count | head 5

AnswerC

Correctly uses top with showperc to display percentages.

Why this answer

Option C is correct because `top` with `limit=5` returns the five most common values of the `method` field, and `showperc=t` automatically calculates and displays each value's percentage of the total events. This directly meets the requirement to see the top 5 HTTP methods and their percentages without needing additional commands.

Exam trap

The trap here is that candidates often assume `top` only shows counts and not percentages, or they misuse `countfield` instead of `showperc`, leading them to choose a manual `stats` approach that omits the percentage calculation entirely.

How to eliminate wrong answers

Option A is wrong because `countfield` only renames the count column (here to the misleading name `percent`); it does not compute a percentage, and without `limit=5` the `top` command returns its default of 10 values. Option B is wrong because `eventstats count` adds a total count to every event that `top` does not need, and `top` without `limit=5` again defaults to 10 results. Option D is wrong because while it correctly finds the top 5 methods by count, it does not calculate or display the percentage of total for each method, which the question explicitly requires.
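
As a rough illustration of what a top-N-with-percentage report computes, here is a small Python sketch. It is not Splunk code; the function name `top` and the sample data are hypothetical, and the percentage is taken over all input values, which is an assumption about the intended behaviour.

```python
from collections import Counter

def top(values, limit=10):
    """Return (value, count, percent) tuples for the most common values."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (value, count, round(100 * count / total, 2))
        for value, count in counts.most_common(limit)
    ]

# Hypothetical method values from web access events.
methods = ["GET"] * 6 + ["POST"] * 3 + ["PUT"]
result = top(methods, limit=2)
```

The manual `stats count by method | sort - count | head 5` route in Option D corresponds to the counting and ranking steps only; the percentage column is the part it leaves out.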

121

Multi-Selectmedium

Which TWO of the following are valid aggregation functions in the `stats` command? (Choose 2)

Select 2 answers

A.median

B.sum

C.earliest

D.list

E.distinct_count

F.last

AnswersC, D

`earliest` is a valid stats function that returns the earliest value of a field.

Why this answer

The `stats` command supports `earliest()` as an aggregation function that returns the chronologically earliest value of a field (based on `_time`) for each group. It also supports `list()`, which returns the values of a field for each group as a multivalue entry, in the order the events are processed.

Exam trap

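Note that `median()`, `sum()`, `distinct_count()`, and `last()` are also documented `stats` functions, so this item cannot be resolved by syntax validity alone. The distinction more commonly tested is between time-based functions (`earliest()`, `latest()`), which use `_time`, and order-based functions (`first()`, `last()`), which depend on the order in which events are processed.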

122

MCQeasy

Refer to the exhibit. What will this search return?

A.A list of events with status 404.

B.A time-based chart with a line for each host showing count of 404 events per time period.

C.A table with columns for each host and a row for each time bucket showing count of 404 errors.

D.A bar chart of total 404 errors per host.

AnswerB

timechart by host produces a time series chart with lines per host.

Why this answer

The search uses `timechart` with `by host`, which produces a time-based chart where each host is a separate series (line) showing the count of events where `status=404` over each time bucket. The `count` function aggregates the number of 404 events per time period, and the `by host` clause splits the results into separate lines per host. Option B correctly describes this output.

Exam trap

Splunk often tests the distinction between `timechart` (time-based series) and `chart` or `stats` (non-time-based aggregation), leading candidates to confuse a time-series chart with a static table or bar chart.

How to eliminate wrong answers

Option A is wrong because the search does not return a raw list of events; it aggregates counts over time using `timechart`, so individual events are not displayed. Option C describes the underlying results table, which does have one row per time bucket and one column per host, but the question treats the output as the time-series visualization that `timechart` is designed to drive, so B is the better description. Option D is wrong because `timechart` with `by host` does not produce a bar chart of total counts per host; it shows counts over time, not a single aggregated total per host.

123

MCQhard

A search produces a field 'count'. You need to find the event with the maximum count. Which approach is correct?

A.| eventstats max(count) as maxcount | where count = maxcount

B.Both C and D work.

C.| sort -count | head 1

D.| stats max(count) as maxcount

AnswerA

This adds the maximum to each event and filters to those that equal the max.

Why this answer

Option A is correct because it uses `eventstats` to compute the maximum count across all events, storing it in a new field `maxcount`, and then filters the events where the original `count` equals that maximum. This approach preserves the full event data for the event(s) with the highest count, which is necessary when you need to retrieve the entire event, not just the aggregated value.

Exam trap

Splunk often tests the distinction between `eventstats` and `stats`, where candidates mistakenly think `stats` can be used to find the event with the maximum value, but `stats` collapses the data and loses the original event fields, making it unsuitable for retrieving the full event.

How to eliminate wrong answers

Option B is wrong because it is a combined option that includes `| stats max(count) as maxcount` (Option D), which does not find the event with the maximum count: it returns only the maximum value as a single row and loses the event context. Option C does return an event with the highest count, but `head 1` keeps exactly one result, so if several events tie for the maximum all but one are silently dropped; Option A returns every event that attains the maximum. Option D is wrong because `| stats max(count) as maxcount` produces a single-row result with only the maximum count value, not the original event data, so you cannot identify which event had that count.
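
The difference between the two working approaches shows up when several results share the maximum. The Python below is a hedged analogue, not Splunk code, and the sample rows are hypothetical: the first expression mimics attaching the maximum to every row and filtering on it, the second mimics sorting descending and keeping one row.

```python
# Hypothetical result rows with a tie for the highest count.
rows = [
    {"host": "a", "count": 5},
    {"host": "b", "count": 9},
    {"host": "c", "count": 9},
]

# Like `eventstats max(count) as maxcount | where count = maxcount`:
# every row that attains the maximum survives.
max_count = max(r["count"] for r in rows)
all_max = [r for r in rows if r["count"] == max_count]

# Like `sort -count | head 1`: exactly one row survives, even on a tie.
one_max = sorted(rows, key=lambda r: r["count"], reverse=True)[:1]
```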

124

MCQeasy

Refer to the exhibit. What is the purpose of the eval command in this search?

A.It replaces the status field with the category.

B.It adds a temporary field that is not retained after stats.

C.It converts the status field to a string.

D.It creates a new field 'status_category' based on the numeric status code, grouping into three categories.

AnswerD

Correctly describes the eval case usage

Why this answer

The eval command creates a new field 'status_category' by evaluating a CASE expression that maps numeric HTTP status codes (e.g., 200, 404, 500) into three descriptive categories: 'OK', 'Client Error', and 'Server Error'. This is a common pattern for enriching raw data with human-readable labels without altering the original 'status' field. The correct answer is D because the search explicitly defines the new field based on the status code values.

Exam trap

Splunk often tests the distinction between creating a new field versus modifying an existing field, and candidates mistakenly think eval replaces the original field when it actually adds a new one.

How to eliminate wrong answers

Option A is wrong because the eval command does not replace the 'status' field; it creates a new field 'status_category' while leaving the original 'status' field intact. Option B is wrong because the new field 'status_category' is not temporary; it persists after the stats command since stats can aggregate over any existing fields, including those created by eval. Option C is wrong because the 'status' field is already a numeric type (as shown in the CASE comparisons with numbers), and eval does not convert it to a string; instead, it creates a new string field 'status_category' from the numeric values.

125

MCQeasy

A search returns duplicate events for the same user. The analyst wants to keep only the first occurrence of each user based on timestamp. Which sequence of commands is best?

A.sort -_time | dedup user

B.dedup user

C.dedup user | sort _time

D.sort _time | dedup user

AnswerD

Sort ascending puts earliest first, then dedup keeps the first (earliest) per user.

Why this answer

Option D is correct because it first sorts events by timestamp in ascending order (oldest first), then applies `dedup user` to keep only the first occurrence of each user. Since `dedup` retains the first event it encounters for each field value, sorting by `_time` ensures that the earliest event for each user is kept, satisfying the requirement to keep only the first occurrence based on timestamp.

Exam trap

Splunk often tests the order of operations in piped commands, specifically that `sort` must precede `dedup` to control which event is kept, and that `-` before a field name reverses the sort order, which candidates may misinterpret.

How to eliminate wrong answers

Option A is wrong because `sort -_time` sorts in descending order (newest first), so `dedup user` would keep the most recent event for each user, not the first occurrence. Option B is wrong because `dedup user` without any sort operates on the raw order of events as they arrive from the index, which is not guaranteed to be chronological, so it may not keep the earliest event for each user. Option C is wrong because `dedup user` is applied before sorting, so the dedup operation sees events in their raw order and may discard the earliest event; the subsequent `sort _time` only reorders the remaining events but cannot recover the discarded first occurrence.
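
The effect of command order can be sketched in a few lines of Python. This is an analogue of the behaviour described above, not Splunk code; the `dedup` helper and the sample events are hypothetical.

```python
def dedup(rows, field):
    """Keep the first row seen for each value of `field`."""
    seen = set()
    kept = []
    for row in rows:
        if row[field] not in seen:
            seen.add(row[field])
            kept.append(row)
    return kept

# Hypothetical events, listed newest first.
events = [
    {"_time": 30, "user": "alice"},
    {"_time": 20, "user": "bob"},
    {"_time": 10, "user": "alice"},
]

# Option D: sort ascending, then dedup, so the earliest event per user is kept.
earliest = dedup(sorted(events, key=lambda e: e["_time"]), "user")

# Option C: dedup first, then sort; alice's earliest event is already gone.
dedup_first = sorted(dedup(events, "user"), key=lambda e: e["_time"])
```

With this input, Option D keeps alice's event at time 10, while Option C keeps her event at time 30 and merely reorders what is left.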

126

MCQeasy

A user wants to add a field showing the average value of a numeric field `latency` for each host, without reducing the number of events. Which command should be used?

A.eval

B.stats

C.eventstats

D.streamstats

AnswerC

`eventstats` adds the average latency per host to each event without reducing the number of events.

Why this answer

The `eventstats` command is correct because it calculates aggregate statistics (like average) over a field and appends the result as a new field to every event, preserving the original event count. Unlike `stats`, which reduces the dataset to one row per group, `eventstats` enriches each event with the computed value without removing any events.

Exam trap

The trap here is that candidates often confuse `eventstats` with `stats` because both compute aggregates, but `stats` reduces events while `eventstats` does not, and the exam tests this distinction by explicitly stating 'without reducing the number of events' in the question.

How to eliminate wrong answers

Option A is wrong because `eval` creates or modifies fields on a per-event basis using expressions, but it cannot compute aggregate statistics like an average across multiple events. Option B is wrong because `stats` computes aggregate statistics but reduces the number of events to one row per group (e.g., per host), which violates the requirement to keep all events. Option D is wrong because `streamstats` computes running or cumulative statistics over a sequence of events, not a global average per host, and it would produce incorrect results if events are not sorted properly.
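
The contrast between reducing and enriching can be illustrated with a short Python sketch. It is an analogue only, not Splunk code; the function names `stats_like` and `eventstats_like` and the sample events are hypothetical.

```python
from collections import defaultdict

# Hypothetical events with a numeric latency field.
events = [
    {"host": "a", "latency": 10},
    {"host": "a", "latency": 30},
    {"host": "b", "latency": 50},
]

def host_averages(rows):
    """Average latency per host."""
    totals = defaultdict(list)
    for row in rows:
        totals[row["host"]].append(row["latency"])
    return {host: sum(v) / len(v) for host, v in totals.items()}

def stats_like(rows):
    """One output row per host: the input events are collapsed."""
    return [{"host": h, "avg_latency": a} for h, a in host_averages(rows).items()]

def eventstats_like(rows):
    """Every input row is kept, with its host's average attached."""
    averages = host_averages(rows)
    return [{**row, "avg_latency": averages[row["host"]]} for row in rows]

summary = stats_like(events)
enriched = eventstats_like(events)
```

Three events go in; the `stats`-style result has two rows, while the `eventstats`-style result still has three, each carrying the new field.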

127

MCQmedium

The search returns unexpected results, including IP addresses that are not in the expected format (e.g., '127.0.0.1' appears as '27.0.0.1'). What is the most likely cause?

A.The regex pattern is incorrect; it should use \b for word boundaries.

B.The top command is modifying the extracted ip field.

C.The rex command must be placed before the index search.

D.The rex command extracts the first match only; some events may have multiple IPs and the first one is not the full IP.

AnswerD

`rex` extracts only the first match in each event; if the pattern is not anchored, that match can be a partial IP taken from inside a larger string.

Why this answer

Option D is correct because the `rex` command, by default, extracts only the first match of a regex pattern from each event. If an event contains multiple IP addresses, `rex` captures the first occurrence, which may be truncated if the regex pattern is not anchored properly or if the IP appears in a context where leading digits are separated (e.g., '127.0.0.1' might be preceded by a character that causes the regex to match starting at '27.0.0.1'). This is a common behavior in Splunk when using `rex` without the `max_match` parameter.

Exam trap

Splunk often tests the misconception that `rex` extracts all matches by default, leading candidates to overlook the need for `max_match` or proper regex anchoring when dealing with multiple values in a single event.

How to eliminate wrong answers

Option A is wrong because using `\b` for word boundaries would not fix the issue of extracting a truncated IP; the problem is about the first match being incomplete, not about boundary detection. Option B is wrong because the `top` command aggregates counts of field values and does not modify the extracted `ip` field itself; it only displays frequencies. Option C is wrong because the `rex` command can be placed anywhere in the search pipeline after the initial data retrieval; it does not need to be before the index search, and placing it earlier would not change the extraction behavior.
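
First-match versus all-matches behaviour is easy to see with Python's `re` module. This is only an analogue of the extraction behaviour described above; the pattern and the event text are hypothetical, since the exhibit's actual regex is not shown.

```python
import re

# Hypothetical pattern and raw event text.
IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
raw = "src=192.168.1.10 dst=10.0.0.5 action=allowed"

# Only the first match is captured, as with the default extraction.
first_only = IP_PATTERN.search(raw).group(0)

# Asking for every match, which is the role `max_match` plays for `rex`.
all_matches = IP_PATTERN.findall(raw)
```

Here the second address is simply never seen when only the first match is taken.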

128

Multi-Selecthard

Which THREE of the following are true about the `transaction` command? (Choose 3)

Select 3 answers

A.Transactions can be started based on a specific field value using the `startswith` option.

B.It outputs one event per input event, adding duration and eventcount fields.

C.The `by` clause is mandatory to define how to group events.

D.It groups events that share common field values and occur within a specified time window.

E.The `maxpause` option defines the maximum allowed time gap between events in the same transaction.

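AnswersA, D, E

`transaction` groups related events within time constraints; `startswith` and `maxpause` are documented options.

Why this answer

Option A is correct because `startswith` accepts a search expression, such as a `field=value` pair, a quoted string, or an `eval()` expression, that marks the first event of a transaction. Option D is correct because `transaction` groups events that share values in the listed fields and fall within constraints such as `maxspan`. Option E is correct because `maxpause` sets the maximum gap allowed between consecutive events in a transaction. Option B is wrong because `transaction` outputs one event per transaction, adding `duration` and `eventcount` fields to it. Option C is wrong because `transaction` takes an optional field list directly (for example `transaction user`) and has no mandatory `by` clause.

Exam trap

Splunk often tests the misconception that `transaction` outputs one event per input event rather than one event per transaction, and the assumption that a grouping field is always required.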

129

MCQmedium

Which of the following searches correctly computes the average response time per host?

A.index=main | stats mean(response_time) by host

B.index=main | stats average(response_time) by host

C.index=main | eventstats avg(response_time) by host

D.index=main | stats avg response_time by host

AnswerA

`mean()` is an alias for `avg()` and correctly computes the average per host.

Why this answer

Option A is correct because the `stats` command with `mean(response_time)` calculates the arithmetic mean of the response_time field, and the `by host` clause groups the calculation per host, producing the average response time for each host. This is the standard Splunk syntax for computing averages in a grouped statistics table.

Exam trap

The trap here is that candidates may confuse `eventstats` with `stats` or use incorrect function names like `average`, leading them to choose options that either do not produce a summary table or use invalid Splunk syntax.

How to eliminate wrong answers

Option B is wrong because `average` is not a valid stats function in Splunk; the correct function name is `avg` or `mean`. Option C is wrong because `eventstats` adds the computed value as a new field to each event rather than producing a summary table, so it does not return a distinct list of hosts with their average response times. Option D is wrong because the syntax `stats avg response_time by host` is missing parentheses around the field name; Splunk requires `avg(response_time)` to correctly parse the function argument.

130

MCQeasy

A developer wants to debug a slow Splunk search that uses multiple eval and where commands. The search returns correct results but takes 2 minutes. The developer wants to identify which parts of the search are slow. The environment is a single instance Splunk with moderate data. What should the developer do?

A.Manually check the search in the Job Manager after it completes.

B.Limit the time range to 1 minute and run the search.

C.Run the search with the 'search job inspector' option enabled.

D.Add comments to the search to track progress.

AnswerC

Provides per-command timing information.

Why this answer

Option C is correct because the Search Job Inspector breaks the job's execution cost down by command component, showing the duration, invocation count, and input and output counts for each. This allows the developer to see which `eval` or `where` command accounts for the time, without altering the search logic or time range.

Exam trap

The trap here is that candidates confuse the Job Manager (which shows high-level job status) with the Search Job Inspector (which provides granular per-command profiling), or mistakenly believe that reducing the time range or adding comments will help identify performance bottlenecks.

How to eliminate wrong answers

Option A is wrong because the Job Manager only shows overall job metadata (e.g., total run time, result count, disk usage) and does not break down performance per search command. Option B is wrong because limiting the time range to 1 minute changes the dataset size and may mask the actual slow command; it also does not provide per-command timing. Option D is wrong because comments are ignored by the search parser and have no effect on performance measurement; they do not generate any timing or profiling data.

131

MCQmedium

Refer to the exhibit. Which statement about this search is true?

A.It fails because iplocation requires a lookup table to be defined.

B.It uses iplocation to add geographical information about the destination IP.

C.It only includes events where src_ip is a valid IP address.

D.It adds geographical info based on src_ip and then aggregates bytes by dest_ip and country.

AnswerD

Correct interpretation of the search

Why this answer

The search uses `iplocation` to add geographical fields (like Country, City) based on the `src_ip` field, then renames `src_ip` to `src` and uses `stats` to aggregate bytes by `dest_ip` and the newly added `Country` field. This matches option D exactly.

Exam trap

The trap here is that candidates often confuse which IP address (source vs. destination) is being geolocated, or assume `iplocation` filters out invalid IPs, when in fact it only enriches events without removing any.

How to eliminate wrong answers

Option A is wrong because `iplocation` does not require a predefined lookup table; it uses a built-in MaxMind GeoIP database. Option B is wrong because the search applies `iplocation` to `src_ip`, not the destination IP (`dest_ip`). Option C is wrong because `iplocation` does not filter events; it only adds geographical fields to events that have a valid IP in `src_ip`, but events with invalid IPs are not excluded from the search results.

132

Multi-Selecteasy

Which TWO of the following commands can be used to find the most frequent value of a field within each group?

Select 2 answers

A.stats mode(field) by group

B.stats list(field) by group | eval top = mvindex('list', 0)

C.streamstats mode(field) by group

D.stats values(field) by group

E.eventstats mode(field) by group

AnswersA, E

stats mode returns the mode for each group.

Why this answer

Option A is correct because `stats mode(field) by group` directly computes the most frequent value (mode) of the specified field for each group defined by the `by` clause. The `mode()` function is specifically designed to return the value that appears most often, making it the simplest and most accurate command for this task.

Exam trap

The trap here is that candidates often confuse `list()` or `values()` with `mode()`, or incorrectly think `streamstats` can replace `stats` for grouped final aggregation, when `streamstats` is designed for cumulative calculations across events, not per-group final results.

133

Multi-Selectmedium

An analyst wants to create a time-series comparison of the current week and the previous week. Which TWO commands are commonly used together to achieve this? (Select two.)

Select 2 answers

A.stats

B.timechart

C.eventstats

D.timewrap

E.appendcols

AnswersB, D

Generates time-series data

Why this answer

B is correct because `timechart` is the primary command for creating time-series aggregations, allowing you to split data into time buckets and apply statistical functions. D is correct because `timewrap` is specifically designed to compare time periods (e.g., current week vs. previous week) by wrapping the time-series data into separate series for each period, enabling side-by-side visualization.

Exam trap

Splunk often tests the misconception that `stats` or `eventstats` can replace `timechart` for time-based comparisons, but only `timechart` provides the necessary time-bucketing, and `timewrap` is the dedicated command for period-over-period wrapping.

134

MCQhard

A search needs to find events where the same user logged in from more than 3 different IP addresses within a 5-minute window. Which combination of commands is most efficient?

A.`| streamstats count by user src_ip | where count > 3`

B.`| timechart span=5m limit=0 values(src_ip) by user | eval count=mvcount(values(src_ip)) | where count > 3`

C.`| stats count by user, src_ip | where count > 3`

D.`| transaction user maxspan=5m | eval distinct_ip=mvcount(src_ip) | where distinct_ip > 3`

AnswerD

Efficiently groups events by user within a 5-minute window and then counts distinct IP addresses.

Why this answer

Option D is correct because the `transaction` command groups events by `user` within a 5-minute window (`maxspan=5m`), then `eval distinct_ip=mvcount(src_ip)` counts the unique IP addresses in that transaction. This directly answers the requirement of finding users who logged in from more than 3 different IPs within a 5-minute window, and it is efficient because `transaction` handles the time-bounded grouping natively without needing to pre-aggregate or use subsearches.

Exam trap

The trap here is that candidates often choose `streamstats` or `stats` because they are familiar with counting, but they fail to realize that those commands count events per user+IP pair rather than distinct IPs per user within a time window, which is the core requirement.

How to eliminate wrong answers

Option A is wrong because `streamstats` with `count by user src_ip` counts occurrences of each user+src_ip pair, not distinct IPs per user; it would require a user to have more than 3 events from the same IP, which is not the requirement. Option B is wrong because `timechart` with `values(src_ip) by user` creates a time-based chart that can miss events if the time range is not perfectly aligned to 5-minute buckets, and it is less efficient due to the need to generate a table and then evaluate `mvcount`. Option C is wrong because `stats count by user, src_ip` counts events per user+IP pair, not distinct IPs per user within a time window; it would require a user to have more than 3 events from the same IP, and it ignores the 5-minute window entirely.
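
The requirement itself, distinct source IPs per user within a five-minute span, can be sketched with a sliding window in Python. This is not how `transaction` works internally and is not Splunk code; the function name, the field layout, and the sample events are hypothetical.

```python
from collections import defaultdict

WINDOW = 300  # five minutes, in seconds

def users_with_many_ips(events, threshold=3, window=WINDOW):
    """Return users seen from more than `threshold` distinct IPs in any window."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e["user"]].append((e["_time"], e["src_ip"]))
    flagged = set()
    for user, logins in by_user.items():
        logins.sort()
        start = 0
        for end in range(len(logins)):
            # Advance the left edge until the span fits inside the window.
            while logins[end][0] - logins[start][0] > window:
                start += 1
            distinct = {ip for _, ip in logins[start:end + 1]}
            if len(distinct) > threshold:
                flagged.add(user)
                break
    return flagged

# Hypothetical logins: alice uses four IPs in three minutes, bob over twenty.
sample = (
    [{"user": "alice", "_time": t, "src_ip": f"10.0.0.{i}"}
     for i, t in enumerate([0, 60, 120, 180])]
    + [{"user": "bob", "_time": t, "src_ip": f"10.0.1.{i}"}
       for i, t in enumerate([0, 400, 800, 1200])]
)
flagged_users = users_with_many_ips(sample)
```

Counting events per user and IP pair, as Options A and C do, would flag neither user here, because every pair occurs exactly once.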

135

MCQmedium

A search is producing results that include both internal and external traffic. The analyst wants to approximate the number of distinct destination IPs for internal traffic only, where internal IPs fall within the 10.0.0.0/8 range. Which approach is most efficient?

A.Use | search src_ip=10.* | stats dc(dest_ip)

B.Use | rex field=src_ip to extract first octet and then filter

C.Use | eval internal=if(cidrmatch("10.0.0.0/8", src_ip),1,0) | stats dc(dest_ip) by internal

D.Use | where cidrmatch("10.0.0.0/8", src_ip) | stats dc(dest_ip)

AnswerD

Efficient subnet matching with cidrmatch

Why this answer

Option D is correct because it uses `where cidrmatch("10.0.0.0/8", src_ip)` to efficiently filter events to only those with source IPs in the 10.0.0.0/8 range before passing them to `stats dc(dest_ip)`. This approach leverages Splunk's built-in CIDR matching function, which performs a bitwise comparison on the IP address, and applies the filter early in the pipeline, reducing the dataset for the distinct count operation. It is the most efficient as it avoids unnecessary evaluations or string operations on non-matching events.

Exam trap

The trap here is that candidates often choose Option C because they think `eval` with `by` is equivalent to filtering, but they overlook that it processes all events and computes an unnecessary group for external traffic, making it less efficient than a simple `where` filter.

How to eliminate wrong answers

Option A is less suitable because `src_ip=10.*` is a wildcard string match rather than a subnet test: it happens to cover 10.0.0.0/8, but it also matches any malformed value that merely begins with `10.`, and the same technique does not generalize to prefixes that are not octet-aligned. Option B is wrong because extracting the first octet with `rex` adds a parsing step to every event and then filters on a string comparison; for a /8 it can give the same result as CIDR matching, but with more work and more room for regex mistakes. Option C is wrong because while it uses `cidrmatch` correctly, it creates a field `internal` for every event and then uses `stats dc(dest_ip) by internal`, which computes distinct counts for both internal=1 and internal=0, wasting resources on external traffic; the analyst only wants internal traffic, so filtering with `where` is more efficient than grouping and discarding the external group.
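
The gap between a subnet test and a string prefix test can be shown with Python's standard `ipaddress` module. This is an analogue of the idea, not Splunk's `cidrmatch` implementation; the helper names and sample addresses are hypothetical, and treating malformed addresses as non-matching is a choice made for this sketch.

```python
import ipaddress

def cidr_match(cidr, ip):
    """Subnet-aware membership test, similar in spirit to cidrmatch()."""
    try:
        return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)
    except ValueError:
        # In this sketch, malformed addresses never match.
        return False

def prefix_match(prefix, ip):
    """Plain string test, similar in spirit to a `10.*` wildcard."""
    return ip.startswith(prefix)

samples = ["10.1.2.3", "100.0.0.1", "10.0.0.256", "172.20.1.1"]
by_cidr = [ip for ip in samples if cidr_match("10.0.0.0/8", ip)]
by_prefix = [ip for ip in samples if prefix_match("10.", ip)]

# A prefix that is not octet-aligned has no simple wildcard equivalent.
in_172_16 = cidr_match("172.16.0.0/12", "172.20.1.1")
```

The string test lets the malformed `10.0.0.256` through, and there is no single wildcard that expresses a range such as 172.16.0.0/12.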

136

MCQhard

A search analyst wants to calculate the average transaction time for each user and then find users whose average transaction time exceeds the overall average. Which approach is most efficient?

A.Use eventstats to add overall average, then stats by user, then where condition

B.Use stats by user to get avg, then appendpipe to add overall avg, then eval

C.Use transaction to group events, then stats

D.Use stats by user, then eventstats to add overall avg, then where

AnswerD

Efficient: stats reduces data, eventstats adds overall average.

Why this answer

Option D is correct because it first uses `stats ... by user` to compute each user's average transaction time, then uses `eventstats` to attach an overall average to every one of those rows, allowing a direct `where` comparison. Because `eventstats` runs on the already reduced per-user rows, it needs no subsearch and no second pass over the raw events. Note that the value it computes at this point is the average of the per-user averages.

Exam trap

Splunk often tests the distinction between `eventstats` and `appendpipe`: candidates choose `appendpipe` expecting it to add a global aggregate to every row, but it runs a subpipeline over the current results and appends the output as additional rows, so the overall average ends up in a separate row rather than alongside each user's average.

How to eliminate wrong answers

Option A is wrong because running `eventstats` before `stats by user` computes the overall average on the raw events rather than on per-user averages, does that work on every event before the data is reduced, and the subsequent `stats by user` discards the field unless it is explicitly carried through. Option B is wrong because `appendpipe` appends the result of a subpipeline as extra rows instead of adding a field to each existing row, so a per-row `eval` comparison against the overall average is not directly possible. Option C is wrong because `transaction` is designed to group events into transactions based on session IDs or time windows, not to compute per-user averages efficiently, and it consumes significant memory and processing overhead.

137

MCQmedium

A search using `tstats` to query a data model returns results but is slow. Which of the following is the most likely cause?

A.The data model contains too many fields.

B.The data model is not accelerated.

C.The search includes a `where` clause on a non-indexed field.

D.The search uses `from` instead of `index`.

AnswerB

Without acceleration, tstats runs against the raw data and can be slow.

Why this answer

When a data model is accelerated, Splunk pre-computes and stores summaries of the data in TSIDX files, allowing `tstats` to query these summaries very quickly. If the data model is not accelerated, `tstats` must scan the raw data in the index, which is significantly slower. Therefore, the most likely cause of slow `tstats` performance is that the data model lacks acceleration.

Exam trap

Splunk often tests the misconception that `tstats` always uses acceleration or that a `where` clause on a non-indexed field is the primary cause of slowness, when in fact the absence of acceleration is the most common and impactful reason for poor `tstats` performance.

How to eliminate wrong answers

Option A is wrong because a data model with many fields can slow down acceleration or search, but `tstats` queries the accelerated summary (TSIDX) which is optimized for many fields; the primary performance bottleneck is the lack of acceleration, not field count. Option C is wrong because a `where` clause on a non-indexed field would not affect `tstats` performance when querying an accelerated data model, as `tstats` operates on the TSIDX index where all fields are indexed; the slowness is due to the absence of acceleration, not the `where` clause. Option D is wrong because `tstats` can use either `from` (to reference a data model) or `index` (to reference a raw index), and using `from` is the correct syntax for querying a data model; the slowness is not caused by using `from` but by the data model not being accelerated.

138

MCQmedium

A search includes `... | eval day=strftime(_time, "%A") | stats count by day | sort count`. The results show Monday has the highest count. The analyst wants to confirm that the timezone is correctly applied. Which command should be added before the eval to ensure the day calculation uses the local timezone?

A.`... | eval day=strptime(_time, "%A") | ...`

B.`... | fields + _time, day | ...`

C.`... | eval _time=_time + (your_tz_offset*3600) | eval day=strftime(_time, "%A") ...`

D.`... | convert ctime(_time) | eval day=strftime(_time, "%A") ...`

E.`... | eval _time=relative_time(_time, "-0@d") | eval day=strftime(_time, "%A") ...`

AnswerC

Correct: adjusting _time by timezone offset before extracting day.

Why this answer

Option C is correct because the `strftime` function uses the server's timezone by default, which may not match the local timezone. By manually adding the timezone offset (in seconds) to `_time` before the `eval`, you shift the epoch timestamp to reflect the local time, ensuring that `strftime` calculates the correct day of the week. This is a common workaround when the search head's timezone differs from the user's local timezone.
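The effect of shifting an epoch timestamp before formatting it can be sketched in Python. This models the arithmetic only and assumes the formatter renders in UTC; the helper name `day_name` and the sample timestamp are hypothetical, and a fixed offset like this ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

def day_name(epoch_seconds: int, tz_offset_hours: int = 0) -> str:
    """Weekday name after shifting the timestamp by a fixed number of hours."""
    base = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    shifted = base + timedelta(hours=tz_offset_hours)
    return shifted.strftime("%A")

ts = 1704067200  # 2024-01-01 00:00:00 UTC

print(day_name(ts))      # weekday as seen at UTC
print(day_name(ts, -5))  # weekday as seen five hours behind UTC
```

Events close to midnight are the ones that move to a different day when the offset is applied, which is why a per-day count can change once the timezone is accounted for.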

Exam trap

Splunk often tests the misconception that `strftime` automatically respects the user's local timezone, when in fact it uses the search head's timezone setting, requiring manual offset adjustment for accurate local-time calculations.

How to eliminate wrong answers

Option A is wrong because `strptime` is used to parse a string into an epoch timestamp, not to format a timestamp into a day name; using it here would cause an error or incorrect results. Option B is wrong because `fields + _time, day` only retains those fields and does not adjust the timezone; it does not affect how `strftime` interprets the timestamp. Option D is wrong because `convert ctime(_time)` converts the epoch timestamp to a human-readable string (ctime format), but does not change the underlying timezone applied by `strftime`; it would break the subsequent `strftime` call.

Option E is wrong because `relative_time(_time, "-0@d")` truncates the timestamp to the start of the current day (midnight) without any timezone offset, so it does not correct for timezone differences and may shift the day incorrectly.

139

Multi-Selectmedium

Which THREE of the following are valid ways to create a subsearch in SPL? (Choose three.)

Select 3 answers

A.... | join type=inner [search index=other]

B.... | map search="search index=other $field$"

C.[return index=main | stats count]

D.[search index=main | stats count]

E.... | append [search index=other]

AnswersB, D, E

map runs a search for each result, effectively a subsearch.

Why this answer

Option B is correct because the `map` command in SPL allows you to run a subsearch for each result of the outer search, using field values from the outer result (e.g., `$field$`) to dynamically construct the inner search. This is a valid way to create a subsearch that iterates over search results, making it a legitimate subsearch pattern in Splunk.

Exam trap

Splunk often tests the distinction between commands that use subsearches (like `append`, `join`, `map`) versus commands that are not valid subsearch syntax (like `return`), and candidates may mistakenly think `return` is a valid subsearch command because it sounds similar to `search` or `output`.

140

MCQmedium

A user runs a search that returns 1,000,000 results but only sees 5,000 in the Statistics tab. What is the most likely cause?

A.The results are being sampled

B.The stats command is being used without a by clause

C.The time range is too narrow

D.The search command truncates results

AnswerB

Without `by`, stats collapses all events into a single row, with one column per function.

Why this answer

Option B is correct: without a `by` clause, the `stats` command collapses all events into a single row of aggregate values (with a `by` clause, one row per group), so the Statistics tab shows far fewer rows than the number of events the search matched. Option A is wrong because sampling is not a default behavior.

Option C is wrong because a narrow time range would reduce the raw events returned, whereas here the events are present and only the statistics output is small. Option D is wrong because the `search` command's default truncation limit is 50,000 results, not 5,000.

141

MCQhard

A search returns events with fields 'user', 'action', and 'count'. The analyst wants to create a timechart showing the number of distinct users performing 'login' actions per hour. Which search is correct?

A.`... | stats dc(user) by _time span=1h`

B.`... | timechart span=1h dc(by user)`

C.`... | timechart span=1h dc(user)`

D.`... | eval user=user | timechart span=1h count by user`

E.`... | timechart span=1h sum(count) by user`

AnswerC

Correct: timechart with distinct count of user per hour.

Why this answer

Option C is correct because `timechart span=1h dc(user)` computes the distinct count of the 'user' field per 1-hour time bucket, which directly answers the requirement of showing the number of distinct users performing 'login' actions per hour. The `dc()` function in Splunk is the distinct count function, and `timechart` automatically groups events by `_time` into the specified span.
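The difference between a distinct count and an event count per time bucket can be sketched in Python. This is a model of the bucketing logic, not of `timechart` itself; the sample events are hypothetical and are assumed to be already filtered to 'login' actions.

```python
from collections import defaultdict

# Hypothetical login events: (epoch seconds, user).
events = [(0, "alice"), (600, "alice"), (1200, "bob"),
          (3600, "alice"), (4000, "carol"), (4100, "carol"),
          (7300, "bob")]

SPAN = 3600  # one-hour buckets

users_per_bucket = defaultdict(set)
events_per_bucket = defaultdict(int)
for ts, user in events:
    bucket = ts - ts % SPAN          # snap the timestamp to the start of its hour
    users_per_bucket[bucket].add(user)
    events_per_bucket[bucket] += 1

# Distinct users per hour (what dc(user) expresses) versus events per hour.
distinct_users = {b: len(u) for b, u in sorted(users_per_bucket.items())}
event_counts = dict(sorted(events_per_bucket.items()))

print(distinct_users)
print(event_counts)
```

The first hour holds three events but only two distinct users, which is the gap between counting users and counting events.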

Exam trap

The trap here is that candidates often confuse `dc(user)` (distinct count of users) with `count by user` (count of events per user), leading them to pick option D or E, which answer a different question.

How to eliminate wrong answers

Option A is wrong because `stats dc(user) by _time span=1h` does not use `timechart`, so it will not produce a timechart visualization; it returns a table of distinct user counts per time bucket but lacks the timechart formatting and binning behavior. Option B is wrong because `dc(by user)` is invalid syntax; `dc()` takes a single field argument, not a `by` clause. Option D is wrong because `eval user=user` is redundant and `timechart span=1h count by user` computes the count of events per user, not the distinct count of users.

Option E is wrong because `sum(count) by user` sums the 'count' field per user, which gives total login counts per user, not the number of distinct users.

142

MCQeasy

A data scientist wants to extract the domain from email addresses in the `_raw` field. The emails follow the pattern user@domain.tld. Which eval expression should be used to create a new field called `domain` containing only the domain part?

A.eval domain=mvindex(split(email,"@"),1)

B.eval domain=mvindex(split(email,"@"),0)

C.eval domain=replace(email,".*@(.*)","\1")

D.eval domain=substr(email, indexof(email,"@")+1)

AnswerA

Splits on '@' and takes the second part (index 1) which is the domain.

Why this answer

Option A is correct because `split(email,"@")` creates a multivalue field with two parts: the username (index 0) and the domain (index 1). `mvindex(...,1)` extracts the second element, which is the domain. This is the most direct and efficient way to isolate the domain from an email address in Splunk's eval expression.
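The split-then-index idea reads the same way in Python. This is an illustrative equivalent rather than SPL; the helper name `domain_of` is hypothetical, and returning `None` for a missing `@` is a choice made for this sketch.

```python
from typing import Optional

def domain_of(email: str) -> Optional[str]:
    """Return the part after the first '@', or None if there is no '@'."""
    parts = email.split("@")
    # Index 0 is the user name, index 1 is the domain.
    return parts[1] if len(parts) > 1 else None

print(domain_of("user@example.com"))
print(domain_of("not-an-email"))
```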

Exam trap

The trap here is that candidates often confuse the zero-based index of `mvindex` (thinking index 1 is the username) or incorrectly assume `replace` with a regex is the most straightforward approach, when in fact `split` with `mvindex` is the simplest and most reliable method for this exact pattern.

How to eliminate wrong answers

Option B is wrong because `mvindex(...,0)` extracts the username (the part before `@`), not the domain. Option C is wrong because `replace(email,".*@(.*)","\1")` uses a regex that is greedy and may not correctly capture the domain in all cases (e.g., if the email contains multiple `@` symbols or special characters), and `replace` is not the idiomatic Splunk function for this extraction. Option D is wrong because `substr(email, indexof(email,"@")+1)` would extract everything after the `@`, including any trailing whitespace or newline characters, and does not handle cases where the `@` is missing (returns an empty string or error).

143

Multi-Selectmedium

Which TWO of the following are valid ways to calculate the median of a numeric field?

Select 2 answers

A.eval median = percentile(field, 50)

B.eventstats median(field)

C.stats perc(field, 50)

D.stats p50(field)

E.stats median(field)

AnswersB, E

eventstats median adds the median value to each event.

Why this answer

Option B is correct because `eventstats median(field)` computes the median of the specified field and adds it as a new field to every event, which is a valid way to calculate the median. Option E is correct because `stats median(field)` directly computes the median of the numeric field and returns a single result, which is the standard method for median calculation in Splunk.

Exam trap

Splunk often tests the distinction between `eval` and `stats` functions: candidates mistakenly call aggregation functions such as `percentile` inside `eval`, or confuse a two-argument form such as `perc(field, 50)` with the `perc50(field)` syntax that `stats` expects.

144

MCQmedium

A security team runs a search to count login failures per user over the last 24 hours: `index=security action=failure | stats count by user`. The results show counts, but some users have extremely high counts due to a brute force attack. The team wants to identify users with a count greater than 100. What should they do to get the desired list?

A.Use `| top limit=100 user` to get the top 100 users.

B.Add `| where count > 100` after the stats command.

C.Add `| where count > 100` before the stats command.

D.Use `| filter count > 100` after the stats command.

AnswerB

Correctly filters the stats results by the count field.

Why this answer

Option B is correct because the `stats count by user` command creates a field called `count` that holds the number of login failures per user. Adding `| where count > 100` after the stats command filters the results to show only users whose count exceeds 100. The `where` command evaluates field values in the current results, making it the appropriate tool for this post-aggregation filter.
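The aggregate-then-filter order can be modelled in Python. This is a sketch of the logic, not SPL; the user names and event volumes are hypothetical.

```python
from collections import Counter

# Hypothetical failed-login events, one user name per event.
failures = ["mallory"] * 150 + ["alice"] * 3 + ["bob"] * 100

# Like `stats count by user`: the count exists only after aggregation.
counts = Counter(failures)

# Like `where count > 100`, applied to the aggregated rows.
over_threshold = [user for user, count in counts.items() if count > 100]

print(over_threshold)
```

A user with exactly 100 failures is not returned, because the comparison is strictly greater than 100.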

Exam trap

Splunk often tests the distinction between filtering before aggregation (using `search` or `where` on raw events) versus filtering after aggregation (using `where` on computed fields), and candidates mistakenly place the filter before `stats` or use a nonexistent command like `filter`.

How to eliminate wrong answers

Option A is wrong because `| top limit=100 user` returns the top 100 users by count, not users with a count greater than 100; it does not apply a threshold filter. Option C is wrong because placing `| where count > 100` before the stats command filters on a field `count` that does not yet exist, so the comparison never evaluates to true and no events reach `stats`. Option D is wrong because `filter` is not a valid Splunk command; the correct command for filtering results is `where`.

145

MCQmedium

A Splunk admin is responsible for a search dashboard that displays real-time statistics of application errors. The search uses 'index=app sourcetype=error | timechart count by severity span=5m'. Users report that the dashboard is slow and often times out. The environment has 4 indexers and the data volume is about 500 GB/day. The admin wants to improve performance without changing the dashboard's output. Which step should they take?

A.Replace timechart with 'bucket _time span=5m | stats count by _time, severity' and add streaming commands.

B.Create a summary index that runs every 5 minutes to pre-aggregate error counts by severity, and modify the dashboard to search the summary index.

C.Limit the time range to the last 1 hour instead of 24 hours.

D.Enable search acceleration for the index.

AnswerB

Reduces the amount of data scanned in real time.

Why this answer

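Option B is correct because a summary index that precomputes the error counts by severity every 5 minutes lets the dashboard read a small set of pre-aggregated rows instead of scanning the raw events each time it loads. Option A is wrong because `bucket` followed by `stats` produces the same aggregation as `timechart` over the same raw data, and streaming commands may not reduce disk I/O significantly. Option C is wrong because it reduces the number of events scanned but also changes the dashboard's output (a shorter time window).

Option D would not help because it only benefits ad-hoc searches.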

146

MCQeasy

A search returns 1000 results per second. The user wants to see a trend of counts over the past hour in 5-minute intervals. Which command should be used?

A.timechart span=5min count

B.chart count over _time span=5min

C.stats count by _time span=5min

D.streamstats count span=5min

AnswerA

`timechart` with `span=5min` correctly creates a time series of event counts per 5-minute bucket.

Why this answer

The `timechart` command is designed to create a time-based chart with automatic binning of events into time buckets. By specifying `span=5min`, you explicitly set the bucket size to 5-minute intervals, and `count` calculates the number of events per bucket. This directly satisfies the requirement to see a trend of counts over the past hour in 5-minute intervals.

Exam trap

Splunk often tests the misconception that `stats` or `chart` can be used with a `span` parameter to create time-based buckets, when in fact only `timechart` (and `bucket` in conjunction with `stats`) supports this syntax for time aggregation.

How to eliminate wrong answers

Option B is wrong because `chart count over _time span=5min` is not valid syntax; `chart` does not support the `span` option and requires a `by` clause to split data, making it unable to produce time-based buckets. Option C is wrong because `stats count by _time span=5min` is invalid; `stats` does not accept a `span` keyword, and grouping by raw `_time` would create a separate count for each unique timestamp, not aggregated intervals. Option D is wrong because `streamstats count span=5min` is invalid; `streamstats` computes running or sliding window statistics and does not support a `span` parameter, nor does it bin events into time intervals.

147

MCQmedium

The exhibit shows a search to find the top 5 URI-method combinations by count. However, the results show only 5 rows, but the analyst expected to see the top 5 URIs overall, not combinations. Which change to the search would achieve the desired result?

A.Add `| where method="GET"` before stats.

B.Replace `stats` with `chart count over uri by method`.

C.Use `top limit=5 uri, method` instead.

D.Add `| stats sum(count) as total by uri` after the existing stats.

E.Change `stats count by uri, method` to `stats count by uri`.

AnswerE

Correct: grouping only by uri gives count per URI.

Why this answer

Option E is correct because the original search uses `stats count by uri, method`, which groups results by both URI and method, producing separate counts for each combination. Changing it to `stats count by uri` removes the method field from the grouping, so the count is aggregated per URI alone, giving the top 5 URIs overall as the analyst expected.
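How the grouping fields change the ranking can be sketched in Python. This models the counting only, not SPL; the URIs, methods, and volumes are hypothetical.

```python
from collections import Counter

# Hypothetical requests: (uri, method).
requests = [("/a", "GET")] * 3 + [("/a", "POST")] * 2 + [("/b", "GET")] * 4

# Like `stats count by uri, method`: one row per combination.
by_combo = Counter(requests)

# Like `stats count by uri`: one row per URI, methods merged.
by_uri = Counter(uri for uri, _method in requests)

top_combo = by_combo.most_common(1)
top_uri = by_uri.most_common(1)

print(top_combo)
print(top_uri)
```

In this sample the busiest combination is `/b` with GET, while the busiest URI overall is `/a`, because its traffic is split across two methods.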

Exam trap

Splunk often tests the distinction between grouping by multiple fields versus a single field, and the trap here is that candidates may think they need an additional stats command (Option D) or a filter (Option A) when simply removing the extra field from the `by` clause is the correct and efficient fix.

How to eliminate wrong answers

Option A is wrong because adding `| where method="GET"` would filter to only GET requests, which does not aggregate across all methods and still groups by URI and method if the stats clause remains unchanged. Option B is wrong because `chart count over uri by method` creates a tabular breakdown of counts per method for each URI, not a single count per URI, and still separates by method. Option C is wrong because `top limit=5 uri, method` returns the top 5 URI-method combinations by count, which is exactly what the original search does, not the top 5 URIs overall.

Option D is wrong because adding `| stats sum(count) as total by uri` after the existing stats would sum the per-combination counts back into a total per URI, which is an unnecessary extra step when removing `method` from the first stats is cleaner and more direct.

148

MCQmedium

A user wants to create a report that shows the top 5 sources of errors, excluding a specific source 'host1'. Which SPL is correct?

A.index=main sourcetype=access_combined status>400 NOT host="host1" | top limit=5 source

B.index=main sourcetype=access_combined status>400 | top limit=5 source | where source!="host1"

C.index=main sourcetype=access_combined status>400 | top limit=5 source | search source!="host1"

D.index=main sourcetype=access_combined status>400 | search NOT host=host1 | top limit=5 source

AnswerA

Correctly excludes host1 before top, ensuring accurate top 5.

Why this answer

Option A is correct because it filters out 'host1' before the `top` command runs, ensuring that the top 5 sources of errors are calculated from the remaining data. The `NOT host="host1"` clause is placed in the base search, which is the most efficient approach and guarantees that 'host1' is excluded from the statistical aggregation.
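Why the position of the filter matters can be sketched in Python. This is a model of "exclude then rank" versus "rank then exclude" on a single field, not SPL; the values and volumes are hypothetical, and a limit of 2 is used instead of 5 to keep the sample small.

```python
from collections import Counter

# Hypothetical field values, one per event.
values = ["host1"] * 5 + ["host2"] * 3 + ["host3"] * 2 + ["host4"] * 1
LIMIT = 2

# Exclude first, then rank: the ranking never sees the excluded value.
filter_then_top = Counter(v for v in values if v != "host1").most_common(LIMIT)

# Rank first, then exclude: the excluded value has already used up a slot.
top_then_filter = [(v, c) for v, c in Counter(values).most_common(LIMIT)
                   if v != "host1"]

print(filter_then_top)
print(top_then_filter)
```

Filtering first fills every slot with a value that is allowed to appear; filtering afterwards returns fewer rows than the limit.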

Exam trap

Splunk often tests the misconception that filtering after a transforming command like `top` is equivalent to filtering before it, when in reality the aggregation is performed on the entire dataset first, altering the results.

How to eliminate wrong answers

Option B is wrong because the `where` command runs after `top`, so the top 5 sources are computed with 'host1' included and the filter is applied only to those five rows; this can leave fewer than 5 results and does not exclude 'host1' from the ranking calculation. Option C is wrong because the `search` command after `top` filters after the aggregation in the same way as Option B, and both options compare the `source` field, rather than `host`, against 'host1'. Option D is wrong because, although placing `search NOT host=host1` before `top` excludes 'host1' from the ranking, it applies the exclusion in a separate `search` command instead of in the base search, which makes it less direct and less efficient than Option A.

149

Multi-Selectmedium

Which THREE of the following are benefits of using eventstats over stats when analyzing event logs? (Choose three.)

Select 3 answers

A.The original number of events is preserved.

B.It uses less memory than stats.

C.You can use the aggregated field in subsequent commands like where or eval.

D.It is always faster than stats.

E.It allows you to see individual event details alongside aggregate statistics.

AnswersA, C, E

eventstats does not reduce event count.

Why this answer

Option A is correct because `eventstats` adds aggregate statistics (like sums or averages) to each original event without reducing the total number of events. Unlike `stats`, which collapses events into a single summary row per group, `eventstats` appends the aggregated value to every matching event, preserving the original event count and structure.
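The contrast between collapsing events and annotating them can be modelled in Python. This is a sketch of the row shapes only, not SPL; the events and the field names `bytes` and `total` are hypothetical.

```python
from collections import defaultdict

# Hypothetical events.
events = [
    {"user": "alice", "bytes": 100},
    {"user": "alice", "bytes": 300},
    {"user": "bob", "bytes": 50},
]

# Aggregate once per user.
totals = defaultdict(int)
for event in events:
    totals[event["user"]] += event["bytes"]

# stats-style output: one summary row per group, event detail is gone.
stats_rows = [{"user": user, "total": total} for user, total in totals.items()]

# eventstats-style output: every original event kept, aggregate added to each.
eventstats_rows = [{**event, "total": totals[event["user"]]} for event in events]

print(len(stats_rows), len(eventstats_rows))
print(eventstats_rows[1])
```

Because each annotated row still carries its own `bytes` value next to the group `total`, a later per-event comparison between the two is possible.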

Exam trap

The trap here is that candidates confuse `eventstats` with `stats`, assuming `eventstats` is always faster or more memory-efficient, when in fact it trades off performance and memory for the ability to retain original event context.

150

MCQmedium

To find users who logged in from more than 3 different IP addresses, which search is correct?

A.index=auth | stats dc(IP) by user | where dc(IP) > 3

B.index=auth | top limit=3 IP by user

C.index=auth | eval user, IP | dedup user, IP | stats count by user | where count > 3

D.index=auth | stats distinct_count(IP) by user | where distinct_count(IP) > 3

AnswerA

dc counts distinct IPs per user, then filters.

Why this answer

Option A is correct because it uses `stats dc(IP) by user` to count distinct IP addresses per user, then filters with `where dc(IP) > 3` to return only users who logged in from more than 3 different IPs. The `dc()` function calculates distinct count, which is exactly what the question requires.

Exam trap

Splunk often tests the distinction between `dc()` (distinct count) and `count` (total occurrences), and the trap here is that candidates may confuse `distinct_count()` (invalid) with `dc()` or think `dedup` followed by `count` achieves the same result, which it does not because it counts duplicates of the pair rather than distinct IPs per user.

How to eliminate wrong answers

Option B is wrong because `top limit=3 IP by user` returns the top 3 IP addresses per user, not a count of distinct IPs, and cannot filter for users with more than 3 distinct IPs. Option C is wrong because `eval user, IP` is invalid syntax (eval requires an expression), and `dedup user, IP` removes duplicate pairs but does not count distinct IPs per user correctly; the subsequent `stats count` counts occurrences, not distinct IPs. Option D is wrong because `distinct_count(IP)` is not a valid SPL function; the correct function is `dc(IP)`, and this search would produce an error.
