Arrange the steps to configure role-based access control in Splunk.
Why this order
Roles are configured by setting capabilities and resource access restrictions.
75 of 150 questions · Page 2/2 · Advanced Searching and Statistics · Answers revealed
A security analyst wants to find IP addresses that have been involved in both login failures and successful logins within a 5-minute window. Which approach is most efficient?
Groups events by IP within a time span, ideal for this scenario.
Why this answer
Option B is correct because the transaction command groups events from the same IP within a time window, ideal for correlating failure and success. Option A is wrong because subsearches are resource-intensive and not ideal for this correlation. Option C is wrong because stats with values does not guarantee temporal proximity. Option D is wrong because appendcols requires exact field matching and does not handle time windows.
Refer to the exhibit. This search is intended to find users with average duration above overall average. However, it returns no results. Why?
Stats output does not include fields from prior commands unless preserved.
Why this answer
Option B is correct: eventstats adds overall_avg to each event, but stats by user outputs only user and user_avg and drops overall_avg, so where compares user_avg to a field that no longer exists. Option A is wrong because running eventstats before stats is conceptually fine. Option C is wrong because where works correctly when the fields it references are present. Option D is wrong because a subsearch is not needed.
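The field-dropping behavior described above can be modeled in a few lines of Python. This is a minimal sketch, not Splunk's implementation: the sample events and the helper names `eventstats_avg` and `stats_avg_by` are hypothetical, and the exhibit's exact search is not reproduced here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample events; field names follow the explanation above.
events = [
    {"user": "alice", "duration": 10},
    {"user": "alice", "duration": 30},
    {"user": "bob", "duration": 5},
]

def eventstats_avg(rows, field, out):
    """Model of `eventstats avg(field) as out`: annotate every row, keep all rows."""
    overall = mean(r[field] for r in rows)
    return [{**r, out: overall} for r in rows]

def stats_avg_by(rows, field, out, by):
    """Model of `stats avg(field) as out by <by>`: one row per group,
    containing only the by-field and the aggregate."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[by]].append(r[field])
    return [{by: key, out: mean(vals)} for key, vals in groups.items()]

# Failing order: eventstats first, then stats discards overall_avg.
broken = stats_avg_by(
    eventstats_avg(events, "duration", "overall_avg"),
    "duration", "user_avg", "user",
)

# One possible repair: aggregate first, then annotate the aggregated rows.
fixed = eventstats_avg(
    stats_avg_by(events, "duration", "user_avg", "user"),
    "user_avg", "overall_avg",
)
above = [r["user"] for r in fixed if r["user_avg"] > r["overall_avg"]]
```

Note that this repair compares each user against the average of the per-user averages, which is not the same number as the event-level average. If the event-level figure is required, the field would have to be carried through the aggregation instead, for example with something like `stats avg(duration) as user_avg, first(overall_avg) as overall_avg by user`.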
Refer to the exhibit. This search returns an error. What is the most likely cause?
Stats does not preserve _time unless explicitly used in a by clause.
Why this answer
The error occurs because the `stats` command removes the `_time` field from the events, and `timechart` requires a valid `_time` field to create time-based buckets. Without `_time`, `timechart` cannot generate the time axis, resulting in a search error. This is a common pitfall when chaining `stats` before `timechart` without preserving the time field.
Exam trap
Splunk often tests the misconception that `stats` preserves `_time` by default, leading candidates to overlook that `timechart` requires an explicit time field in the result set.
How to eliminate wrong answers
Option B is wrong because `eval` can be used before `timechart` without issue, as long as the required `_time` field is present. Option C is wrong because `status_group` is created by `eval` and is available to `timechart`; the error is not about field availability but the missing `_time` field. Option D is wrong because `timechart` can use aggregated count fields from `stats`; the real problem is that `stats` removes `_time`, not that the count field is incompatible.
An administrator wants to correlate events from the same session but the events span up to 30 minutes apart. The transaction command is being considered. Which transaction option is most appropriate to ensure sessions are correctly grouped without artificially high memory usage?
Correctly defines time window and pause to group sessions
Why this answer
Option B is correct because the `maxspan=30m` ensures events spanning up to 30 minutes are grouped into the same transaction, while `maxpause=5m` prevents the transaction from remaining open indefinitely by closing it after 5 minutes of inactivity. This combination correctly groups sessions without keeping the transaction open for the full 30 minutes, which would artificially increase memory usage by holding events in the buffer.
Exam trap
Splunk often tests the misconception that `maxspan` alone is sufficient to control memory usage, when in fact `maxpause` is critical to close transactions during idle periods and prevent excessive memory consumption.
How to eliminate wrong answers
Option A is wrong because using only `maxspan=30m` without `maxpause` means the transaction will remain open for the entire 30-minute span even if there are long gaps between events, causing high memory usage as events are held in the buffer. Option C is wrong because `maxevents=100` limits the number of events per transaction but does not address the time span or pause requirements, so sessions spanning 30 minutes may be split or incomplete. Option D is wrong because `keepevicted=true` retains evicted (incomplete) transactions in the output, which does not help control memory usage and may actually increase it by including partial groups.
A security analyst is investigating a potential breach. They have a search that uses the transaction command to group events by session_id and calculates the total bytes transferred per session. However, the search takes over 30 minutes to complete on a 24-hour time range. The environment has 10 indexers with default settings. The analyst needs to reduce search time while preserving the ability to group by session_id. Which course of action should they take?
Reduces the number of events per session, making transaction faster.
Why this answer
Option C is correct because summarizing events by session_id using stats with values and sum before the transaction command reduces the number of events that transaction needs to process. Option A would disable parallel processing, making it slower. Option B adds subsearch overhead. Option D changes the grouping logic and does not reduce the workload.
A Splunk administrator runs the following search and notices that the results include events where the 'status' field is 200 or 404, but also includes events where the 'status' field is missing. What is the most efficient way to modify the search to exclude events where the 'status' field does not exist?
ISNULL(status) returns true if field does not exist; NOT ISNULL ensures only events with a status field are considered.
Why this answer
Option B is correct because it uses the `NOT ISNULL(status)` filter before the OR conditions, which efficiently excludes events where the `status` field does not exist. In Splunk, `ISNULL()` returns true if a field is missing or null, so `NOT ISNULL(status)` ensures only events with a defined `status` field are considered, and then the parentheses group the OR conditions correctly. This approach is more efficient than post-filtering because it reduces the result set early in the search pipeline.
Exam trap
The trap here is that candidates often confuse `ISNULL()` with checking for empty strings or use `!=null` as if it were SQL, failing to recognize that Splunk requires explicit `ISNULL()` or `isnull()` functions for field existence checks.
How to eliminate wrong answers
Option A is wrong because `status!=null` is not a valid Splunk syntax for checking field existence; it compares the field value to the literal string 'null' rather than checking for absence. Option C is wrong because while `where isnotnull(status)` works, it is less efficient than using `NOT ISNULL(status)` in the base search, as `where` processes all results after the initial search, whereas the base search filter can leverage index-time optimizations. Option D is wrong because it simply searches for status=200 OR status=404 without any filter to exclude events where the status field is missing, so it will still include those events.
A user needs to identify the top 3 error types by count, but only for the current month, and exclude results with fewer than 100 occurrences. Which TWO steps are necessary? (Select two.)
Excludes error types with count less than 100.
Why this answer
Option B is correct because the `where` command in Splunk is used to filter results based on a condition, and here it is needed to exclude error types with fewer than 100 occurrences after counting. Option D is correct because the `top` command with `limit=3` returns the top 3 values of a field by count, which directly satisfies the requirement to identify the top 3 error types.
Exam trap
Splunk often tests the distinction between using the time range picker versus explicit time modifiers in the search, and candidates may incorrectly assume that the time range picker is a necessary step when the search itself can use a relative time modifier such as `earliest=@mon`, which snaps to the start of the current month.
A user runs a search on web access logs: `index=web | eventstats sum(bytes) as total_bytes by host`. The search returns the correct total bytes per host, but now the user needs to calculate the average bytes per host for each event. Which command should be added to the base search to achieve this?
eventstats can compute average directly and add it to each event.
Why this answer
eventstats can compute the average directly with `avg(bytes)`. Option A requires manually calculating average with count, which is more complex. Option C uses streamstats, which computes a running average, not overall. Option D uses stats and join, which is slower and may not work well.
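To make the eventstats behavior concrete, here is a small Python model of a second `eventstats` call adding a per-host average next to the existing per-host total. It is only a sketch: the sample events and the helper `eventstats_by` are hypothetical, and the SPL in the comment is one plausible form of the added command.

```python
from collections import defaultdict

# Hypothetical web access events.
events = [
    {"host": "web1", "bytes": 100},
    {"host": "web1", "bytes": 300},
    {"host": "web2", "bytes": 50},
]

def eventstats_by(rows, field, by, agg, out):
    """Model of `eventstats <agg>(field) as out by <by>`:
    every input row is kept and gains the group-level aggregate."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[by]].append(r[field])
    return [{**r, out: agg(groups[r[by]])} for r in rows]

def avg(values):
    return sum(values) / len(values)

# index=web | eventstats sum(bytes) as total_bytes by host
with_total = eventstats_by(events, "bytes", "host", sum, "total_bytes")
# ... | eventstats avg(bytes) as avg_bytes by host
with_both = eventstats_by(with_total, "bytes", "host", avg, "avg_bytes")
```

The row count is unchanged after both calls, which is the key difference from `stats`: each event keeps its own `bytes` value and carries the host-level total and average alongside it.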
Which TWO of the following are valid uses of the stats command in Splunk? (Choose two.)
Valid: returns list of distinct IPs per user.
Why this answer
The `stats` command in Splunk can compute aggregate statistics over fields. `values(ip) by user` is valid because `values()` returns a multivalue list of all distinct `ip` values for each `user`, which is a standard aggregation function. `count by host` is valid because `count` is a default aggregation that counts events per `host`.
Exam trap
The trap here is that candidates may confuse valid `stats` functions with functions from other contexts (like `mode()` from statistics or `first()` from programming languages) or assume that `median()` is supported when Splunk uses percentile functions instead.
Refer to the exhibit. What is the result of this search?
This accurately describes the output of the search.
Why this answer
The search uses the `top` command, which by default returns the most common values of a field sorted by count in descending order, limited to 10 results. The `limit=5` parameter overrides the default to return only the top 5 users. The `countfield` option renames the count column to 'total', and `showperc=f` hides the percent column, producing a table of users and their total counts sorted by count descending, limited to 5 rows.
Exam trap
Splunk often tests the default behavior of the `top` command—specifically that it sorts by count descending and limits results to 10—and candidates mistakenly think it returns all values or sorts alphabetically, or they overlook the `limit=5` override.
How to eliminate wrong answers
Option A is wrong because the `top` command sorts by count descending, not ascending, and it does not return all users—it limits results to the top 5. Option B is wrong because the search does not filter for 'failed password' events; it operates on all events in the index and uses the `top` command to find the most common users, not the first 5 events. Option D is wrong because the `top` command sorts by count, not alphabetically by username, and it returns the most frequent users, not a simple alphabetical list.
A search uses `transaction maxspan=30s maxpause=5s`. Events are sorted by _time. If there is a gap of 10 seconds between two events, what happens?
A gap of 10s exceeds the 5s maxpause, so a new transaction begins.
Why this answer
The `maxpause` parameter in the `transaction` command defines the maximum allowed gap between consecutive events within the same transaction. Since the gap of 10 seconds exceeds the `maxpause=5s`, the events are split into separate transactions, regardless of the `maxspan=30s` limit. The `maxspan` only sets an upper bound on the total duration of the transaction from the first to the last event, but it does not override the pause-based splitting logic.
Exam trap
The trap here is that candidates often confuse `maxpause` with `maxspan`, mistakenly thinking that as long as the total duration is under `maxspan`, any gap is acceptable, when in fact `maxpause` enforces a strict per-gap limit that can split transactions independently.
How to eliminate wrong answers
Option A is wrong because it incorrectly assumes that a gap within `maxspan` overrides `maxpause`; in reality, `maxpause` is evaluated first and any gap exceeding it forces a split. Option B is wrong because it ignores the `maxpause` constraint entirely, suggesting that only the total span matters, which is false. Option C is wrong because it claims splitting only occurs when total span exceeds `maxspan`, but the `maxpause` parameter independently triggers splits on inter-event gaps.
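The interaction between the two limits can be sketched in Python. This is a deliberately simplified model, assuming events are processed in ascending time order and that a group closes as soon as either limit would be exceeded; the function name `group_transactions` is hypothetical and Splunk's actual `transaction` implementation is more involved.

```python
def group_transactions(timestamps, maxspan, maxpause):
    """Split timestamps (in seconds) into transaction-like groups.

    A new group starts when the gap to the previous event exceeds
    maxpause, or when adding the event would stretch the group's
    first-to-last duration beyond maxspan.
    """
    groups = []
    for t in sorted(timestamps):
        current = groups[-1] if groups else None
        if (current is None
                or t - current[-1] > maxpause
                or t - current[0] > maxspan):
            groups.append([t])
        else:
            current.append(t)
    return groups

# A 10-second gap between 3 and 13 exceeds maxpause=5, so the events
# split even though the whole run fits inside maxspan=30.
by_pause = group_transactions([0, 3, 13, 15], maxspan=30, maxpause=5)

# Gaps of 4 seconds never exceed maxpause, so here only maxspan splits.
by_span = group_transactions(range(0, 36, 4), maxspan=30, maxpause=5)
```

In the first call the total span is only 15 seconds, yet two groups come back, which is the behavior the question is testing.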
Refer to the exhibit. The search above returns no results for api_version. What is the most likely cause?
If `uri_path` is not a field in the sourcetype, the rex will not extract anything.
Why this answer
The `rex` command extracts fields based on a regex pattern applied to a specific source field. If `uri_path` does not exist in the events or its values do not match the pattern `(?<api_version>/v[0-9]+)`, then no `api_version` field will be created. This is the most likely cause because the search returns no results for `api_version`, indicating the extraction failed at the source field level.
Exam trap
Splunk often tests the misconception that a regex pattern is incorrect when the real issue is that the source field is missing or contains non-matching data, leading candidates to focus on syntax rather than data validation.
How to eliminate wrong answers
Option A is wrong because `stats` can be used after `rex` without issue; `rex` extracts fields, and `stats` can then aggregate them. Option C is wrong because if the time range were too short, the search would return no events at all, not just no results for `api_version` while other fields might exist. Option D is wrong because the regex pattern `(?<api_version>/v[0-9]+)` is syntactically correct for capturing a version string like `/v1` or `/v2`; the issue is that it is applied to a field that may not contain matching data.
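The failure mode can be reproduced with Python's `re` module as a stand-in for `rex`. This is a sketch under assumptions: the sample events and the helper `rex` are hypothetical, and Python spells the named group `(?P<api_version>...)` where the SPL pattern uses `(?<api_version>...)`.

```python
import re

# Same pattern as in the explanation, in Python's named-group syntax.
PATTERN = re.compile(r"(?P<api_version>/v[0-9]+)")

def rex(event, field):
    """Model of `rex field=<field> "<pattern>"`: add the named groups
    when the source field exists and matches, else leave the event as is."""
    value = event.get(field)
    if value is None:
        return event          # source field missing: nothing extracted
    match = PATTERN.search(value)
    if match is None:
        return event          # field present but no match: nothing extracted
    return {**event, **match.groupdict()}

events = [
    {"uri_path": "/v2/orders/17"},   # matches
    {"uri_path": "/health"},         # field exists, pattern does not match
    {"uri": "/v1/users"},            # path stored under a different field name
]
extracted = [rex(e, "uri_path") for e in events]
```

Only the first event gains `api_version`. The third event contains a perfectly valid version string, but under a field the extraction never looks at, so checking which field actually holds the path is the first thing to verify.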
You need to find the percentage of total events contributed by each sourcetype. Which command should follow index=* | stats count by sourcetype?
eventstats adds total column, then eval computes percentage per row.
Why this answer
Option A is correct because eventstats adds a total count field to every row, then eval computes each row's percentage of that total. Option B is wrong because addtotals adds row totals, not a column total. Option C is wrong because it attempts to use sum in eval, which is invalid. Option D is wrong because appendpipe adds a row containing the total rather than a column, so the eval computes the percentage incorrectly.
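The arithmetic behind the eventstats-then-eval approach is simple enough to check by hand. The sketch below uses hypothetical counts, and the SPL in the comment is one plausible form of the correct option rather than a quote of it.

```python
# Hypothetical output of: index=* | stats count by sourcetype
counts = [
    {"sourcetype": "access_combined", "count": 600},
    {"sourcetype": "syslog", "count": 300},
    {"sourcetype": "linux_secure", "count": 100},
]

# | eventstats sum(count) as total
total = sum(r["count"] for r in counts)

# | eval pct=round(count/total*100, 2)
with_pct = [
    {**r, "total": total, "pct": round(r["count"] / total * 100, 2)}
    for r in counts
]
```

Every row carries the same `total`, which is what lets a per-row `eval` divide by it. A total that lives in its own appended row, as with appendpipe, is not visible to the other rows.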
Which TWO of the following eval functions can be used to convert a string to a numeric value?
`int()` converts a value to an integer, working on strings as well.
Why this answer
The `int()` function (option C) converts a string representation of an integer into a numeric integer value, and `tonumber()` (option E) converts a string to a floating-point or integer number, making both valid for converting strings to numeric values in Splunk's eval command.
Exam trap
Splunk often tests candidates' familiarity with Splunk's specific eval function names, and the trap here is that `number()` and `str()` sound plausible but are not valid Splunk functions, leading candidates to select them based on general programming knowledge rather than Splunk's actual syntax.
A security analyst wants to find IP addresses that have attempted to access a specific URL more than 5 times in the last hour and also have a user agent string containing "curl". They need to use a subsearch to pre-filter IPs. Which search is correct?
Correctly uses subsearch to filter IPs, then counts and filters.
Why this answer
Option B is correct because it uses a subsearch to first find IPs that have accessed the URL more than 5 times with a user agent containing 'curl', then passes those IPs to the outer search to filter the original data. The subsearch returns a list of src_ip values, which the outer search uses as a filter, ensuring only IPs meeting both conditions are counted again. This matches the requirement to pre-filter IPs using a subsearch.
Exam trap
The trap here is that candidates often confuse a subsearch with a simple filter or stats command, leading them to choose options that either omit the subsearch syntax or place it incorrectly, such as at the start without proper piping.
How to eliminate wrong answers
Option A is wrong because the subsearch is placed at the beginning without a leading pipe, making it a standalone search that does not feed into the outer search; it also lacks the outer search's index and sourcetype, so it returns no results. Option C is wrong because it does not use a subsearch at all; it simply filters and counts in a single search, which does not pre-filter IPs as required. Option D is wrong because it uses parentheses incorrectly and does not include a subsearch; it performs a single-pass filter and count, failing to pre-filter IPs.
A security analyst wants to calculate the average latency for each web server over the past hour, but only for requests where the status code is 200. The search result includes fields: server, latency, status. Which search correctly accomplishes this?
Correctly filters only status=200 events before statistical aggregation.
Why this answer
Option D is correct because it filters events to only those with status=200 before the stats command, ensuring the average latency is calculated exclusively over successful requests. The stats command then computes the average latency grouped by server, which directly answers the requirement without needing conditional logic or post-filtering.
Exam trap
The trap here is that candidates often think they can filter after stats using where, but stats collapses events into summary statistics, so a subsequent where cannot filter the original events used in the aggregation.
How to eliminate wrong answers
Option A is wrong because it uses eval to set good_latency to null for non-200 statuses, but stats avg() ignores null values, so it effectively averages only over status=200 events; however, this is less efficient and less idiomatic than filtering first, and the question asks for the 'correct' search, where D is the standard best practice. Option B is wrong because eventstats calculates the average latency across all events (including non-200) and adds it to each event, then filters to status=200; this gives the overall average latency for all requests, not the average per server for only status=200 requests. Option C is wrong because it applies the where status=200 filter after the stats command, which has already aggregated data across all status codes, so the filter has no effect on the computed averages.
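The effect of filtering before aggregating can be checked with a small Python model. The sample events and the helper `avg_by_server` are hypothetical, and the SPL in the comments is an assumed shape for the searches being compared.

```python
from collections import defaultdict

# Hypothetical events with the fields named in the question.
events = [
    {"server": "a", "latency": 100, "status": 200},
    {"server": "a", "latency": 900, "status": 500},
    {"server": "a", "latency": 200, "status": 200},
    {"server": "b", "latency": 50, "status": 200},
]

def avg_by_server(rows, field):
    """Model of `stats avg(field) by server`; rows lacking the field are skipped."""
    groups = defaultdict(list)
    for r in rows:
        if r.get(field) is not None:
            groups[r["server"]].append(r[field])
    return {server: sum(v) / len(v) for server, v in groups.items()}

# status=200 | stats avg(latency) by server
filter_first = avg_by_server([r for r in events if r["status"] == 200], "latency")

# stats avg(latency) by server   (no filter: the 500 response skews server a)
all_statuses = avg_by_server(events, "latency")
```

After the unfiltered aggregation there is no per-event `status` left to filter on, so the skewed average for server `a` cannot be corrected afterwards.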
A user wants to calculate the average response time per user, but only for users who have more than 10 events. Which search approach is efficient?
Computes both statistics and filters correctly.
Why this answer
Option B is correct because it first uses `stats` to compute both the average response time and the event count per user, then filters with `where cnt>10` to keep only users who have more than 10 events. This ensures the average is calculated only after grouping, and the count condition is applied on the aggregated result, which is efficient and accurate.
Exam trap
The trap here is that candidates often confuse `eventstats` with `stats` and think they can filter on an aggregated field like `count` without first computing it in the same `stats` command, leading them to choose Option A or D.
How to eliminate wrong answers
Option A is wrong because `eventstats` adds the average and count to each raw event without reducing the dataset, and then `where count>10` filters events rather than users, so it does not correctly isolate users with more than 10 events. Option C is wrong because `where count>10` is applied before any aggregation, but `count` is not a field in raw events, so this will return no results or an error. Option D is wrong because `stats avg(response_time) by user` computes only the average per user, discarding the count, so `where count>10` cannot reference the count field, causing the search to fail or produce incorrect results.
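The aggregate-then-filter pattern can be sketched in Python as follows. The sample data and the helper name are hypothetical; the field names `avg_rt` and `cnt` are assumptions, with `cnt` taken from the explanation above.

```python
from collections import defaultdict

# Hypothetical events: alice has 12 events, bob has 3.
events = (
    [{"user": "alice", "response_time": 100}] * 12
    + [{"user": "bob", "response_time": 40}] * 3
)

def stats_avg_count_by_user(rows):
    """Model of `stats avg(response_time) as avg_rt, count as cnt by user`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["user"]].append(r["response_time"])
    return [
        {"user": user, "avg_rt": sum(v) / len(v), "cnt": len(v)}
        for user, v in groups.items()
    ]

# | where cnt>10
kept = [r for r in stats_avg_count_by_user(events) if r["cnt"] > 10]
```

Because the count is produced by the same aggregation as the average, the filter has a real field to test, and it removes whole users rather than individual events.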
A security analyst runs `index=network sourcetype=firewall | stats count by src_ip | sort - count | head 10` to find the top 10 source IPs by event count. The search returns only 5 results. Which of the following is the most likely reason?
If the number of distinct src_ip values is less than 10, head 10 returns all of them, resulting in fewer than 10 rows.
Why this answer
Option D is correct because the `stats count by src_ip` command groups events by each unique source IP address and counts them. If the search returns only 5 results, it means there are only 5 unique source IPs in the dataset matching the time range and filters. The `head 10` command then limits output to 10 rows, but since only 5 groups exist, only 5 rows are returned.
Exam trap
The trap here is that candidates assume `head 10` always returns 10 results, forgetting that `head` limits the number of output rows from the preceding command, which may already have fewer rows than the limit.
How to eliminate wrong answers
Option A is wrong because a short time range would reduce the total event count, but the `stats count by src_ip` command still groups by unique IPs; if there are more than 10 unique IPs, the search would return 10 results regardless of total event count. Option B is wrong because `sort - count` with a space is valid syntax in SPL; the space between the dash and the field name is optional and does not cause the command to fail. Option C is wrong because the `stats count by src_ip` command already includes `count` as the aggregation function and `src_ip` as the grouping field; there is no requirement to list `count` in the `by` clause.
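The same behavior is easy to see with Python's `collections.Counter`, used here as a rough stand-in for `stats count by src_ip | sort - count | head 10`. The source IPs are hypothetical sample data.

```python
from collections import Counter

# Hypothetical firewall events: 11 events, but only 5 distinct source IPs.
src_ips = (
    ["10.0.0.1"] * 4
    + ["10.0.0.2"] * 3
    + ["10.0.0.3"] * 2
    + ["10.0.0.4", "10.0.0.5"]
)

counts = Counter(src_ips)          # stats count by src_ip
top10 = counts.most_common(10)     # sort - count | head 10
```

Asking for 10 rows is an upper bound, not a guarantee: with five distinct groups, five rows come back.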
An analyst wants to identify the top 5 user agents that generated the most 404 errors in the last 24 hours. Which search accomplishes this correctly and efficiently?
Correctly filters for 404 errors and efficiently returns top 5 user agents using the top command.
Why this answer
Option A is correct because it first filters events to only those with status=404, then uses the `top` command with `limit=5` to efficiently count and rank user_agent values. This ensures the search only processes relevant events, minimizing resource usage and returning the correct top 5 user agents for 404 errors.
Exam trap
Splunk often tests the order of operations in Splunk SPL, specifically that filtering commands like `status=404` must precede statistical commands like `top` or `stats` to ensure the aggregation is performed only on the subset of interest, not on the entire dataset.
How to eliminate wrong answers
Option B is wrong because the `top` command processes fields in the order they are listed; placing `user_agent` before `status=404` means it will count user_agent values across all events, then apply the status=404 filter as a secondary field, which does not restrict the count to only 404 errors. Option C is wrong because it omits the status=404 filter entirely, returning the top 5 user agents across all HTTP status codes, not just 404 errors. Option D is wrong because the `where status=404` clause is placed after the `stats count by user_agent` command, which already aggregated data without the status filter; at that point, the `status` field is no longer available in the results, causing the search to fail or return no results.
Match each Splunk search operator to its behavior.
Pipes output of one command to the next
Excludes events that match the following term
Matches events that contain either term
Matches events that contain both terms (default)
Groups terms to control evaluation order
Why these pairings
Operators control how search terms are combined and piped.
A security analyst needs to find the top 10 users with the most failed login attempts from the linux_secure sourcetype. Which SPL command is most efficient for this task?
The `top` command is optimized for finding top values and is efficient for this scenario.
Why this answer
Option A is correct because the `top` command in SPL is specifically designed to return the most frequent values of a field, and the `limit=10` parameter directly restricts the output to the top 10 results. This approach is more efficient than using `stats count` followed by `sort` and `head` because `top` performs the aggregation and ranking in a single operation, reducing processing overhead. The search also correctly filters for 'Failed password' events within the `linux_secure` sourcetype, ensuring only failed login attempts are considered.
Exam trap
Splunk often tests the misconception that `stats count by user | sort -count | head 10` is functionally equivalent to `top limit=10 user`, but the trap is that `top` is more efficient and is the idiomatic Splunk command for this task, while the multi-command approach is less optimal and may be penalized in performance-sensitive scenarios.
How to eliminate wrong answers
Option B is wrong because `sort 10 -count` is invalid syntax; the `sort` command requires the field name and direction (e.g., `sort -count`), and the limit must be applied via `head` or the `limit` parameter in `top`. Option C is wrong because while it produces the correct result, it is less efficient than option A; it requires three separate commands (`stats`, then `sort`, then `head`) instead of the single `top` command, and the `head 10` is redundant if `top limit=10` is used. Option D is wrong because it uses `regex _raw="Failed password"` instead of a simple search term, which is less efficient; Splunk's indexed search for a literal string is faster than applying a regex to the raw event data, and the `top limit=10` at the end is redundant since `stats count by user` already aggregated the data, making the `top` command unnecessary.
A Splunk admin wants to track the number of unique users who accessed a system each hour over the past 24 hours. Which search provides the correct result?
dc(user) gives distinct count of users per hour with timechart.
Why this answer
Option A is correct because it uses `timechart span=1h dc(user)` to count distinct users per hour over the last 24 hours. The `dc()` function calculates distinct counts, and `span=1h` sets the time bucket to one hour, exactly matching the requirement.
Exam trap
The trap here is confusing `count` (total events) with `dc()` (distinct values), and assuming `values()` or `count by user` can produce a unique user count per time period.
How to eliminate wrong answers
Option B is wrong because `values(user)` returns a multivalue list of users per hour, not a count of unique users. Option C is wrong because `stats dc(user) by _time` groups by raw event timestamps, not hourly buckets, and then `timechart` cannot properly aggregate pre-grouped data, leading to incorrect results. Option D is wrong because `count by user` counts events per user per hour, not the number of unique users; it produces a separate series for each user rather than a single count of distinct users.
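The difference between counting events and counting distinct users per bucket can be shown with a short Python model of `timechart span=1h dc(user)`. The epoch-second timestamps and the helper name are hypothetical.

```python
from collections import defaultdict

# Hypothetical events; _time is in seconds from an arbitrary origin.
events = [
    {"_time": 0, "user": "alice"},
    {"_time": 600, "user": "alice"},
    {"_time": 1200, "user": "bob"},
    {"_time": 3700, "user": "alice"},
]

def distinct_users_per_hour(rows, span=3600):
    """Bucket events by hour and count the distinct users in each bucket."""
    buckets = defaultdict(set)
    for r in rows:
        bucket_start = r["_time"] - r["_time"] % span
        buckets[bucket_start].add(r["user"])
    return {start: len(users) for start, users in sorted(buckets.items())}

per_hour = distinct_users_per_hour(events)
events_in_first_hour = sum(1 for r in events if r["_time"] < 3600)
```

The first hour contains three events but only two distinct users, which is exactly the gap between `count` and `dc(user)`.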
Which THREE of the following are valid uses of the stats command? (Select three.)
Stats avg() computes average
Why this answer
The `stats` command in Splunk is used to perform statistical aggregations on search results. Option A is correct because `stats avg(field)` calculates the arithmetic mean of a specified field across all events in the result set. Option B is correct because `stats earliest(_time) by category` returns the minimum timestamp for each distinct value of the category field, which is a standard use of the `earliest()` function. Option C is correct because `stats count by category` groups events by the categorical field and returns the number of events in each group, a fundamental aggregation pattern.
Exam trap
Splunk often tests the distinction between `stats` and `timechart`; the trap here is that candidates see 'time-based chart' and incorrectly assume `stats` can produce it on its own, but `timechart` is the command that automatically bins events into time buckets and supports multiple series via the `by` clause.
Which TWO of the following statements about the `transaction` command are true? (Choose two.)
Transaction can group by shared field values.
Why this answer
Option C is correct because the `transaction` command is designed to group events that share a common field value, such as a session ID, allowing you to correlate related events into a single transaction. This is a core use case for tracking user sessions or multi-step processes where events are linked by a shared identifier.
Exam trap
Splunk often tests the misconception that the `transaction` command uses a sliding time window, but in reality it uses a fixed or pause-based window, and candidates confuse this with the sliding window behavior of commands like `streamstats` or `timechart`.
A search uses a subsearch to retrieve a list of user IDs, and then the main search uses IN operator to filter events. The subsearch is expected to return up to 10,000 values. What is a potential limitation and how can it be addressed?
Fields values collapses duplicates and can exceed row limit
Why this answer
Option C is correct because the default limit for results returned by a subsearch in Splunk is 10,000. When using the `IN` operator in the main search, the subsearch must provide all necessary values; if more than 10,000 values are expected, the `| fields values` command can be used in the subsearch to override this limit and return all distinct values, as it bypasses the default result count restriction.
Exam trap
The trap here is that candidates often confuse the default subsearch result limit (10,000) with the main search result limit (50,000) or assume that increasing the limit with `| head` is the correct solution, when in fact the `| fields values` command is the proper method to return all values from a subsearch without hitting the row limit.
How to eliminate wrong answers
Option A is wrong because `| head 50000` cannot raise the 10,000-result subsearch limit; `head` only truncates results further, and the approach given as correct is `| fields values`. Option B is wrong because it states the default subsearch limit as 50,000 rather than 10,000, and its claim that no change is needed does not hold when the subsearch is expected to return up to 10,000 values: that is exactly the default limit, so there is no headroom (the limit is applied before the subsearch completes). Option D is wrong because it states the default subsearch limit as 100,000 rather than 10,000, and its claim that no change is needed fails for the same reason as option B.
A large e-commerce company uses Splunk to monitor their web application. They have a query that uses the transaction command to group related events into transactions based on session ID and a 30-minute max pause. The query runs slowly and often times out. The environment has 10 indexers with 4 CPU cores each. The search is run over the last 7 days. Which of the following is the best course of action to improve performance?
Using stats and streamstats is more efficient than transaction and can achieve similar grouping results.
Why this answer
The `transaction` command is resource-intensive because it groups events by a field (session ID) and a max pause, requiring significant memory and processing to correlate events across the entire search time range. Replacing it with `stats` and `streamstats` is more efficient because `stats` can aggregate events by session ID without the overhead of transaction boundaries, and `streamstats` can compute running totals or windows within each session, leveraging distributed processing across indexers. This approach reduces memory pressure and avoids the timeout issue by using streaming operations that scale better with large datasets.
Exam trap
Splunk often tests the misconception that reducing the max pause or adding hardware (more indexers) is the best fix, when the real issue is replacing the inefficient `transaction` command with more scalable commands like `stats` and `streamstats`.
How to eliminate wrong answers
Option A is wrong because using `eval` to create a transaction ID field and then `stats` to group events does not inherently improve performance; it still requires a similar grouping operation and does not address the core inefficiency of the `transaction` command's memory overhead. Option B is wrong because reducing the max pause to 15 minutes may limit transaction size but does not fundamentally reduce the computational cost of the `transaction` command, which still must evaluate event boundaries and maintain state for each session across the entire search window. Option D is wrong because increasing the number of indexers to 20 distributes the search load but does not optimize the query itself; the `transaction` command's performance bottleneck is often in the search head's memory and processing, not just indexing capacity, and adding indexers may not resolve timeouts if the command is inherently inefficient.
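The kind of per-session aggregation that `stats` performs can be modeled in a few lines of Python. This is only a sketch under assumptions: the events and the `session_id` values are made up, and the model ignores the 30-minute max pause rule, so it illustrates why a single-pass aggregation keeps so little state rather than reproducing `transaction` exactly.

```python
# Hypothetical events; _time is seconds since the start of the search window.
events = [
    {"session_id": "s1", "_time": 0},
    {"session_id": "s2", "_time": 30},
    {"session_id": "s1", "_time": 60},
    {"session_id": "s2", "_time": 90},
    {"session_id": "s1", "_time": 120},
]

# One pass over the events, keeping only three numbers per session
# (roughly what min(_time), max(_time) and count would hold per session).
summary = {}
for e in events:
    t = e["_time"]
    s = summary.setdefault(e["session_id"], {"start": t, "end": t, "count": 0})
    s["start"] = min(s["start"], t)
    s["end"] = max(s["end"], t)
    s["count"] += 1

# Session duration derived from the aggregated values.
durations = {sid: s["end"] - s["start"] for sid, s in summary.items()}
print(durations)  # {'s1': 120, 's2': 60}
```

The state held per session is constant in size no matter how many events the session contains, whereas building a transaction means holding the member events themselves until the transaction closes.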
A security analyst needs to find all events where the field `status` has a value of either "error" or "critical" and the field `bytes` is greater than 1000. Which search correctly accomplishes this?
Parentheses ensure the OR is evaluated first, and then the AND with bytes>1000.
Why this answer
Option A is correct because in Splunk's Search Processing Language (SPL), parentheses group the OR conditions to ensure they are evaluated together, and the space between the grouped condition and `bytes>1000` acts as an implicit AND. This correctly retrieves events where `status` is either "error" or "critical" AND `bytes` is greater than 1000.
Exam trap
The trap here is that Splunk's implicit AND (space) combined with operator precedence causes candidates to forget that OR conditions must be grouped with parentheses to avoid unintended logic, leading them to choose Option B or D.
How to eliminate wrong answers
Option B is wrong because without parentheses, AND has higher precedence than OR, so it is parsed as `status=error OR (status=critical AND bytes>1000)`, which returns events with status=error regardless of bytes, plus events matching the AND condition. Option C is wrong because the `IN` operator in Splunk requires the field name to be on the left and a parenthesized list of values, but the syntax `status IN (error, critical)` is invalid; the correct syntax is `status IN ("error", "critical")` with quoted strings. Option D is wrong because it omits parentheses around the OR conditions, causing the implicit AND to bind more tightly to the second condition, resulting in the same precedence issue as Option B.
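The difference between the two groupings discussed above can be checked with a small Python model. It is a sketch of the Boolean logic only, with every grouping written out explicitly; the sample events are made up, and the code does not reproduce how SPL itself parses a search string.

```python
# Hypothetical sample events; field names follow the question.
events = [
    {"status": "error", "bytes": 500},
    {"status": "error", "bytes": 2000},
    {"status": "critical", "bytes": 1500},
    {"status": "ok", "bytes": 5000},
]

def grouped(e):
    # (status=error OR status=critical) AND bytes>1000
    return (e["status"] == "error" or e["status"] == "critical") and e["bytes"] > 1000

def ungrouped(e):
    # status=error OR (status=critical AND bytes>1000)
    return e["status"] == "error" or (e["status"] == "critical" and e["bytes"] > 1000)

grouped_hits = [e for e in events if grouped(e)]
ungrouped_hits = [e for e in events if ungrouped(e)]

print(len(grouped_hits))    # 2
print(len(ungrouped_hits))  # 3
```

The extra hit in the second grouping is the `error` event with only 500 bytes, which is exactly the kind of event the analyst wanted to exclude.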
Refer to the exhibit. What does the final result represent?
Correct: per hour, per user comparison to hour average
Why this answer
The `eventstats` command calculates a per-hour average logon count across all users. The `where` clause then filters for events where a specific user's logon count for that hour is more than double that hourly average. This directly matches option C: hours where any user's logon count exceeds twice the average for that hour.
Exam trap
The trap here is that candidates confuse `eventstats ... by hour` (which computes a global average per hour) with a per-user average, leading them to incorrectly select option D or A.
How to eliminate wrong answers
Option A is wrong because the query does not compute a per-user average across all hours; it compares each user's hourly count to the hourly average, not a user's average. Option B is wrong because the comparison is against the average logon count for that specific hour, not the total logon count for the hour; the `where` clause checks `logon_count > 2 * avg_logons`, which is a per-user value, not a total. Option D is wrong because the average used is the hourly average across all users, not the user's own personal average; `eventstats` with `by hour` computes a global average per hour, not per user.
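A small Python model can make the per-hour comparison concrete. The exhibit is not reproduced here, so this sketch assumes the rows reaching `eventstats` carry `hour`, `user`, and `logon_count`, and that the average is stored as `avg_logons`; the data values are invented.

```python
from collections import defaultdict

# Hypothetical per-hour, per-user logon counts.
rows = [
    {"hour": 9, "user": "alice", "logon_count": 2},
    {"hour": 9, "user": "bob", "logon_count": 2},
    {"hour": 9, "user": "carol", "logon_count": 14},
    {"hour": 10, "user": "alice", "logon_count": 3},
    {"hour": 10, "user": "bob", "logon_count": 5},
]

# Model of an eventstats-style average by hour: every row keeps its place
# and gains the average computed across all users in the same hour.
by_hour = defaultdict(list)
for r in rows:
    by_hour[r["hour"]].append(r["logon_count"])
for r in rows:
    counts = by_hour[r["hour"]]
    r["avg_logons"] = sum(counts) / len(counts)

# Model of the filter logon_count > 2 * avg_logons.
flagged = [(r["hour"], r["user"]) for r in rows if r["logon_count"] > 2 * r["avg_logons"]]
print(flagged)  # [(9, 'carol')]
```

The average for hour 9 is 6, so only the user with 14 logons clears the threshold of 12; nothing in the calculation depends on a user's own history.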
The search returns zero results, but the lookup file contains users with names like 'admin1', 'admin2'. What is the most likely reason?
like() is case-sensitive; also if user has spaces, pattern may not match.
Why this answer
The 'like' function in Splunk uses SQL-style pattern matching where '%' matches any sequence of characters. If the lookup file contains 'admin1' and 'admin2', but the search uses 'like(role, "admin%")', leading/trailing spaces in the field values or case sensitivity (e.g., 'Admin1' vs 'admin1') would cause the pattern to fail, returning zero results. Option B correctly identifies this as the most likely reason because Splunk's 'like' is case-sensitive by default and does not trim spaces.
Exam trap
Splunk often tests the misconception that 'like' is case-insensitive or automatically handles spaces, leading candidates to overlook the need for explicit trimming or case normalization.
How to eliminate wrong answers
Option A is wrong because Splunk lookups can be in CSV format or other formats like KV store; a non-CSV format would cause a different error (e.g., 'Error opening lookup file'), not silently return zero results. Option C is wrong because the stats command counts events based on the filtered results; if the role field is already filtered to only admin values, stats would still count those events, not return zero. Option D is wrong because the search command runs before the eval command in the pipeline order, but that does not cause zero results; the eval command would still process the filtered events correctly.
A search `index=main | eval weekday=strftime(_time,"%A") | stats count by weekday | sort - count` shows that Monday has the highest count. However, the user suspects that Monday data is double-counted due to timezone offset. What should be done to investigate?
`date_wday` is generated automatically by Splunk from the timestamp text in the event itself, so it is not adjusted to the searching user's time zone.
Why this answer
Option A is correct because `_time` is in UTC; if the events are from timezones where Monday starts earlier or later, using `date_wday` from the local time conversion is more accurate. Option B is wrong because `date_hour` is not needed. Option C is wrong because using `strftime` with timezone is possible but not the most direct.
Option D is wrong because converting to epoch does not help.
Which THREE of the following are valid Splunk search commands?
`regex` is a valid command to filter events using a regular expression.
Why this answer
The `regex` command is a valid Splunk search command that filters search results by applying a Perl-compatible regular expression (PCRE) to raw events or to a specific field. It keeps (or, with `!=`, removes) events whose value matches the pattern, such as an IP address or error code, and is distinct from the `rex` command, which extracts fields.
Exam trap
Splunk often tests the distinction between real Splunk commands and plausible-sounding but non-existent commands like `filter` or `parse`, which candidates might confuse with similar functions in other tools or programming languages.
A security analyst needs to find all events where the field 'status' is either 'error' or 'critical', and then count the number of events per source IP. Which search is correct?
Correct syntax: parentheses group OR conditions, then stats count.
Why this answer
Option A is correct because it uses the proper syntax to filter events where the 'status' field is either 'error' or 'critical' within the index, and then pipes the results into the stats command to count events by 'src_ip'. The parentheses around the OR condition ensure correct evaluation order, and the stats count by src_ip accurately aggregates the count per source IP.
Exam trap
Splunk often tests the importance of parentheses in OR conditions within Splunk searches, as candidates commonly assume that 'status=error OR status=critical' without parentheses works the same as with parentheses, but it can lead to unintended search behavior due to operator precedence.
How to eliminate wrong answers
Option B is wrong because it uses 'AND' between the two status conditions, which would require an event to have both 'error' AND 'critical' simultaneously in the same field, which is impossible and returns zero results. Option C is wrong because it uses the 'where' command after the initial index filter, which is less efficient and not necessary; the 'where' command is typically used for more complex expressions, but here the OR condition can be handled directly in the search string. Option D is wrong because it lacks parentheses around the OR condition, which can lead to incorrect evaluation order; without parentheses, the search might be interpreted as 'index=security status=error' OR 'status=critical', potentially returning events from other indexes if 'status=critical' matches elsewhere.
Which SPL command can be used to create a new field based on a conditional evaluation, such as setting a status field to 'critical' if a numeric threshold is exceeded?
Eval with if performs conditional assignment
Why this answer
The `eval` command in SPL is used to create new fields or modify existing ones by evaluating expressions. The `if()` function within `eval` allows conditional logic, making `| eval status=if(value>100,"critical","normal")` the correct syntax to create a new field 'status' that is set to 'critical' when the numeric field 'value' exceeds 100, and 'normal' otherwise.
Exam trap
Splunk often tests the distinction between `eval` (for field creation and computation) and `convert` (for data type conversion), leading candidates to mistakenly choose `convert` for conditional logic due to its similar syntax.
How to eliminate wrong answers
Option A is wrong because `makemv` converts a single-value field into a multivalue field by splitting it on a delimiter; it does not create a field based on conditional evaluation. Option B is wrong because `rex field=_raw` is used for extracting fields using regular expressions from the `_raw` event data, not for conditional field creation. Option D is wrong because `convert` is used for type conversion (e.g., converting strings to numbers or timestamps), not for conditional logic; the syntax `convert status=if(...)` is invalid and would produce an error.
Arrange the steps to create a new index in Splunk in the correct order.
Why this order
Creating an index involves navigating to the indexes page, adding a new index with appropriate settings, and saving.
A Splunk administrator runs the following search to identify the top 5 users by total bytes transferred: index=proxy sourcetype=webproxy | stats sum(bytes) as total_bytes by user | sort - total_bytes | head 5 The search returns results, but the numbers seem inflated. On closer inspection, the 'bytes' field is a string type. What must be done to correct the search?
This explicitly converts the string to numeric, ensuring correct summation.
Why this answer
Option B is correct because the `bytes` field is stored as a string, and `stats sum()` should not be relied on to do arithmetic on string values, which is the suspected source of the inflated totals. The `tonumber()` function explicitly converts the string to a numeric type, enabling accurate summation. Using `eval` to create a new numeric field before `stats` is the standard approach in Splunk for this scenario.
Exam trap
The trap here is that candidates assume `stats sum()` automatically converts strings to numbers, or they reach for `convert` instead of the `eval tonumber()` pattern that this question expects, which Splunk explicitly tests in the Advanced Searching domain.
How to eliminate wrong answers
Option A is wrong because `convert num(bytes)` attempts to convert the field in place, but `convert` is not a valid Splunk command for this purpose; the correct command is `eval` with `tonumber()`. Option C is wrong because `where isnum(bytes)` filters out non-numeric values but does not convert the string to a number, so `stats sum()` would still fail to sum correctly (strings would be ignored or cause errors). Option D is wrong because `eval bytes = string(bytes)` explicitly converts the field to a string, which is the opposite of what is needed and would make the inflation worse.
Which command returns the list of all sourcetypes in a specific index?
`metadata` with `type=sourcetypes` lists all sourcetypes in the index.
Why this answer
Option D is correct because the `| metadata` command with `type=sourcetypes` retrieves a list of all sourcetypes present in a specified index, along with their earliest and latest timestamps. This command queries the index metadata directly, making it the appropriate tool for listing sourcetypes within a given index.
Exam trap
Splunk often tests the distinction between commands that return metadata summaries (`| metadata`) versus commands that return raw events or statistical aggregations, leading candidates to choose `| metasearch` or malformed `| sourcetype count` commands instead of the correct metadata approach.
How to eliminate wrong answers
Option A is wrong because `| sourcetype count` is not a valid SPL command; it appears to be a malformed attempt to use `| stats count by sourcetype`, which would count events per sourcetype but is not the metadata-based way to list all sourcetypes in an index. Option B is wrong because `| eventtype count` is also not a valid command; eventtypes are saved searches or tags, not a direct way to list sourcetypes, and the syntax is incorrect. Option C is wrong because `| metasearch index=main sourcetype=*` is a valid search, but it returns one result per matching event (metadata fields only, without the raw text) rather than a list of distinct sourcetypes, which is inefficient and not the intended output.
An analyst wants to remove events that contain the string 'debug' from a log. Which command should be used?
Negates the search term to exclude events
Why this answer
Option D is correct because the `| search NOT debug` command filters out all events containing the string 'debug' from the result set. In Splunk, the `NOT` operator before a search term excludes events that match that term, effectively removing them from the output. This is the standard way to exclude a specific string from search results.
Exam trap
The trap here is that candidates often confuse the placement of `NOT` in Splunk syntax, thinking it can be placed after the term like in natural language, or they mistakenly use `where` with regex functions when a simple `NOT` suffices.
How to eliminate wrong answers
Option A is wrong because `| where NOT match(_raw,"debug")` uses the `match` function which expects a regex pattern, not a literal string; it would treat 'debug' as a regex, potentially causing unexpected behavior or errors if the string contains regex metacharacters. Option B is wrong because `| search debug | reverse` first includes only events with 'debug', then reverses the order, which does not remove 'debug' events but instead keeps them and changes their display order. Option C is wrong because `| search "debug" NOT` has incorrect syntax; the `NOT` operator must be placed before the term it negates, not after, and this would likely result in a syntax error or unintended results.
A security analyst notices that a timechart command is returning too many data points on the x-axis, making the chart unreadable. Which command modification should be used to reduce the number of data points?
Span reduces data point granularity
Why this answer
The `timechart` command automatically bins events into time buckets based on the time range. By default, Splunk chooses a span that can result in many data points. Adding `span=1h` explicitly sets the bucket size to one hour, reducing the number of data points on the x-axis and making the chart readable.
Exam trap
The trap here is that candidates confuse options that control the number of series (like `limit` or `useother`) with options that control the number of time buckets (like `span`), leading them to pick a wrong answer that does not affect the x-axis density.
How to eliminate wrong answers
Option A is wrong because `partial=f` controls whether partial time buckets at the edges of the time range are displayed, not the number of data points. Option B is wrong because `useother=f` prevents grouping of low-count values into an 'Other' category, which affects the y-axis series, not the x-axis data points. Option D is wrong because `limit=5` restricts the number of series (e.g., top 5 hosts) shown, not the number of time buckets on the x-axis.
The exhibit shows a search that categorizes HTTP status codes and counts them. If the search returns only three categories, what is the most likely reason?
stats count by category only returns categories that actually occur in the results; a category with no matching events never appears.
Why this answer
Option D is correct because the `stats` command in Splunk only returns results for categories that have at least one event. If a category (e.g., a specific HTTP status code range) has zero matching events, it will not appear in the output. This is common behavior in aggregation commands: a group that never occurs produces no row, so a zero-count row only exists if it is added explicitly after the aggregation.
Exam trap
Splunk often tests the default behavior of `stats` to omit zero-count groups, leading candidates to incorrectly assume that the `case` function is incomplete or that events are being filtered out, rather than recognizing that empty categories are simply not displayed.
How to eliminate wrong answers
Option A is wrong because null categories are not the explanation here: the `case` function returns null only for values that match none of its conditions, and `usenull` and `useother` are options of `chart` and `timechart`, not of `stats`. The missing categories are simply ones with no events. Option B is wrong because a syntax error in the `case` function would cause the search to fail entirely or return an error, not truncate results to exactly three categories.
Option C is wrong because HTTP status codes above 599 are not valid per RFC 7231, and the `case` statement is not required to cover them; the question states the search returns only three categories, implying the `case` statement covers all valid codes, but zero events exist for some ranges.
A security analyst wants to create a comparison report showing the count of login failures by user for today versus yesterday. They run: `index=security action=failure | timechart count by user`. This produces a chart of counts over time, but they want separate columns for today and yesterday. How can they achieve this comparison efficiently?
Correctly categorizes events by day and creates separate columns.
Why this answer
Using `eval` to create a day label and then `timechart` with the user and day fields creates the desired side-by-side chart. Option A is incorrect because timechart does not have a 'useother' option for this. Option C works but is less efficient and may require manual time ranges.
Option D does not produce a time-based comparison.
When using the stats command with multiple BY fields, the results show many rows with null values. What is the most likely cause and how can it be reduced?
Prevents null groups from appearing
Why this answer
Option B is correct because the `stats` command includes null values in BY fields by default, which can produce many rows with nulls. Using `usenull=f` explicitly tells `stats` to ignore null values in the BY clause, reducing those rows. This parameter is specific to the `stats` command and directly addresses the root cause.
Exam trap
The trap here is that candidates often confuse `usenull=f` with post-processing filters like `where` or `fillnull`, not realizing that the null rows are generated during the `stats` aggregation itself and must be prevented at that stage.
How to eliminate wrong answers
Option A is wrong because the `where` command filters results after `stats` has already processed nulls, which does not reduce the number of rows generated by `stats`; it only hides them from the output. Option C is wrong because using `eval` to replace nulls before `stats` changes the data (e.g., replacing null with a placeholder like 'N/A'), which can alter statistical results and is not the intended way to handle nulls in BY fields. Option D is wrong because `fillnull` is used after `stats` to replace null values in output fields, not to prevent null rows from being created by the BY clause.
An analyst wants to calculate the average response time for each web server, but only for requests that returned status code 200. Which search accomplishes this?
Correct order: filter, then stats.
Why this answer
Option C is correct because it first filters events with `status=200` (only successful requests), then uses `stats avg(response_time) by host` to compute the average response time per web server. This ensures the aggregation is performed only on the relevant subset of data, matching the requirement precisely.
Exam trap
Splunk often tests the order of operations in Splunk searches, specifically that filtering (with `where` or search terms) must occur before aggregation (`stats`) to affect the computed values, and that `eval` cannot perform aggregate functions like `avg()`.
How to eliminate wrong answers
Option A is wrong because `sort host` before `stats` is unnecessary and does not affect the aggregation; more critically, `stats avg(response_time)` without a `by` clause computes a single overall average, not per host. Option B is wrong because `eval` cannot compute an aggregate function like `avg()` with a `by` clause; `eval` is for per-event calculations, not statistical aggregations, and the `where` clause is placed after the invalid `eval`. Option D is wrong because `stats avg(response_time) by host` is computed on all events (including non-200 status codes), and then `search status=200` attempts to filter after aggregation, but the `status` field is no longer present in the aggregated results, so the filter will return no results or be meaningless.
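The effect of filtering before versus after aggregation can be sketched in Python. The events below are invented, and `avg_by_host` is a hypothetical helper standing in for `stats avg(response_time) by host`; the point is only that the aggregated rows no longer contain a `status` field to filter on.

```python
from collections import defaultdict

# Hypothetical web events.
events = [
    {"host": "web1", "status": 200, "response_time": 100},
    {"host": "web1", "status": 500, "response_time": 900},
    {"host": "web1", "status": 200, "response_time": 300},
    {"host": "web2", "status": 200, "response_time": 50},
    {"host": "web2", "status": 404, "response_time": 10},
]

def avg_by_host(rows):
    # Stand-in for an average grouped by host: one output value per host.
    groups = defaultdict(list)
    for r in rows:
        groups[r["host"]].append(r["response_time"])
    return {host: sum(v) / len(v) for host, v in groups.items()}

# Filter first, then aggregate: only status 200 events feed the average.
filtered_first = avg_by_host([e for e in events if e["status"] == 200])

# Aggregate first: every status is mixed in, and the result holds only
# host and average, so a later status filter has nothing to match.
aggregated_first = avg_by_host(events)

print(filtered_first)  # {'web1': 200.0, 'web2': 50.0}
print(aggregated_first["web2"])  # 30.0
```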
To count events by host for the last hour, which search is most efficient?
Applies time range early, minimizing data scanned.
Why this answer
Option A is correct because it uses `index=*` to search all indexes and `earliest=-1h` to restrict the search to the last hour at the index level, which is the most efficient way to filter time. The `stats count by host` then aggregates counts per host without needing to process events outside the time range. This approach leverages Splunk's time-based index pruning, minimizing data scanned.
Exam trap
Splunk often tests the misconception that you can filter time after aggregation (as in Option B) or that limiting results with `head` is equivalent to time-based filtering, when in fact time filters must be applied at search time via `earliest`/`latest` for efficiency and correctness.
How to eliminate wrong answers
Option B is wrong because it retrieves all events (no time filter) and then attempts to filter by `_time` after the `stats` command, which is inefficient and incorrect since `stats` discards the `_time` field unless explicitly retained; the `where` clause would fail or require reprocessing all data. Option C is wrong because `head 1000` arbitrarily limits results to the first 1000 events, which may not represent the last hour and can miss relevant data, making it both inefficient and inaccurate. Option D is wrong because `sourcetype=access_combined` restricts to a specific sourcetype, not all events, and `timechart count by host` is less efficient than `stats` for a simple count by host, as it creates time-based buckets unnecessarily.
A user wants to see the top 5 most common HTTP methods (field "method") from web access logs, along with their percentage of total. Which search is best?
Correctly uses top with showperc to display percentages.
Why this answer
Option C is correct because `top` with `limit=5` returns the five most common values of the `method` field, and `showperc=t` automatically calculates and displays each value's percentage of the total events. This directly meets the requirement to see the top 5 HTTP methods and their percentages without needing additional commands.
Exam trap
The trap here is that candidates often assume `top` only shows counts and not percentages, or they misuse `countfield` instead of `showperc`, leading them to choose a manual `stats` approach that omits the percentage calculation entirely.
How to eliminate wrong answers
Option A is wrong because `countfield=percent` only renames the count column to `percent`; it does not calculate a percentage, which is what `showperc=t` controls. Option B is wrong because `eventstats count` adds a total count to every event, but `top` without `limit=5` defaults to showing 10 results, so the output is not restricted to the top 5. Option D is wrong because while it correctly finds the top 5 methods by count, it does not calculate or display the percentage of total for each method, which the question explicitly requires.
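What the `top` output contains can be modeled with Python's `collections.Counter`. This is a sketch with made-up method counts, not a reproduction of Splunk's implementation; it shows that each row carries a count and a percentage of the total across all values, not just across the five rows shown.

```python
from collections import Counter

# Hypothetical method values from 50 web access events.
methods = (["GET"] * 20 + ["POST"] * 12 + ["HEAD"] * 8
           + ["PUT"] * 5 + ["DELETE"] * 3 + ["OPTIONS"] * 2)

counts = Counter(methods)
total = sum(counts.values())

# Five most common values, each with its count and percentage of all events.
top5 = [(method, count, 100 * count / total) for method, count in counts.most_common(5)]

for row in top5:
    print(row)
# ('GET', 20, 40.0)
# ('POST', 12, 24.0)
# ('HEAD', 8, 16.0)
# ('PUT', 5, 10.0)
# ('DELETE', 3, 6.0)
```

The percentages sum to 96 rather than 100 because the sixth value, `OPTIONS`, still counts toward the total even though it is not displayed.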
Which TWO of the following are valid aggregation functions in the `stats` command? (Choose 2)
`earliest` is a valid stats function that returns the earliest value of a field.
Why this answer
The `stats` command in Splunk supports `earliest()` as an aggregation function that returns the earliest value of a field for each group. Option C is correct because `earliest()` is a valid stats function that returns the chronologically earliest value of a field in each group, based on event time (`_time`) rather than on the order in which results arrive.
Exam trap
Splunk often tests whether candidates can tell real `stats` functions from plausible-sounding names that do not exist, leading candidates to select options that merely look like aggregation functions.
Refer to the exhibit. What will this search return?
timechart by host produces a time series chart with lines per host.
Why this answer
The search uses `timechart` with `by host`, which produces a time-based chart where each host is a separate series (line) showing the count of events where `status=404` over each time bucket. The `count` function aggregates the number of 404 events per time period, and the `by host` clause splits the results into separate lines per host. Option B correctly describes this output.
Exam trap
Splunk often tests the distinction between `timechart` (time-based series) and `chart` or `stats` (non-time-based aggregation), leading candidates to confuse a time-series chart with a static table or bar chart.
How to eliminate wrong answers
Option A is wrong because the search does not return a raw list of events; it aggregates counts over time using `timechart`, so individual events are not displayed. Option C is wrong because `timechart` produces a time-based chart (line or column) with time on the x-axis, not a table with rows for each time bucket and columns for each host; a table would require `chart` or `stats` with `by` and `span`. Option D is wrong because `timechart` with `by host` does not produce a bar chart of total counts per host; it shows counts over time, not a single aggregated total per host.
A search produces a field 'count'. You need to find the event with the maximum count. Which approach is correct?
This adds the maximum to each event and filters to those that equal the max.
Why this answer
Option A is correct because it uses `eventstats` to compute the maximum count across all events, storing it in a new field `maxcount`, and then filters the events where the original `count` equals that maximum. This approach preserves the full event data for the event(s) with the highest count, which is necessary when you need to retrieve the entire event, not just the aggregated value.
Exam trap
Splunk often tests the distinction between `eventstats` and `stats`, where candidates mistakenly think `stats` can be used to find the event with the maximum value, but `stats` collapses the data and loses the original event fields, making it unsuitable for retrieving the full event.
How to eliminate wrong answers
Option B is wrong because it is a meta-option that claims both B and C work, but Option D does not work for finding the event with the maximum count (it only returns the max value as a single row, losing the event context). Option C is wrong because while `| sort -count | head 1` does return the event with the highest count, it is not the only correct approach; Option A is also correct, and the question asks for 'which approach is correct' — both A and C are valid, but Option B incorrectly claims that both B and C work (B is not a valid approach itself). Option D is wrong because `| stats max(count) as maxcount` produces a single-row result with only the maximum count value, not the original event data, so you cannot identify which event had that count.
Refer to the exhibit. What is the purpose of the eval command in this search?
Correctly describes the eval case usage
Why this answer
The eval command creates a new field 'status_category' by evaluating a CASE expression that maps numeric HTTP status codes (e.g., 200, 404, 500) into three descriptive categories: 'OK', 'Client Error', and 'Server Error'. This is a common pattern for enriching raw data with human-readable labels without altering the original 'status' field. The correct answer is D because the search explicitly defines the new field based on the status code values.
Exam trap
Splunk often tests the distinction between creating a new field versus modifying an existing field, and candidates mistakenly think eval replaces the original field when it actually adds a new one.
How to eliminate wrong answers
Option A is wrong because the eval command does not replace the 'status' field; it creates a new field 'status_category' while leaving the original 'status' field intact. Option B is wrong because the new field 'status_category' is not temporary; it persists after the stats command since stats can aggregate over any existing fields, including those created by eval. Option C is wrong because the 'status' field is already a numeric type (as shown in the CASE comparisons with numbers), and eval does not convert it to a string; instead, it creates a new string field 'status_category' from the numeric values.
A search returns duplicate events for the same user. The analyst wants to keep only the first occurrence of each user based on timestamp. Which sequence of commands is best?
Sort ascending puts earliest first, then dedup keeps the first (earliest) per user.
Why this answer
Option D is correct because it first sorts events by timestamp in ascending order (oldest first), then applies `dedup user` to keep only the first occurrence of each user. Since `dedup` retains the first event it encounters for each field value, sorting by `_time` ensures that the earliest event for each user is kept, satisfying the requirement to keep only the first occurrence based on timestamp.
Exam trap
Splunk often tests the order of operations in piped commands, specifically that `sort` must precede `dedup` to control which event is kept, and that `-` before a field name reverses the sort order, which candidates may misinterpret.
How to eliminate wrong answers
Option A is wrong because `sort -_time` sorts in descending order (newest first), so `dedup user` would keep the most recent event for each user, not the first occurrence. Option B is wrong because `dedup user` without any sort operates on the raw order of events as they arrive from the index, which is not guaranteed to be chronological, so it may not keep the earliest event for each user. Option C is wrong because `dedup user` is applied before sorting, so the dedup operation sees events in their raw order and may discard the earliest event; the subsequent `sort _time` only reorders the remaining events but cannot recover the discarded first occurrence.
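The interaction between sorting and deduplication can be modeled in Python. The `dedup` function below is a hypothetical stand-in that keeps the first row seen for each value of a field, which is the behavior described above; the events are invented and deliberately arrive out of chronological order.

```python
# Hypothetical events in arrival order (not chronological).
events = [
    {"_time": 30, "user": "alice", "action": "logout"},
    {"_time": 10, "user": "alice", "action": "login"},
    {"_time": 20, "user": "bob", "action": "login"},
    {"_time": 40, "user": "bob", "action": "logout"},
]

def dedup(rows, field):
    # Keep the first row encountered for each distinct value of `field`.
    seen = set()
    kept = []
    for r in rows:
        if r[field] not in seen:
            seen.add(r[field])
            kept.append(r)
    return kept

# Sort ascending by time, then dedup: the earliest event per user survives.
sort_then_dedup = dedup(sorted(events, key=lambda r: r["_time"]), "user")

# Dedup first, then sort: whichever event arrived first survives instead.
dedup_then_sort = sorted(dedup(events, "user"), key=lambda r: r["_time"])

print([(r["user"], r["_time"]) for r in sort_then_dedup])  # [('alice', 10), ('bob', 20)]
print([(r["user"], r["_time"]) for r in dedup_then_sort])  # [('bob', 20), ('alice', 30)]
```

In the second ordering the earliest `alice` event has already been discarded by the time the sort runs, so sorting afterwards cannot bring it back.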
A user wants to add a field showing the average value of a numeric field `latency` for each host, without reducing the number of events. Which command should be used?
`eventstats` adds the average latency per host to each event without reducing the number of events.
Why this answer
The `eventstats` command is correct because it calculates aggregate statistics (like average) over a field and appends the result as a new field to every event, preserving the original event count. Unlike `stats`, which reduces the dataset to one row per group, `eventstats` enriches each event with the computed value without removing any events.
Exam trap
The trap here is that candidates often confuse `eventstats` with `stats` because both compute aggregates, but `stats` reduces events while `eventstats` does not. Splunk tests this distinction by explicitly stating 'without reducing the number of events' in the question.
How to eliminate wrong answers
Option A is wrong because `eval` creates or modifies fields on a per-event basis using expressions, but it cannot compute aggregate statistics like an average across multiple events. Option B is wrong because `stats` computes aggregate statistics but reduces the number of events to one row per group (e.g., per host), which violates the requirement to keep all events. Option D is wrong because `streamstats` computes running or cumulative statistics over a sequence of events, not a global average per host, and it would produce incorrect results if events are not sorted properly.
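The difference between the two commands can be illustrated with a small Python model. This is only a sketch under assumed data; `stats_avg` and `eventstats_avg` are hypothetical helper names, not Splunk functions.

```python
from collections import defaultdict

# Hypothetical events with the fields named in the question.
events = [
    {"host": "web1", "latency": 100},
    {"host": "web1", "latency": 300},
    {"host": "web2", "latency": 50},
]


def stats_avg(rows, value, by):
    """Model of `stats avg(value) by <by>`: one result per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum, count]
    for row in rows:
        totals[row[by]][0] += row[value]
        totals[row[by]][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


def eventstats_avg(rows, value, by, out):
    """Model of `eventstats avg(value) as <out> by <by>`: every row is kept."""
    averages = stats_avg(rows, value, by)
    return [{**row, out: averages[row[by]]} for row in rows]


summary = stats_avg(events, "latency", "host")
enriched = eventstats_avg(events, "latency", "host", "avg_latency")

print(len(summary), len(enriched))
```

The summary has one entry per host, while the enriched list still has one row per original event, each carrying its host's average.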
The search returns unexpected results, including IP addresses that are not in the expected format (e.g., '127.0.0.1' appears as '27.0.0.1'). What is the most likely cause?
`rex` extracts only the first match in each event by default. If the pattern is not anchored, that match can begin partway through a longer string, so '127.0.0.1' may be captured as '27.0.0.1'.
Why this answer
Option D is correct because the `rex` command, by default, extracts only the first match of a regex pattern from each event. If an event contains multiple IP addresses, `rex` captures the first occurrence, which may be truncated if the regex pattern is not anchored properly or if the IP appears in a context where leading digits are separated (e.g., '127.0.0.1' might be preceded by a character that causes the regex to match starting at '27.0.0.1'). This is a common behavior in Splunk when using `rex` without the `max_match` parameter.
Exam trap
Splunk often tests the misconception that `rex` extracts all matches by default, leading candidates to overlook the need for `max_match` or proper regex anchoring when dealing with multiple values in a single event.
How to eliminate wrong answers
Option A is wrong because using `\b` for word boundaries would not fix the issue of extracting a truncated IP; the problem is about the first match being incomplete, not about boundary detection. Option B is wrong because the `top` command aggregates counts of field values and does not modify the extracted `ip` field itself; it only displays frequencies. Option C is wrong because the `rex` command can be placed anywhere in the search pipeline after the initial data retrieval; it does not need to be before the index search, and placing it earlier would not change the extraction behavior.
Which THREE of the following are true about the `transaction` command? (Choose 3)
You must specify at least one field in the `by` clause.
Why this answer
Option C is correct because the `by` clause in the `transaction` command is mandatory. It defines the grouping criteria (e.g., `by user`, `by session_id`) that determine which events belong to the same transaction. Without a `by` clause, the command would attempt to group all events into a single transaction, which is rarely useful and often leads to incorrect results.
Exam trap
Splunk often tests the misconception that `startswith` operates on field values, when in fact it operates on raw event text, and that `transaction` outputs one event per input event rather than one event per transaction.
Which of the following searches correctly computes the average response time per host?
`mean()` is an alias for `avg()` and correctly computes the average per host.
Why this answer
Option A is correct because the `stats` command with `mean(response_time)` calculates the arithmetic mean of the response_time field, and the `by host` clause groups the calculation per host, producing the average response time for each host. This is the standard Splunk syntax for computing averages in a grouped statistics table.
Exam trap
The trap here is that candidates may confuse `eventstats` with `stats` or use incorrect function names like `average`, leading them to choose options that either do not produce a summary table or use invalid Splunk syntax.
How to eliminate wrong answers
Option B is wrong because `average` is not a valid stats function in Splunk; the correct function name is `avg` or `mean`. Option C is wrong because `eventstats` adds the computed value as a new field to each event rather than producing a summary table, so it does not return a distinct list of hosts with their average response times. Option D is wrong because the syntax `stats avg response_time by host` is missing parentheses around the field name; Splunk requires `avg(response_time)` to correctly parse the function argument.
A developer wants to debug a slow Splunk search that uses multiple eval and where commands. The search returns correct results but takes 2 minutes. The developer wants to identify which parts of the search are slow. The environment is a single instance Splunk with moderate data. What should the developer do?
Provides per-command timing information.
Why this answer
Option C is correct because the Search Job Inspector provides detailed per-command execution statistics, including time spent, number of results, and memory usage for each pipe segment. This allows the developer to pinpoint exactly which `eval` or `where` command is causing the slowdown, without altering the search logic or time range.
Exam trap
The trap here is that candidates confuse the Job Manager (which shows high-level job status) with the Search Job Inspector (which provides granular per-command profiling), or mistakenly believe that reducing the time range or adding comments will help identify performance bottlenecks.
How to eliminate wrong answers
Option A is wrong because the Job Manager only shows overall job metadata (e.g., total run time, result count, disk usage) and does not break down performance per search command. Option B is wrong because limiting the time range to 1 minute changes the dataset size and may mask the actual slow command; it also does not provide per-command timing. Option D is wrong because comments are ignored by the search parser and have no effect on performance measurement; they do not generate any timing or profiling data.
Refer to the exhibit. Which statement about this search is true?
Correct interpretation of the search
Why this answer
The search uses `iplocation` to add geographical fields (like Country, City) based on the `src_ip` field, then renames `src_ip` to `src` and uses `stats` to aggregate bytes by `dest_ip` and the newly added `Country` field. This matches option D exactly.
Exam trap
The trap here is that candidates often confuse which IP address (source vs. destination) is being geolocated, or assume `iplocation` filters out invalid IPs, when in fact it only enriches events without removing any.
How to eliminate wrong answers
Option A is wrong because `iplocation` does not require a predefined lookup table; it uses a built-in MaxMind GeoIP database. Option B is wrong because the search applies `iplocation` to `src_ip`, not the destination IP (`dest_ip`). Option C is wrong because `iplocation` does not filter events; it only adds geographical fields to events that have a valid IP in `src_ip`, but events with invalid IPs are not excluded from the search results.
Which TWO of the following commands can be used to find the most frequent value of a field within each group?
stats mode returns the mode for each group.
Why this answer
Option A is correct because `stats mode(field) by group` directly computes the most frequent value (mode) of the specified field for each group defined by the `by` clause. The `mode()` function is specifically designed to return the value that appears most often, making it the simplest and most accurate command for this task.
Exam trap
The trap here is that candidates often confuse `list()` or `values()` with `mode()`, or incorrectly think `streamstats` can replace `stats` for grouped final aggregation, when `streamstats` is designed for cumulative calculations across events, not per-group final results.
An analyst wants to create a time-series comparison of the current week and the previous week. Which TWO commands are commonly used together to achieve this? (Select two.)
Generates time-series data
Why this answer
B is correct because `timechart` is the primary command for creating time-series aggregations, allowing you to split data into time buckets and apply statistical functions. D is correct because `timewrap` is specifically designed to compare time periods (e.g., current week vs. previous week) by wrapping the time-series data into separate series for each period, enabling side-by-side visualization.
Exam trap
Splunk often tests the misconception that `stats` or `eventstats` can replace `timechart` for time-based comparisons, but only `timechart` provides the necessary time-bucketing, and `timewrap` is the dedicated command for period-over-period wrapping.
A search needs to find events where the same user logged in from more than 3 different IP addresses within a 5-minute window. Which combination of commands is most efficient?
Efficiently groups events by user within a 5-minute window and then counts distinct IP addresses.
Why this answer
Option D is correct because the `transaction` command groups events by `user` within a 5-minute window (`maxspan=5m`), then `eval distinct_ip=mvcount(src_ip)` counts the unique IP addresses in that transaction. This directly answers the requirement of finding users who logged in from more than 3 different IPs within a 5-minute window, and it is efficient because `transaction` handles the time-bounded grouping natively without needing to pre-aggregate or use subsearches.
Exam trap
The trap here is that candidates often choose `streamstats` or `stats` because they are familiar with counting, but they fail to realize that those commands count events per user+IP pair rather than distinct IPs per user within a time window, which is the core requirement.
How to eliminate wrong answers
Option A is wrong because `streamstats` with `count by user src_ip` counts occurrences of each user+src_ip pair, not distinct IPs per user; it would require a user to have more than 3 events from the same IP, which is not the requirement. Option B is wrong because `timechart` with `values(src_ip) by user` creates a time-based chart that can miss events if the time range is not perfectly aligned to 5-minute buckets, and it is less efficient due to the need to generate a table and then evaluate `mvcount`. Option C is wrong because `stats count by user, src_ip` counts events per user+IP pair, not distinct IPs per user within a time window; it would require a user to have more than 3 events from the same IP, and it ignores the 5-minute window entirely.
A search is producing results that include both internal and external traffic. The analyst wants to approximate the number of distinct destination IPs for internal traffic only, where internal IPs fall within the 10.0.0.0/8 range. Which approach is most efficient?
Efficient subnet matching with cidrmatch
Why this answer
Option D is correct because it uses `where cidrmatch("10.0.0.0/8", src_ip)` to efficiently filter events to only those with source IPs in the 10.0.0.0/8 range before passing them to `stats dc(dest_ip)`. This approach leverages Splunk's built-in CIDR matching function, which performs a bitwise comparison on the IP address, and applies the filter early in the pipeline, reducing the dataset for the distinct count operation. It is the most efficient as it avoids unnecessary evaluations or string operations on non-matching events.
Exam trap
The trap here is that candidates often choose Option C because they think `eval` with `by` is equivalent to filtering, but they overlook that it processes all events and computes an unnecessary group for external traffic, making it less efficient than a simple `where` filter.
How to eliminate wrong answers
Option A is wrong because `src_ip=10.*` is a wildcard string match rather than a CIDR-aware comparison: it matches any value that begins with the text `10.`, including malformed values such as 10.0.0.256 (if present), and it cannot express prefixes that do not fall on an octet boundary; it also does not filter out external traffic before the stats command. Option B is wrong because using `rex` to extract the first octet and then filtering requires an extra parsing step; for a /8 prefix a first-octet check does select the same addresses as 10.0.0.0/8, but it is less efficient and more error-prone than CIDR matching. Option C is wrong because while it uses `cidrmatch` correctly, it creates a field `internal` for every event and then uses `stats dc(dest_ip) by internal`, which computes distinct counts for both internal=1 and internal=0, wasting resources on external traffic; the analyst only wants internal traffic, so filtering with `where` is more efficient than grouping and discarding the external group.
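The filter-then-count idea can be sketched in Python with the standard `ipaddress` module. This is an illustration only: the sample events are invented, `cidr_match` is a hypothetical helper, and treating a malformed address as a non-match is an assumption of this sketch rather than a statement about how Splunk's `cidrmatch` behaves.

```python
import ipaddress

INTERNAL = ipaddress.ip_network("10.0.0.0/8")


def cidr_match(network, ip_text):
    """Return True when ip_text is a valid address inside network."""
    try:
        return ipaddress.ip_address(ip_text) in network
    except ValueError:
        # Assumption of this sketch: malformed addresses never match.
        return False


# Hypothetical events.
events = [
    {"src_ip": "10.1.2.3", "dest_ip": "8.8.8.8"},
    {"src_ip": "10.9.9.9", "dest_ip": "8.8.8.8"},
    {"src_ip": "100.0.0.1", "dest_ip": "1.1.1.1"},
    {"src_ip": "10.0.0.256", "dest_ip": "9.9.9.9"},
    {"src_ip": "10.200.0.7", "dest_ip": "8.8.4.4"},
]

# Filter first (like `where cidrmatch(...)`), then count distinct
# destinations (like `stats dc(dest_ip)`).
internal = [e for e in events if cidr_match(INTERNAL, e["src_ip"])]
distinct_dest = len({e["dest_ip"] for e in internal})

print(distinct_dest)
```

Filtering before the distinct count means the set of destinations is only ever built from internal traffic.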
A search analyst wants to calculate the average transaction time for each user and then find users whose average transaction time exceeds the overall average. Which approach is most efficient?
Efficient: stats reduces data, eventstats adds overall average.
Why this answer
Option D is correct because it first uses `stats by user` to compute per-user average transaction times, then uses `eventstats` to append the overall average across all users to each row, allowing a direct `where` comparison. This approach is efficient because `eventstats` adds the global aggregate without requiring a separate subsearch or additional data pass, minimizing resource usage.
Exam trap
Splunk often tests the distinction between `eventstats` and `appendpipe`: candidates mistakenly choose `appendpipe` thinking it adds a global aggregate to every row, but it appends the output of a subpipeline as additional rows, so the overall average is not available on each user's row for comparison.
How to eliminate wrong answers
Option A is wrong because using `eventstats` before `stats by user` would compute the overall average on raw events, not on per-user averages, leading to an incorrect comparison. Option B is wrong because `appendpipe` appends the result of its subpipeline as new rows at the end of the current results rather than adding a field to each existing row, so the per-user rows carry no overall average for `where` to compare against. Option C is wrong because `transaction` is designed to group events into transactions based on session IDs or time windows, not to compute per-user averages efficiently, and it consumes significant memory and processing overhead.
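The three-stage pipeline (aggregate per user, attach the overall average to each row, then filter) can be modelled in Python. The data and the field names `user_avg` and `overall_avg` are assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical events.
events = [
    {"user": "alice", "duration": 10},
    {"user": "alice", "duration": 30},
    {"user": "bob", "duration": 5},
    {"user": "carol", "duration": 40},
]

# Stage 1, like `stats avg(duration) as user_avg by user`: one row per user.
durations_by_user = {}
for event in events:
    durations_by_user.setdefault(event["user"], []).append(event["duration"])
rows = [{"user": u, "user_avg": mean(d)} for u, d in durations_by_user.items()]

# Stage 2, like `eventstats avg(user_avg) as overall_avg`: same rows, one
# extra field holding the average of the per-user averages.
overall = mean(row["user_avg"] for row in rows)
rows = [{**row, "overall_avg": overall} for row in rows]

# Stage 3, like `where user_avg > overall_avg`.
above_average = [row["user"] for row in rows if row["user_avg"] > row["overall_avg"]]

print(above_average)
```

Because stage 1 has already reduced the data to one row per user, stage 2 only has to touch a handful of rows.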
A search using `tstats` to query a data model returns results but is slow. Which of the following is the most likely cause?
Without acceleration, tstats runs against the raw data and can be slow.
Why this answer
When a data model is accelerated, Splunk pre-computes and stores summaries of the data in a TSIDX index, allowing `tstats` to query these summaries very quickly. If the data model is not accelerated, `tstats` must scan the raw data in the index, which is significantly slower. Therefore, the most likely cause of slow `tstats` performance is that the data model lacks acceleration.
Exam trap
Splunk often tests the misconception that `tstats` always uses acceleration or that a `where` clause on a non-indexed field is the primary cause of slowness, when in fact the absence of acceleration is the most common and impactful reason for poor `tstats` performance.
How to eliminate wrong answers
Option A is wrong because a data model with many fields can slow down acceleration or search, but `tstats` queries the accelerated summary (TSIDX) which is optimized for many fields; the primary performance bottleneck is the lack of acceleration, not field count. Option C is wrong because a `where` clause on a non-indexed field would not affect `tstats` performance when querying an accelerated data model, as `tstats` operates on the TSIDX index where all fields are indexed; the slowness is due to the absence of acceleration, not the `where` clause. Option D is wrong because `tstats` can use either `from` (to reference a data model) or `index` (to reference a raw index), and using `from` is the correct syntax for querying a data model; the slowness is not caused by using `from` but by the data model not being accelerated.
A search includes `... | eval day=strftime(_time, "%A") | stats count by day | sort count`. The results show Monday has the highest count. The analyst wants to confirm that the timezone is correctly applied. Which command should be added before the eval to ensure the day calculation uses the local timezone?
Correct: adjusting _time by timezone offset before extracting day.
Why this answer
Option C is correct because the `strftime` function uses the server's timezone by default, which may not match the local timezone. By manually adding the timezone offset (in seconds) to `_time` before the `eval`, you shift the epoch timestamp to reflect the local time, ensuring that `strftime` calculates the correct day of the week. This is a common workaround when the search head's timezone differs from the user's local timezone.
Exam trap
Splunk often tests the misconception that `strftime` automatically respects the user's local timezone, when in fact it uses the search head's timezone setting, requiring manual offset adjustment for accurate local-time calculations.
How to eliminate wrong answers
Option A is wrong because `strptime` is used to parse a string into an epoch timestamp, not to format a timestamp into a day name; using it here would cause an error or incorrect results. Option B is wrong because `fields + _time, day` only retains those fields and does not adjust the timezone; it does not affect how `strftime` interprets the timestamp. Option D is wrong because `convert ctime(_time)` converts the epoch timestamp to a human-readable string (ctime format), but does not change the underlying timezone applied by `strftime`; it would break the subsequent `strftime` call.
Option E is wrong because `relative_time(_time, "-0@d")` truncates the timestamp to the start of the current day (midnight) without any timezone offset, so it does not correct for timezone differences and may shift the day incorrectly.
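The effect of shifting an epoch timestamp by a timezone offset before extracting the weekday can be seen in a small Python sketch. It is only a model of the arithmetic: the timestamp, the offset, and the `day_name` helper are illustrative assumptions, and it says nothing about which timezone a given Splunk deployment applies.

```python
from datetime import datetime, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_name(epoch_seconds, offset_seconds=0):
    """Shift the epoch by an offset, then read the weekday as if in UTC."""
    shifted = datetime.fromtimestamp(epoch_seconds + offset_seconds, timezone.utc)
    return DAY_NAMES[shifted.weekday()]


# 2024-01-01 02:30:00 UTC, which is a Monday.
epoch = 1704076200

utc_day = day_name(epoch)
# The same instant seen five hours west of UTC is still the previous evening.
local_day = day_name(epoch, offset_seconds=-5 * 3600)

print(utc_day, local_day)
```

Events close to midnight are the ones that move between days, which is why a per-day count can change once the offset is applied.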
Which THREE of the following are valid ways to create a subsearch in SPL? (Choose three.)
map runs a search for each result, effectively a subsearch.
Why this answer
Option B is correct because the `map` command in SPL allows you to run a subsearch for each result of the outer search, using field values from the outer result (e.g., `$field$`) to dynamically construct the inner search. This is a valid way to create a subsearch that iterates over search results, making it a legitimate subsearch pattern in Splunk.
Exam trap
Splunk often tests the distinction between commands that use subsearches (like `append`, `join`, `map`) versus commands that are not valid subsearch syntax (like `return`), and candidates may mistakenly think `return` is a valid subsearch command because it sounds similar to `search` or `output`.
A user runs a search that returns 1,000,000 results but only sees 5,000 in the Statistics tab. What is the most likely cause?
Without a `by` clause, stats collapses all events into a single row.
Why this answer
Option B is correct: the stats command without a 'by' clause aggregates all events into a single row, with one column per aggregation function; with a 'by' clause it returns one row per distinct group value. Either way, the Statistics tab shows far fewer rows than the number of events. Option A is wrong because the search command truncates at 50,000 results by default.
Option C is wrong because time range narrowness would reduce raw events, but here stats shows few rows. Option D is wrong because sampling is not a default behavior.
A search returns events with fields 'user', 'action', and 'count'. The analyst wants to create a timechart showing the number of distinct users performing 'login' actions per hour. Which search is correct?
Correct: timechart with distinct count of user per hour.
Why this answer
Option C is correct because `timechart span=1h dc(user)` computes the distinct count of the 'user' field per 1-hour time bucket, which directly answers the requirement of showing the number of distinct users performing 'login' actions per hour. The `dc()` function in Splunk is the distinct count function, and `timechart` automatically groups events by `_time` into the specified span.
Exam trap
The trap here is that candidates often confuse `dc(user)` (distinct count of users) with `count by user` (count of events per user), leading them to pick option D or E, which answer a different question.
How to eliminate wrong answers
Option A is wrong because `stats dc(user) by _time span=1h` does not use `timechart`, so it will not produce a timechart visualization; it returns a table of distinct user counts per time bucket but lacks the timechart formatting and binning behavior. Option B is wrong because `dc(by user)` is invalid syntax; `dc()` takes a single field argument, not a `by` clause. Option D is wrong because `eval user=user` is redundant and `timechart span=1h count by user` computes the count of events per user, not the distinct count of users.
Option E is wrong because `sum(count) by user` sums the 'count' field per user, which gives total login counts per user, not the number of distinct users.
A data scientist wants to extract the domain from email addresses in the `_raw` field. The emails follow the pattern user@domain.tld. Which eval expression should be used to create a new field called `domain` containing only the domain part?
Splits on '@' and takes the second part (index 1) which is the domain.
Why this answer
Option A is correct because `split(email,"@")` creates a multivalue field with two parts: the username (index 0) and the domain (index 1). `mvindex(...,1)` extracts the second element, which is the domain. This is the most direct and efficient way to isolate the domain from an email address in Splunk's eval expression.
Exam trap
The trap here is that candidates often confuse the zero-based index of `mvindex` (thinking index 1 is the username) or incorrectly assume `replace` with a regex is the most straightforward approach, when in fact `split` with `mvindex` is the simplest and most reliable method for this exact pattern.
How to eliminate wrong answers
Option B is wrong because `mvindex(...,0)` extracts the username (the part before `@`), not the domain. Option C is wrong because `replace(email,".*@(.*)","\1")` uses a regex that is greedy and may not correctly capture the domain in all cases (e.g., if the email contains multiple `@` symbols or special characters), and `replace` is not the idiomatic Splunk function for this extraction. Option D is wrong because `substr(email, indexof(email,"@")+1)` would extract everything after the `@`, including any trailing whitespace or newline characters, and does not handle cases where the `@` is missing (returns an empty string or error).
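The split-and-index logic is easy to mirror in Python. This is a sketch of the idea, not SPL; `domain_of` is a hypothetical helper, and returning `None` when there is no `@` is a choice made here for the illustration.

```python
def domain_of(email):
    """Split on '@' and take the element at index 1, if there is one."""
    parts = email.split("@")
    return parts[1] if len(parts) > 1 else None


print(domain_of("user@example.com"))
print(domain_of("no-at-sign"))
```

Index 0 holds the username and index 1 holds the domain, which is the zero-based indexing the exam trap above refers to.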
Which TWO of the following are valid ways to calculate the median of a numeric field?
eventstats median adds the median value to each event.
Why this answer
Option B is correct because `eventstats median(field)` computes the median of the specified field and adds it as a new field to every event, which is a valid way to calculate the median. Option E is correct because `stats median(field)` directly computes the median of the numeric field and returns a single result, which is the standard method for median calculation in Splunk.
Exam trap
Splunk often tests the distinction between `eval` and `stats` functions, and candidates mistakenly use `eval` with aggregation functions like `percentile` or confuse the syntax for percentile commands (e.g., `perc`, `p50`) with the correct `perc50` or `percentile` syntax.
A security team runs a search to count login failures per user over the last 24 hours: `index=security action=failure | stats count by user`. The results show counts, but some users have extremely high counts due to a brute force attack. The team wants to identify users with a count greater than 100. What should they do to get the desired list?
Correctly filters the stats results by the count field.
Why this answer
Option B is correct because the `stats count by user` command creates a field called `count` that holds the number of login failures per user. Adding `| where count > 100` after the stats command filters the results to show only users whose count exceeds 100. The `where` command evaluates field values in the current results, making it the appropriate tool for this post-aggregation filter.
Exam trap
Splunk often tests the distinction between filtering before aggregation (using `search` or `where` on raw events) versus filtering after aggregation (using `where` on computed fields), and candidates mistakenly place the filter before `stats` or use a nonexistent command like `filter`.
How to eliminate wrong answers
Option A is wrong because `| top limit=100 user` returns the top 100 users by count, not users with a count greater than 100; it does not apply a threshold filter. Option C is wrong because placing `| where count > 100` before the stats command would attempt to filter on a field `count` that does not yet exist, causing an error or no results. Option D is wrong because `filter` is not a valid Splunk command; the correct command for filtering results is `where`, not `filter`.
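Aggregating first and filtering on the computed count afterwards can be modelled in a few lines of Python. The user names and counts below are invented for illustration.

```python
from collections import Counter

# Hypothetical failure events, one user name per failed login.
failures = ["alice"] * 3 + ["mallory"] * 150 + ["bob"] * 100

# Like `stats count by user`: the count only exists after this step.
counts = Counter(failures)

# Like `where count > 100`: a strict threshold on the aggregated value.
over_threshold = {user: n for user, n in counts.items() if n > 100}

print(over_threshold)
```

A user with exactly 100 failures is excluded, because the comparison is strictly greater than 100.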
A Splunk admin is responsible for a search dashboard that displays real-time statistics of application errors. The search uses 'index=app sourcetype=error | timechart count by severity span=5m'. Users report that the dashboard is slow and often times out. The environment has 4 indexers and the data volume is about 500 GB/day. The admin wants to improve performance without changing the dashboard's output. Which step should they take?
Reduces the amount of data scanned in real time.
Why this answer
Option B is correct because adding a summary index that precomputes the counts by severity and using that in the dashboard reduces the real-time data scan. Option A would not help as it only benefits ad-hoc searches. Option C reduces the number of events but also changes the output (fewer severities).
Option D uses streaming commands which may not reduce disk I/O significantly.
A search returns 1000 results per second. The user wants to see a trend of counts over the past hour in 5-minute intervals. Which command should be used?
`timechart` with `span=5min` correctly creates a time series of event counts per 5-minute bucket.
Why this answer
The `timechart` command is designed to create a time-based chart with automatic binning of events into time buckets. By specifying `span=5min`, you explicitly set the bucket size to 5-minute intervals, and `count` calculates the number of events per bucket. This directly satisfies the requirement to see a trend of counts over the past hour in 5-minute intervals.
Exam trap
Splunk often tests the misconception that `stats` or `chart` can be used with a `span` parameter to create time-based buckets, when in fact only `timechart` (and `bucket` in conjunction with `stats`) supports this syntax for time aggregation.
How to eliminate wrong answers
Option B is wrong because `chart count over _time span=5min` is not valid syntax; `chart` does not support the `span` option and requires a `by` clause to split data, making it unable to produce time-based buckets. Option C is wrong because `stats count by _time span=5min` is invalid; `stats` does not accept a `span` keyword, and grouping by raw `_time` would create a separate count for each unique timestamp, not aggregated intervals. Option D is wrong because `streamstats count span=5min` is invalid; `streamstats` computes running or sliding window statistics and does not support a `span` parameter, nor does it bin events into time intervals.
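The time bucketing behind a 5-minute span can be sketched in Python by flooring each timestamp to the start of its interval. This is a simplified model with made-up timestamps; unlike `timechart`, it does not emit rows for empty buckets.

```python
from collections import Counter

SPAN_SECONDS = 300  # 5 minutes


def bucket_start(epoch_seconds, span=SPAN_SECONDS):
    """Floor a timestamp to the start of its span-sized bucket."""
    return epoch_seconds - (epoch_seconds % span)


# Hypothetical event timestamps in seconds.
timestamps = [0, 10, 299, 300, 301, 899]

counts_per_bucket = Counter(bucket_start(t) for t in timestamps)

for start in sorted(counts_per_bucket):
    print(start, counts_per_bucket[start])
```

Grouping by the raw timestamp instead of the bucket start would give one count per distinct timestamp, which is the failure mode described for ungrouped `_time` above.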
The exhibit shows a search to find the top 5 URI-method combinations by count. However, the results show only 5 rows, but the analyst expected to see the top 5 URIs overall, not combinations. Which change to the search would achieve the desired result?
Correct: grouping only by uri gives count per URI.
Why this answer
Option E is correct because the original search uses `stats count by uri, method`, which groups results by both URI and method, producing separate counts for each combination. Changing it to `stats count by uri` removes the method field from the grouping, so the count is aggregated per URI alone, giving the top 5 URIs overall as the analyst expected.
Exam trap
Splunk often tests the distinction between grouping by multiple fields versus a single field, and the trap here is that candidates may think they need an additional stats command (Option D) or a filter (Option A) when simply removing the extra field from the `by` clause is the correct and efficient fix.
How to eliminate wrong answers
Option A is wrong because adding `| where method="GET"` would filter to only GET requests, which does not aggregate across all methods and still groups by URI and method if the stats clause remains unchanged. Option B is wrong because `chart count over uri by method` creates a tabular breakdown of counts per method for each URI, not a single count per URI, and still separates by method. Option C is wrong because `top limit=5 uri, method` returns the top 5 URI-method combinations by count, which is exactly what the original search does, not the top 5 URIs overall.
Option D is wrong because adding `| stats sum(count) as total by uri` after the existing stats would sum the per-combination counts for each URI and so produce the right totals, but it is an unnecessary extra step; removing `method` from the first stats is cleaner and more direct.
A user wants to create a report that shows the top 5 sources of errors, excluding a specific source 'host1'. Which SPL is correct?
Correctly excludes host1 before top, ensuring accurate top 5.
Why this answer
Option A is correct because it filters out 'host1' before the `top` command runs, ensuring that the top 5 sources of errors are calculated from the remaining data. The `NOT host="host1"` clause is placed in the base search, which is the most efficient approach and guarantees that 'host1' is excluded from the statistical aggregation.
Exam trap
Splunk often tests the misconception that filtering after a transforming command like `top` is equivalent to filtering before it, when in reality the aggregation is performed on the entire dataset first, altering the results.
How to eliminate wrong answers
Option B is wrong because the `where` command is applied after `top`, which means the top 5 sources are computed including 'host1', and then 'host1' is removed from the result set; this could leave fewer than 5 results and does not exclude 'host1' from the ranking calculation. Option C is wrong because the `search` command after `top` also filters after the aggregation, suffering from the same issue as Option B, and additionally `search source!="host1"` uses the field `source` instead of `host` to filter the host. Option D is wrong because, although `| search NOT host=host1` placed before `top` does exclude 'host1' from the ranking, it filters in a separate `search` command after the events have already been retrieved; placing the exclusion in the base search, as Option A does, is the more efficient form.
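Why the position of the filter matters can be shown with a short Python model. The host names, counts, and the `top` helper are hypothetical; the helper only imitates "most frequent N values".

```python
from collections import Counter

# Hypothetical error events, one host name per event.
hosts = (["host1"] * 10 + ["a"] * 9 + ["b"] * 8 + ["c"] * 7
         + ["d"] * 6 + ["e"] * 5 + ["f"] * 4)


def top(values, limit):
    """Return the `limit` most frequent values, most frequent first."""
    return [value for value, _ in Counter(values).most_common(limit)]


# Exclude host1 first, then rank: five results, none of them host1.
filter_then_top = top([h for h in hosts if h != "host1"], 5)

# Rank first, then exclude host1: host1 used up one of the five slots.
top_then_filter = [h for h in top(hosts, 5) if h != "host1"]

print(filter_then_top)
print(top_then_filter)
```

Filtering after the ranking leaves only four rows in this data, and host "e" never gets the slot it would have earned.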
Which THREE of the following are benefits of using eventstats over stats when analyzing event logs? (Choose three.)
eventstats does not reduce event count.
Why this answer
Option A is correct because `eventstats` adds aggregate statistics (like sums or averages) to each original event without reducing the total number of events. Unlike `stats`, which collapses events into a single summary row per group, `eventstats` appends the aggregated value to every matching event, preserving the original event count and structure.
Exam trap
The trap here is that candidates confuse `eventstats` with `stats`, assuming `eventstats` is always faster or more memory-efficient, when in fact it trades off performance and memory for the ability to retain original event context.
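To make the event-count difference concrete, here is a small Python sketch that mimics the two commands on in-memory rows. The helper names `stats_avg` and `eventstats_avg` and the sample events are assumptions made for illustration; they are not Splunk APIs.

```python
from collections import defaultdict

# Hypothetical events (illustrative data only).
events = [
    {"user": "alice", "duration": 10},
    {"user": "alice", "duration": 30},
    {"user": "bob", "duration": 50},
]


def stats_avg(rows, field, by):
    """Mimics `stats avg(field) as avg by <by>`: one summary row per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[field])
    return [{by: key, "avg": sum(vals) / len(vals)} for key, vals in groups.items()]


def eventstats_avg(rows, field, by):
    """Mimics `eventstats avg(field) as avg by <by>`: every row kept, avg added."""
    averages = {r[by]: r["avg"] for r in stats_avg(rows, field, by)}
    return [{**row, "avg": averages[row[by]]} for row in rows]


summary = stats_avg(events, "duration", "user")         # 2 rows, original fields gone
annotated = eventstats_avg(events, "duration", "user")  # 3 rows, original fields kept

print(len(summary), len(annotated))
```

The `stats`-style helper returns one row per user and drops `duration`, while the `eventstats`-style helper returns all three events with the group average attached, which is also why it has to hold more data in memory.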
To find users who logged in from more than 3 different IP addresses, which search is correct?
dc counts distinct IPs per user, then filters.
Why this answer
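Option A is correct because it uses `stats dc(IP) by user` to count distinct IP addresses per user, then filters with `where` to return only users whose distinct count exceeds 3. The `dc()` function calculates a distinct count, which is exactly what the question requires. Note that in a `where` expression the aggregated field must be renamed (for example `dc(IP) as ip_count`) or wrapped in single quotes as `'dc(IP)'`, because a bare name containing parentheses is parsed as a function call.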
Exam trap
Splunk often tests the distinction between `dc()` (distinct count) and `count` (total occurrences). The trap here is that candidates may confuse `distinct_count()` (invalid) with `dc()`, or assume that `dedup` followed by `count` always achieves the same result; it does so only when the count is grouped by user after the user/IP pairs have been deduplicated.
How to eliminate wrong answers
Option B is wrong because `top limit=3 IP by user` returns the top 3 IP addresses per user, not a count of distinct IPs, and cannot filter for users with more than 3 distinct IPs. Option C is wrong because `eval user, IP` is invalid syntax (eval requires an expression), and `dedup user, IP` removes duplicate pairs but does not count distinct IPs per user correctly; the subsequent `stats count` counts occurrences, not distinct IPs. Option D is wrong because `distinct_count(IP)` is not a valid SPL function; the correct function is `dc(IP)`, and this search would produce an error.
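The gap between a distinct count and a plain count can be illustrated with a short Python sketch. This is a simulation under assumed sample data, not SPL: `distinct_ips_per_user` is a hypothetical helper standing in for `stats dc(IP) by user`.

```python
from collections import Counter, defaultdict

# Hypothetical login events as (user, IP) pairs (illustrative data only).
logins = [
    ("alice", "10.0.0.1"), ("alice", "10.0.0.2"), ("alice", "10.0.0.3"),
    ("alice", "10.0.0.4"), ("alice", "10.0.0.1"),
    ("bob", "10.0.0.9"), ("bob", "10.0.0.9"), ("bob", "10.0.0.9"), ("bob", "10.0.0.9"),
]


def distinct_ips_per_user(rows):
    """Stand-in for `stats dc(IP) by user`: number of unique IPs per user."""
    seen = defaultdict(set)
    for user, ip in rows:
        seen[user].add(ip)
    return {user: len(ips) for user, ips in seen.items()}


ip_count = distinct_ips_per_user(logins)
event_count = Counter(user for user, _ in logins)  # like `stats count by user`

flagged_by_dc = sorted(u for u, n in ip_count.items() if n > 3)
flagged_by_count = sorted(u for u, n in event_count.items() if n > 3)

print(flagged_by_dc)     # only users with more than 3 distinct IPs
print(flagged_by_count)  # users with more than 3 login events of any kind
```

In this sample, bob logs in four times from a single address, so a plain count flags him while the distinct count does not.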