Knowledge + Practice

Splunk Core Certified Power User SPLK-1003 (SPLK-1003) — Questions 1–75

500 questions total · 7 pages · All types, answers revealed



1

MCQ · hard

A Splunk environment ingests 10 TB per day. A user runs a search to count events per sourcetype over the last 7 days: `index=* earliest=-7d | timechart count by sourcetype`. The search returns partial results and eventually times out. The user needs to obtain the complete results efficiently. What is the best course of action?

A.Use `| bucket span=1d | stats count by _time sourcetype` then `| xyseries` to format.

B.Use `| sitime` to sample the data and approximate counts.

C.Use `| tstats count where index=* earliest=-7d by _time span=1d, sourcetype` and then format as needed.

D.Break the search into 1-day intervals and use `append` to combine results.

Answer: C

`tstats` reads indexed fields from tsidx files instead of raw events, so it is much faster for large data volumes.

Why this answer

Option C is correct because `tstats` runs on indexed metadata (tsidx files) rather than raw events, making it far more efficient for counting events over large time ranges. By specifying `by _time span=1d, sourcetype`, you get daily counts per sourcetype without scanning the entire event data, avoiding the timeout that occurs with a raw search over 10 TB/day for 7 days.

Exam trap

Splunk often tests the distinction between raw event searches and metadata-based searches, and the trap here is that candidates may not realize `tstats` can aggregate by sourcetype and time span without touching raw data, leading them to choose inefficient raw-search options like A or D.

How to eliminate wrong answers

Option A is wrong because `bucket span=1d | stats count by _time sourcetype` still requires scanning all raw events from the index, which is inefficient and will likely time out on 70 TB of data. Option B is wrong because `sitime` is not a valid Splunk command; it appears to be a distractor, and sampling would not provide complete results as required. Option D is wrong because breaking the search into 1-day intervals and using `append` still requires scanning raw events for each interval, leading to the same performance issues and potential timeout, plus it adds overhead from multiple searches.
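As a rough sketch of the approach in option C, the query below counts events per day and sourcetype from indexed fields and then pivots the result into one column per sourcetype. The `@d` snapping and the trailing `xyseries` are illustrative choices, not part of the original question.

```spl
| tstats count where index=* earliest=-7d@d latest=now by sourcetype, _time span=1d
| xyseries _time sourcetype count
```

Because `sourcetype` and `_time` are index-time fields, this form does not need to read raw events.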


2

MCQ · easy

The exhibit shows a search that reads a lookup file. Which of the following must be true for this search to work correctly?

A.The lookup must be defined in transforms.conf

B.The lookup file must be stored on the indexer

C.The file must be in the default lookup directory

D.The file server_status.csv must be in the $SPLUNK_HOME/etc/apps/search/lookups directory

Answer: A

inputlookup requires a lookup definition in transforms.conf.

Why this answer

For a lookup to work in a Splunk search, it must be defined in transforms.conf. This configuration file specifies the lookup type (e.g., file-based, KV-store, external), the filename, the field mapping, and other parameters. Without this definition, Splunk cannot resolve the lookup command or the lookup table reference in the search string, even if the file exists on disk.

Exam trap

The trap here is that candidates often assume the lookup file just needs to exist on disk (options B, C, D), but Splunk requires the explicit transforms.conf definition to map the lookup name to the file and fields.

How to eliminate wrong answers

Option B is wrong because lookup files are stored on the search head, not the indexer; indexers handle data indexing and search, but lookups are resolved on the search head. Option C is wrong because the lookup file does not have to be in the default lookup directory; it can be in any directory specified by the 'filename' parameter in transforms.conf, as long as Splunk has read access. Option D is wrong because the file does not have to be in the $SPLUNK_HOME/etc/apps/search/lookups directory; it can be in any app's lookups subdirectory (e.g., $SPLUNK_HOME/etc/apps/myapp/lookups) as long as the transforms.conf in that app references it correctly.


3

MCQ · hard

A search includes a subsearch that returns 100,000 results, causing performance issues. Which optimization is best?

A.Use limit in the subsearch to return fewer results

B.Use the fields command inside the subsearch

C.Use the format command inside the subsearch

D.Use the search command with index=* inside the subsearch

Answer: A

Limiting the subsearch reduces the number of results it returns, improving performance.

Why this answer

Option A is correct because limiting the subsearch restricts the number of results returned to the primary search, directly reducing the data volume that must be processed. SPL has no standalone `limit` command; in practice the cap is applied with `head` or with a `limit=` argument on commands such as `top`. This is the most effective optimization when a subsearch returns a large result set (e.g., 100,000 events), as it minimizes memory and CPU overhead on the search head.

Exam trap

The trap here is that candidates often confuse reducing field count (fields command) with reducing row count, or think that formatting (format) or widening the search (index=*) will somehow improve performance, when only limiting the actual number of results addresses the root cause.

How to eliminate wrong answers

Option B is wrong because the `fields` command only selects a subset of fields from the results, but does not reduce the number of events returned; the subsearch still returns 100,000 results, so performance issues persist. Option C is wrong because the `format` command changes the output format of the subsearch results (e.g., into a boolean expression), but does not reduce the result count; it is used for formatting, not optimization. Option D is wrong because using `search index=*` inside the subsearch would search all indexes, likely returning even more results and worsening performance; it does nothing to limit the result set size.
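A minimal sketch of capping a subsearch, assuming a hypothetical `web` index with `status` and `clientip` fields: the subsearch aggregates first, keeps only the 100 busiest client IPs with `head`, and passes a single field to the outer search.

```spl
index=web sourcetype=access
    [ search index=web sourcetype=access status=500
      | stats count by clientip
      | sort - count
      | head 100
      | fields clientip ]
```

The cap of 100 is arbitrary here; the point is that the row count is reduced inside the subsearch, before the results are handed to the outer search.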


4

MCQ · easy

You are a Splunk administrator at a large e-commerce company with over 5,000 employees and millions of customers. The development team has created a dashboard that displays sales data by region, using a lookup table to map customer IDs to region names. The lookup file, 'customer_region.csv', is stored on the search head. Recently, the lookup table was updated with new customer IDs, but the dashboard continues to show old region names for new customers. You have verified that the lookup file contains the new mappings and that the file is correctly formatted. The dashboard uses the 'lookup' command in its base search. You have also confirmed that the lookup definition in transforms.conf points to the correct file. The lookup file is approximately 100 MB and is updated weekly. The dashboard is accessed by multiple users across the organization. The issue only affects new customers added in the latest update. Old customers still show correct regions. You have checked the file size and timestamp, and the new file is present. The Splunk version is 8.2. The search head is not clustered. No errors are appearing in the splunkd.log related to lookups. The dashboard uses a simple XML with a timechart and a lookup. The search string is: index=sales sourcetype=transactions | lookup customer_region.csv customer_id OUTPUT region | timechart count by region. You have also tried restarting the search head, but the issue persists. What is the most likely cause?

A.The lookup definition has 'batch_index_query=True' and is not refreshing.

B.The dashboard is using the wrong lookup name.

C.The lookup file is cached and needs to be reloaded by restarting Splunk.

D.The search head is using a cached version of the lookup, and you need to clear the lookups cache.

Answer: D

Clearing cache reloads the file.

Why this answer

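Option D is correct because the lookup file is cached on the search head, and clearing the cache forces a reload. Option C is wrong because restarting is unnecessary, and in this scenario a restart was already tried without effect. Option A is wrong because batch_index_query is not relevant to refreshing the file.

Option B is wrong because the lookup name and the file it points to were already verified as correct.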


5

Multi-Select · hard

In a Splunk environment, an analyst is using the transaction command to group events from different sources. Which THREE factors are most important to consider when designing the transaction search for optimal performance? (Choose three.)

Select 3 answers

A.Use the 'mvlist' option to store multiple values.

B.Use a large maxevents value to ensure all events are captured.

C.Apply efficient search-time field extractions to avoid using the transaction command across unindexed fields.

D.Limit the time range of the search using maxspan.

E.Use fields with low cardinality for grouping.

Answers: C, D, E

Correct: Improves search performance.

Why this answer

Options C, D, and E are correct. Low cardinality fields reduce open transactions, maxspan narrows the time window, and efficient field extractions avoid heavy operations. Option B (large maxevents) hurts performance, and Option A (mvlist) only controls how multivalue fields are rendered in the transaction, so it does not improve performance.


6

MCQ · easy

An analyst needs to add a field called 'Region' to events based on a lookup table that maps 'StoreID' to 'Region'. The lookup table is defined in transforms.conf as a CSV lookup. Which command should be used in the search to perform this enrichment?

A.inputlookup

B.outputlookup

C.table

D.lookup

Answer: D

lookup matches fields and adds output fields to events.

Why this answer

The lookup command enriches events with fields from a lookup table. inputlookup reads a lookup as results, outputlookup writes search results to a lookup, and table only selects which fields to display.
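For illustration, the enrichment could look like the sketch below. The lookup definition name `store_regions` and the `retail` index are hypothetical; only `StoreID` and `Region` come from the question.

```spl
index=retail sourcetype=pos_transactions
| lookup store_regions StoreID OUTPUT Region
| stats count by Region
```

Each event keeps its original fields and gains `Region` wherever its `StoreID` matches a row in the lookup table.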


7

MCQ · easy

An analyst wants to ensure that a transaction is only considered complete when it contains a specific end event. Which transaction parameter should be used?

A.startswith

B.endswith

C.maxpause

D.maxspan

Answer: B

Correct: endswith specifies the closing event.

Why this answer

Option B is correct because endswith defines the event that closes a transaction. Option A (startswith) defines the start event. Option D (maxspan) bounds total time.

Option C (maxpause) bounds idle time.


8

Matching · medium

Match each Splunk search command to its primary function.


Concepts and matches

stats: Calculates aggregate statistics on search results

rex: Extracts fields using regular expressions

eval: Creates or modifies fields using expressions

transaction: Groups events into transactions based on common fields

lookup: Enriches events with external data from a lookup table

Why these pairings

These are common Splunk search commands used for data manipulation and enrichment.


9

MCQ · medium

A Splunk administrator is tuning a dashboard that uses `transaction` to correlate web server events. The dashboard frequently times out. The admin reviews the search and sees `transaction client_ip maxspan=1h maxpause=30m`. The dataset contains about 10 million events per hour. The admin suspects that the transaction is causing the timeout. Which action should they take to improve performance while still achieving the grouping?

A.Replace transaction with streamstats to create a session ID, then use stats to aggregate

B.Add `maxevents=100` to limit events per transaction

C.Reduce maxspan to 15m and maxpause to 5m

D.Increase the search job concurrency

Answer: A

streamstats can process events sequentially and assign IDs, then stats can group without the full overhead of transaction.

Why this answer

The current `transaction` has generous limits (`maxspan=1h maxpause=30m`), and running it over roughly 10 million events per hour is memory-intensive because open transactions are held in memory until they close. A better approach is to use `streamstats` to compute session boundaries and then aggregate with `stats`.

Option A is the most practical: use `streamstats` to assign session IDs and then use `stats` to group, which is more efficient.
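One possible shape for the `streamstats` approach is sketched below, assuming a hypothetical `web` index. A new session starts whenever the gap since the same client's previous event exceeds 1800 seconds, mirroring `maxpause=30m`. The field names `prev_time`, `new_session`, and `session_id` are illustrative.

```spl
index=web sourcetype=access_combined
| sort 0 client_ip _time
| streamstats current=f last(_time) as prev_time by client_ip
| eval new_session = if(isnull(prev_time) OR _time - prev_time > 1800, 1, 0)
| streamstats sum(new_session) as session_id by client_ip
| stats min(_time) as session_start max(_time) as session_end count by client_ip session_id
| eval duration = session_end - session_start
```

This sketch does not reproduce `maxspan=1h`, and the `sort 0` step has its own cost on large result sets, so it should be read as a starting point rather than a drop-in replacement.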


10

Multi-Select · hard

Which THREE of the following are valid use cases for the `transaction` command in Splunk?

Select 3 answers

A.Identifying a sequence of events that indicate a brute-force attack (multiple failed logins followed by a success).

B.Generating an alert when a transaction contains more than five events.

C.Grouping all events from a single user session across multiple web servers into one transaction.

D.Enriching events with external data from a CSV file based on a common key.

E.Correlating a customer's browsing activity with a subsequent purchase event to calculate conversion rate.

Answers: A, C, E

Transaction can group events by user and then you can search for the pattern.

Why this answer

Option A is correct because the `transaction` command groups related events into a single transaction based on common fields and temporal constraints. In this case, it can group multiple failed login events followed by a successful login for the same user, which is a classic indicator of a brute-force attack. The command allows you to set `maxspan` and `maxpause` to define the time window and gap between events, making it ideal for detecting such sequences.

Exam trap

The trap here is that candidates confuse the `transaction` command with other commands like `stats` or `lookup`, or mistakenly think it can directly trigger alerts, when in fact it only creates transaction objects that can then be used in alerts or further processing.


11

MCQ · hard

A security team uses the CIM 'Authentication' data model to investigate failed logins. They have enabled acceleration on the data model and set a summary range of '1d'. After one week, searches against the data model are still slow and use the `search` command instead of `tstats`. What should they check first?

A.Confirm that the data model acceleration is built and that the search time range is within the summary range.

B.Verify that the 'Authentication' data model is assigned to the correct index.

C.Ensure that the 'Authentication' data model has the 'authentication' tag on relevant events.

D.Check that the data model acceleration has completed building for the exact time range of the search.

Answer: A

If the search time range exceeds the summary range, `tstats` cannot be used and Splunk falls back to search.

Why this answer

Option A is correct because the acceleration must be built and the search time range must fall within the summary range for `tstats` to be used; with a summary range of `1d`, a search over a longer period falls outside it. Option B is not a direct check; data models are not assigned to indexes. Option C is important but not the first check if the data model already returns events.

Option D is less specific than A, which also covers the summary range.
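One way to test whether the accelerated summaries can answer a search is to run it with `summariesonly=true`, which restricts `tstats` to summarized data. The sketch below keeps the time range inside the `1d` summary range and uses the CIM field names `Authentication.action` and `Authentication.src`.

```spl
| tstats summariesonly=true count from datamodel=Authentication where Authentication.action="failure" earliest=-24h@h latest=now by Authentication.src
| sort - count
```

If this returns nothing for a period that is known to contain failed logins, the summaries probably do not cover that period yet.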


12

MCQ · hard

A Splunk search uses a subsearch to find the top 10 client IPs and then retrieve all events from those IPs. The search is: `index=web sourcetype=access | search [ top clientip | fields clientip ]` What does this search return?

A.The top 10 client IPs in a table.

B.Only the top 10 events based on some field.

C.All events where the client IP appears more than once.

D.All events from the top 10 most common client IPs.

Answer: D

The subsearch finds the top 10 client IPs, then outer search filters events matching those IPs.

Why this answer

The subsearch `[ top clientip | fields clientip ]` returns the top 10 most common client IPs as a list of values. The outer search then uses this list as a filter, effectively running `index=web sourcetype=access clientip=<ip1> OR clientip=<ip2> ...`. This retrieves all events from those IPs, not just the top 10 events.

Option D correctly describes this behavior.

Exam trap

The trap here is that candidates confuse the output of the subsearch (a table of IPs) with the final output of the entire search, failing to recognize that the outer search returns all matching events, not just the top IPs.

How to eliminate wrong answers

Option A is wrong because the outer search does not use `top` or `stats` to produce a table of IPs; it returns raw events from the index. Option B is wrong because the subsearch identifies the top 10 client IPs by count, not the top 10 events by any field; the outer search returns all events matching those IPs, not a limited set of events. Option C is wrong because the subsearch uses `top` to find the most common IPs, not to filter IPs that appear more than once; an IP appearing exactly once could still be in the top 10 if few unique IPs exist.
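The same idea can be written with an explicit base search inside the brackets, which is an assumption about how the search would be made runnable rather than the question's exact text. The subsearch runs first, and its `clientip` values are substituted into the outer search as a filter of the form `clientip=<ip1> OR clientip=<ip2> ...`.

```spl
index=web sourcetype=access
    [ search index=web sourcetype=access
      | top limit=10 clientip
      | fields clientip ]
```

The outer search then returns every event whose `clientip` is one of those ten values.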


13

Multi-Select · easy

Which TWO statements about the 'transaction' command are true? (Choose two.)

Select 2 answers

A.The 'transaction' command cannot be used with the 'stats' command.

B.The 'transaction' command only works on indexed fields.

C.The 'transaction' command can include events from multiple sourcetypes.

D.The 'transaction' command groups events based on common field values and time proximity.

E.The 'transaction' command requires all events to be from the same host.

Answers: C, D

Correct: Events from different sourcetypes can be grouped.

Why this answer

Options C and D are correct. Transaction groups events by common field values and time proximity, and it can include multiple sourcetypes. Option E is false (transaction does not require the same host), Option A is false (it can be used with stats), and Option B is false (it works on any field).


14

MCQ · easy

A user reports that a macro named `my_macro` is not working in a search. The macro is defined with no arguments and uses a simple search string. What is the most likely issue?

A.The macro permissions are not shared to the user's role.

B.The macro is defined with wrong arguments.

C.The macro name is misspelled in the search.

D.The macro contains a subsearch that fails.

Answer: A

Correct: Macros require proper permissions to be usable by others.

Why this answer

Option A is correct because a macro is private to its creator by default, and its permissions must be changed before other roles can use it. A misspelling would cause an error message, not silent failure. Wrong arguments would cause an error if used with arguments.

A subsearch failure would also produce an error.


15

MCQ · easy

An analyst wants to see the count of distinct users for each department over the last week. The data contains fields: user, department, date. Which search is correct?

A.... | stats distinct_count(user) by department

B.... | stats dc(user) by department

C.... | eval distinct_count=dc(user) | stats sum(distinct_count) by department

D.... | stats count(user) by department

Answer: B

dc() calculates distinct count.

Why this answer

Option B is correct because the `dc()` function in Splunk's `stats` command calculates the distinct count of values in a field, which is exactly what the analyst needs: the count of distinct users per department over the last week. The `by department` clause groups the results by department, and the implicit time range (last week) is applied via the search time picker or an explicit time filter in the query.

Exam trap

The trap here is that candidates often confuse `count()` (total events) with `dc()` (distinct count), or try to compute a distinct count inside `eval`, where aggregate functions such as `dc()` are not available.

How to eliminate wrong answers

Option A is marked wrong by the answer key, but `distinct_count(user)` is documented as the long form of `dc(user)` in `stats`, so in practice it returns the same result; B is the intended answer because it uses the short form. Option C is wrong because `eval distinct_count=dc(user)` is invalid; `dc()` cannot be used in an `eval` command, since it is a statistical function only available in `stats` or similar transforming commands, and the subsequent `stats sum(distinct_count)` would not produce the correct distinct count per department. Option D is wrong because `count(user)` counts the total number of events where the user field exists, not the number of distinct users, which does not meet the requirement for distinct users per department.
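Written out in full, the search might look like the sketch below. The `hr` index and the explicit time bounds are assumptions added for illustration; the question only fixes the `user` and `department` fields.

```spl
index=hr earliest=-7d@d latest=now
| stats dc(user) as distinct_users by department
| sort - distinct_users
```

Renaming the result with `as distinct_users` is optional, but it gives the column a readable name in reports.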


16

MCQ · hard

A dashboard uses a drop-down input to select a server. The drop-down is populated by a search that returns server names. Which setting ensures that the drop-down updates automatically when the underlying data changes?

A.Set the 'delay' option

B.Change the 'refresh' setting on the input

C.Use a token filter

D.Enable 'search on change'

Answer: B

Refresh re-executes the search at a specified interval, updating the drop-down.

Why this answer

The 'refresh' setting on the input causes the search that populates the drop-down to re-run at intervals, keeping the list current. Token filters, delay, or 'search on change' do not provide automatic periodic updates.


17

MCQ · medium

An analyst runs the following search to correlate login and logout events: `index=auth | transaction user startswith="LOGIN" endswith="LOGOUT"`. However, some transactions span over 24 hours. Which option should be added to limit each transaction to a maximum of 8 hours?

A.maxevents=10

B.duration=8h

C.maxpause=8h

D.maxspan=8h

Answer: D

maxspan restricts the transaction to an 8-hour window.

Why this answer

Option D is correct because maxspan=8h limits the total time window of the transaction to 8 hours. Option C (maxpause) limits inactivity, not total duration. Option A (maxevents) limits event count.

Option B (duration) is not a valid transaction option; `duration` is a field that the command adds to each transaction.
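Applied to the search from the question, the option slots in alongside the start and end markers. The final `table` line is only an illustrative way to inspect the result, using the `duration` and `eventcount` fields that `transaction` adds.

```spl
index=auth
| transaction user startswith="LOGIN" endswith="LOGOUT" maxspan=8h
| table _time user duration eventcount
```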


18

MCQ · easy

A security analyst wants to visualize the count of login failures by source IP over the last 24 hours, but only for IPs with more than 10 failures. Which visualization type and SPL command combination is most appropriate?

A.Line chart with | top limit=10 showcount=1 by src_ip

B.Column chart with | stats count by src_ip | where count > 10

C.Scatter plot with | stats dc(src_ip) by failure

D.Pie chart with | chart count over src_ip | where count > 10

Answer: B

Correctly uses stats to count, filters, and column chart for comparison.

Why this answer

Option B is correct because it uses `stats count by src_ip` to aggregate login failures per source IP, then `where count > 10` to filter only IPs exceeding 10 failures, and a column chart is ideal for comparing discrete counts across categories (IP addresses). This combination directly answers the requirement: visualize count of failures by IP, with a threshold filter applied after aggregation.

Exam trap

The trap here is that candidates often confuse `top` with a threshold filter, or choose a visualization type (like scatter or pie) that is inappropriate for comparing counts across many categories.

How to eliminate wrong answers

Option A is wrong because `| top limit=10 showcount=1 by src_ip` returns the top 10 IPs by count, but it does not allow filtering for IPs with more than 10 failures (it only limits to 10 results, not a threshold). Option C is wrong because a scatter plot is used for two continuous variables, not for comparing counts of a single categorical field, and `stats dc(src_ip) by failure` counts distinct source IPs per failure value, which does not produce the required per-IP failure counts. Option D is wrong because a pie chart is poor for comparing many categories. The SPL itself is not the problem: `chart count over src_ip` is valid and produces the same per-IP counts as `stats count by src_ip`, and `where count > 10` can filter its output.
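A sketch of the full search behind option B follows. The base search (`index=auth action=failure`) is an assumption, since the question only gives the reporting part of the pipeline.

```spl
index=auth action=failure earliest=-24h latest=now
| stats count by src_ip
| where count > 10
| sort - count
```

Sorting by count is optional, but it makes the resulting column chart easier to read.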


19

Multi-Select · medium

Which TWO options are valid parameters of the `transaction` command?

Select 2 answers

A.timeformat

B.maxpause

C.sequential

D.keepevicted

E.fieldlist

Answers: B, D

maxpause defines maximum pause between events.

Why this answer

Correct options: B (maxpause) and D (keepevicted). Option A (timeformat) is for time parsing, not transaction. Option E (fieldlist) is not a parameter; fields are given as arguments.

Option C (sequential) is not a parameter.


20

MCQ · hard

A Splunk user needs to correlate events from different sourcetypes (web_access, auth_log, app_log) that share a common 'transaction_id' field. Each transaction_id may appear many times across sourcetypes. The user wants to group all events with the same transaction_id into one transaction, without any time constraints. Which transaction command is most appropriate?

A.transaction by transaction_id

B.transaction by sourcetype transaction_id

C.transaction maxspan=1d by transaction_id

D.transaction startswith=* endswith=* by transaction_id

Answer: A

Correctly groups by the common field without time limits.

Why this answer

Option A is correct because the `transaction` command with `by transaction_id` groups all events sharing the same `transaction_id` field value into a single transaction, with no default time constraints. This matches the requirement to correlate events across `web_access`, `auth_log`, and `app_log` sourcetypes without any time window restrictions.

Exam trap

The trap here is that candidates often add unnecessary time constraints (like `maxspan=1d`) or marker arguments (`startswith`/`endswith`) when the requirement explicitly states no time constraints, or they incorrectly include `sourcetype` in the `by` clause, which would split transactions across sourcetypes instead of grouping them.

How to eliminate wrong answers

Option B is wrong because `by sourcetype transaction_id` would group events by unique combinations of `sourcetype` and `transaction_id`, which would split events with the same `transaction_id` across different sourcetypes into separate transactions, defeating the cross-sourcetype correlation requirement. Option C is wrong because `maxspan=1d` imposes a 24-hour time constraint on the transaction, which the user explicitly stated should not be applied. Option D is wrong because `startswith=* endswith=*` defines start and end markers for the transaction, which is unnecessary and could cause unintended grouping behavior when all events should simply be grouped by `transaction_id` without marker logic.


21

MCQ · medium

An administrator configures a saved search that uses a macro to generate a summary index every hour. The macro includes a time range argument with default value `earliest=-1h@h latest=@h`. The saved search does not pass any time range argument, so the default is used. After a few days, the summary index is missing data for the last hour of each day. What is the most likely cause?

A.The saved search schedule is set to run on the hour, but the macro's default time range covers the previous hour, creating an overlap.

B.The macro is defined in a private app, and the saved search runs in a different app, causing the macro default not to be used.

C.The saved search's summary index is configured with a summary range that is too short, causing old data to be aged out.

D.The macro definition used a static time range (e.g., `earliest=08:00:00 latest=09:00:00`) instead of a relative one.

Answer: D

Static time range does not update; after one day it will always refer to the same old hour.

Why this answer

Option D is correct. A relative range such as `earliest=-1h@h latest=@h` is evaluated each time the search runs, so an hourly schedule always summarizes the previous complete hour. A static range (for example `earliest=08:00:00 latest=09:00:00`) does not move with the schedule: every run keeps querying the same fixed window, which leaves gaps in the summary index.

Option A is wrong because running on the hour with a `-1h@h` to `@h` range covers exactly the previous full hour; consecutive windows are adjacent and do not overlap. Option B is wrong because a macro that is private to another app would not be available to the saved search at all, which would break every run rather than drop a single hour. Option C is wrong because the symptom is data that was never written for an hour, not old data being aged out.


22

MCQ · hard

A financial services company uses Splunk to correlate events from multiple applications. Analysts often use `transaction user_id` to group events, but they notice that this command significantly increases search time and memory usage. After investigating, they find that certain 'user_id' values are extremely frequent (e.g., service accounts) causing huge transactions with thousands of events, which exhaust search memory. The team needs to continue grouping by user_id but must avoid performance issues. They also need to preserve the ability to compute statistics like transaction duration. Which approach best addresses both concerns?

A.Set `maxpause=1m` to break large transactions by gaps

B.Use `transaction user_id maxspan=5m maxevents=100`

C.Exclude service accounts using `where user_id!="svc*"` before transaction

D.Switch to `stats values(_raw) by user_id` to avoid transaction overhead

Answer: B

Limits both total time and event count, preventing memory overload.

Why this answer

Using `transaction user_id maxevents=100 maxspan=5m` limits the size of each transaction, preventing the huge transactions that exhaust search memory, while `transaction` still produces a `duration` field for each group. Pre-filtering to remove noisy accounts (Option C) can help, but limiting `maxevents` is more robust. Option B is the most direct fix.
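As a sketch of how the bounded transaction still supports duration statistics, the search below aggregates the `duration` and `eventcount` fields that `transaction` adds to each group. The `app` index and the output field names are hypothetical.

```spl
index=app
| transaction user_id maxspan=5m maxevents=100
| stats count as transactions avg(duration) as avg_duration_s max(eventcount) as max_events by user_id
```

A service account now shows up as many small transactions instead of one very large one, which keeps per-transaction memory bounded.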


23

MCQ · medium

A Splunk administrator is troubleshooting a slow search that uses the transaction command. The search correlates events by 'user_uuid' with a maxspan of 1 hour. The administrator suspects that many orphan events (events that never complete a transaction) are causing performance issues. Which approach can help identify and possibly exclude orphan events from the transaction?

A.Increase maxspan to allow more events to complete.

B.Use the 'mvlist' option to list all user_uuid values.

C.Use the 'keepevicted=true' option and then filter out evicted events in a subsequent search.

D.Add 'closed_txn=1' to the transaction command to only output complete transactions.

Answer: C

keepevicted=true also outputs evicted (incomplete) transactions, allowing you to analyze or exclude them.

Why this answer

Option C is correct because the `keepevicted=true` parameter causes the `transaction` command to also output transactions that were evicted before completing (orphans). Evicted and complete transactions are distinguished by the `closed_txn` field, which is 0 for evicted transactions and 1 for closed ones. You can then keep only complete transactions in a subsequent step, for example with `search closed_txn=1`, or inspect the evicted ones to see what is generating orphans.

Exam trap

The trap here is that candidates treat `closed_txn` as an argument to `transaction`, when it is actually a field that the command writes to its output, or assume that `keepevicted` removes orphans rather than exposing them so they can be filtered.

How to eliminate wrong answers

Option A is wrong because increasing `maxspan` would actually allow more events to be considered for a transaction, potentially increasing the number of orphan events and worsening performance, not solving the issue. Option B is wrong because `mvlist` only controls how multivalue fields are rendered within a transaction; it does not help identify or exclude orphan events. Option D is wrong because `closed_txn` is not a parameter of the `transaction` command; it is an output field, so `closed_txn=1` can only be used as a filter after the command has run.
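A minimal sketch of separating complete from evicted transactions is shown below. It relies on the `closed_txn` field that `transaction` sets when options such as `maxspan`, `startswith`, or `endswith` are used. The `app` index, the marker strings, and the `txn_state` field are hypothetical.

```spl
index=app
| transaction user_uuid maxspan=1h startswith="session_start" endswith="session_end" keepevicted=true
| eval txn_state = if(closed_txn == 1, "complete", "evicted")
| stats count by txn_state
```

Replacing the last two lines with `| search closed_txn=1` keeps only the complete transactions for further analysis.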


24

MCQ · easy

A user wants to join data from two datasets in a search. Which command is used to combine results based on a common field, but only returns matching results?

A.append

B.union

C.lookup

D.join

Answer: D

Join combines results from two searches on a common field and returns only matching rows.

Why this answer

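The `join` command performs an inner join by default, returning only results that match on the common field. `append` adds rows without matching, `union` merges result sets from multiple datasets without matching on a field, and `lookup` enriches events with fields from a lookup table.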
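The difference between an inner join and a plain append can be sketched in Python. This is only an illustration of the semantics, not how Splunk implements `join`; the function name `inner_join` and the sample rows are hypothetical.

```python
def inner_join(left, right, key):
    """Return only the left rows that have a match in right on `key`."""
    # Index the right-hand rows by key, keeping the first row per key.
    index = {}
    for row in right:
        index.setdefault(row[key], row)

    joined = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            # Merge the matching rows into one combined result row.
            joined.append({**row, **match})
    return joined


logins = [
    {"user": "a", "action": "login"},
    {"user": "b", "action": "logout"},
]
departments = [{"user": "a", "dept": "ops"}]

joined = inner_join(logins, departments, "user")
appended = logins + departments  # append-style: rows are stacked, nothing is matched
```

Here `joined` contains a single row for user `a`, while `appended` simply holds all three input rows.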

25

MCQhard

A company has a Splunk environment with multiple indexers and a search head. They have a large CSV lookup file for user permissions that is used in many dashboards. Recently, users have reported that dashboards are timing out or slow. The lookup file is about 500 MB and is stored in $SPLUNK_HOME/etc/apps/app_name/lookups/. The lookup is defined as an automatic lookup in props.conf for the source type 'user_activity'. The dashboards use the lookup to enrich events and then perform aggregations. The administrator checks the search logs and sees that searches using the lookup are taking a long time, and some are failing with 'Search head timeout'. The lookup file is updated daily by a script that replaces the file. Which course of action would best improve performance without sacrificing data enrichment?

A.Split the lookup into multiple smaller files and use multiple lookups

B.Remove the automatic lookup and use the lookup command only in the dashboards

C.Convert the CSV lookup to a KV Store lookup with the same data

D.Increase the search head timeout setting

AnswerC

KV Store lookups are faster for large datasets and can be used with automatic lookups, improving performance.

Why this answer

Converting the CSV lookup to a KV Store lookup provides better performance for large lookups. KV Store lookups are indexed and more efficient for large datasets, and they support automatic lookups. Simply using the lookup command in dashboards would not address the size issue, splitting into multiple files adds complexity, and increasing timeout only masks the problem.

26

Drag & Dropmedium

Arrange the steps to create a scheduled report in Splunk in the correct order.

Why this order

Scheduled reports require saving a search as a report, then configuring its schedule and time range.

27

MCQeasy

Which command creates a time-based chart showing a count of events over time?

A.| timecount

B.| timechart count by _time

C.| chart count over _time

D.| timechart count

AnswerD

`timechart count` bins events by `_time` automatically and produces a count per time bucket.

Why this answer

Option D is correct because `timechart` always uses `_time` as the x-axis: it bins events into time-based buckets and returns the count of events per span as a time series. No `by` clause is needed; in `timechart`, `by` splits the series by another field rather than defining the x-axis.

Exam trap

Splunk often tests the distinction between `chart` and `timechart`; the trap here is that candidates confuse `chart` with `timechart` and think `chart count over _time` is the right tool, or they assume `timecount` is a real command, when in fact `timechart` is the command that handles time-based binning automatically.

How to eliminate wrong answers

Option A is wrong because `timecount` is not a valid Splunk command; it is a confusion with `timechart`. Option B is wrong because `_time` is already the x-axis of `timechart`, so `by _time` is not a meaningful split-by field; the canonical form omits it. Option C is wrong because `chart` is the general-purpose charting command, whereas `timechart` is the command built for time series and the one that buckets `_time` without extra arguments.

The trap is that candidates might think B is more correct because it names `_time` explicitly, but `timechart` needs no such clause.

28

Drag & Dropmedium

Arrange the steps to create a knowledge object of type 'Event Type' in Splunk.

Why this order

Event types are created by defining a search string that matches events, then saving with a name.

29

MCQeasy

Which of the following is a recommended practice when creating a lookup table file?

A.Place the file in the global lookups directory.

B.Use mixed case field names.

C.Include a header row with field names that contain no spaces.

D.Use tab-separated values.

AnswerC

Header rows are required, and avoiding spaces prevents errors.

Why this answer

Option C is correct. A header row is required, and field names with no spaces or special characters avoid parsing issues. Option B is not recommended because mixed case can cause case-sensitivity problems.

Option D, tab-separated values, is not standard; CSV with a header row is preferred. Option A, the global lookups directory, is not best practice; the file should be in the app's lookups directory.

30

MCQhard

Refer to the exhibit. What does the pct field represent?

A.The running total percentage of events over time.

B.The percentage of each status across the entire time range.

C.The percentage of each status within each one-hour time bucket.

D.The percentage of events for that status compared to the maximum count in that hour.

AnswerC

Correct: eventstats sums by _time, so pct is per hour per status.

Why this answer

The `pct` field in the context of a time-based chart (e.g., `timechart count by status`) represents the percentage of each status value within each one-hour time bucket. This is calculated by dividing the count of a specific status in that bucket by the total count of all statuses in the same bucket, then multiplying by 100. Option C correctly identifies this per-bucket proportional breakdown.

Exam trap

The trap here is that candidates confuse `pct` (per-bucket percentage) with a global percentage across the entire time range (Option B) or with a running total (Option A), because they overlook that `timechart` inherently groups data into time buckets and calculates percentages within each bucket, not over the whole search span.

How to eliminate wrong answers

Option A is wrong because the `pct` field does not represent a running total percentage over time; that would require a cumulative or moving-window calculation (e.g., `accum` or `streamstats`), not a per-bucket ratio. Option B is wrong because the percentage is not calculated across the entire time range; that would be a single overall percentage per status (e.g., using `top` or `stats` without a time split), not a per-hour breakdown. Option D is wrong because the `pct` field is not relative to the maximum count in that hour; it is relative to the sum of all status counts in that hour, not the peak value.
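The per-bucket calculation described above can be sketched in Python. The exhibit is not reproduced here, so the field names (`hour`, `status`, `count`) and the sample rows are assumptions made for illustration only.

```python
from collections import defaultdict


def pct_per_bucket(rows):
    """Add a pct field: each row's share of the total count in its own hour."""
    # First pass: total count per time bucket (what a per-bucket sum provides).
    totals = defaultdict(int)
    for row in rows:
        totals[row["hour"]] += row["count"]

    # Second pass: divide each row by the total of its own bucket.
    return [
        {**row, "pct": round(100 * row["count"] / totals[row["hour"]], 2)}
        for row in rows
    ]


rows = [
    {"hour": "09:00", "status": 200, "count": 90},
    {"hour": "09:00", "status": 500, "count": 10},
    {"hour": "10:00", "status": 200, "count": 30},
    {"hour": "10:00", "status": 500, "count": 30},
]
result = pct_per_bucket(rows)
```

Status 500 comes out at 10% in the 09:00 bucket and 50% in the 10:00 bucket, because each percentage is relative to its own hour rather than to the whole time range.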

31

MCQeasy

A lookup configured with WILDCARD match_type for pattern '10.*.25' is not matching some events. Which of the following event values would NOT be matched by this lookup?

A.10.1.2.25

B.10.1.25.1

C.10.1.25

D.10.2.25

E.10.10.25

AnswerB

Does not match because it does not end with '.25'.

Why this answer

Option B is correct because the pattern '10.*.25' matches any string that starts with '10.' and ends with '.25'. Option B ends with '.1', so it does not match. Options A, C, D, and E all match because they start with '10.' and end with '.25' (A has an additional segment, but the wildcard matches it).
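The matching logic can be checked with a small Python sketch. It uses `fnmatch.fnmatchcase` from the standard library as a stand-in for Splunk's WILDCARD matching, on the assumption that `*` matches any run of characters and every other character is literal.

```python
from fnmatch import fnmatchcase

PATTERN = "10.*.25"

# The five answer options from the question.
candidates = {
    "A": "10.1.2.25",
    "B": "10.1.25.1",
    "C": "10.1.25",
    "D": "10.2.25",
    "E": "10.10.25",
}

# Collect the options whose value does not match the wildcard pattern.
unmatched = [
    label for label, value in candidates.items()
    if not fnmatchcase(value, PATTERN)
]
```

Only option B is left in `unmatched`, because it is the only value that does not end with '.25'.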

32

MCQhard

What is the most likely reason for this behavior?

A.The 'max_time' setting limits the accelerated data to the last 1 day, so tstats only queries that time range.

B.The acceleration summaries are only generated at the 5-minute and 1-hour intervals, not daily.

C.The acceleration is disabled because 'enabled' is set to true incorrectly.

D.The acceleration automatically becomes outdated after 1 day and requires a rebuild.

AnswerA

max_time defines how far back the acceleration data goes, not the final search.

Why this answer

Option A is correct because the acceleration is enabled with 'max_time' set to 1 day, meaning the accelerated summaries only cover the last 24 hours. The 'earliest_time' of -7d sets the range for the initial data model but not the acceleration. Option B is incorrect because the summaries are defined for 5m and 1h; the summary intervals do not explain why results are limited to one day.

Option C is incorrect because 'enabled' is true, which turns acceleration on. Option D is incorrect because the 1-day limit describes how much data is summarized, not an expiry that forces a rebuild.

33

MCQhard

What action can the administrator take to resolve this warning?

A.Increase the max_mem_usage_mb setting in limits.conf

B.Split the lookup into multiple smaller files

C.Convert the lookup to a KV Store lookup

D.Increase max_match in the lookup definition

E.Use the lookup command with local=t

AnswerA

This directly raises the memory limit for lookups.

Why this answer

The error indicates the lookup file exceeds the maximum memory allocation. Increasing the max_mem_usage_mb setting in limits.conf allows loading larger files.

34

Multi-Selectmedium

Which TWO of the following commands can be used to create a table of unique values for a field, along with their counts?

Select 2 answers

A.stats count by field_name

B.fields field_name

C.rare field_name

D.top field_name

E.dedup field_name

AnswersA, D

Returns all unique field values with their counts.

Why this answer

The `stats count by field_name` command groups events by the unique values of the specified field and outputs a table with each value and its count. The `top field_name` command also produces a table of the most frequent field values along with their counts, sorted in descending order by count. Both commands generate the required table of unique values with counts.

Exam trap

Splunk often tests the distinction between commands that produce counts of all unique values (`stats count by`, `top` with `limit=0`) versus commands that only show a subset (`top` default, `rare`) or do not count at all (`fields`, `dedup`).

35

MCQeasy

A user wants to find the top 5 sourcetypes by event count over the last 24 hours. Which search is correct?

A.index=* | eventcount | top sourcetype

B.index=* | stats count by sourcetype | top 5 sourcetype

C.index=* | stats count by sourcetype | sort -count | head 5

D.index=* | top sourcetype

AnswerC

Correctly counts, sorts descending, and limits to 5.

Why this answer

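Option C is correct because it uses stats to count events by sourcetype, sorts descending by count, and keeps the first 5 rows. Option D uses top, which defaults to 10 results rather than 5. Option B pipes the stats output into top, which counts the rows it receives; after stats there is one row per sourcetype, so the original event counts are not used for ranking.

Option A uses eventcount, which is a generating command that reports event counts per index and does not provide a sourcetype field to rank.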
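The count, sort, and limit pattern can be sketched in Python with `collections.Counter`. The sourcetype names and volumes below are made up for illustration; the second half shows why re-counting rows that are already aggregated loses the original counts.

```python
from collections import Counter

# Hypothetical events: sourcetype name -> number of events to generate.
volumes = {"syslog": 60, "access": 50, "audit": 40, "dns": 30, "fw": 20, "app": 10}
events = [{"sourcetype": st} for st, n in volumes.items() for _ in range(n)]

# Count per sourcetype, then keep the five largest (count, sort descending, limit 5).
counts = Counter(event["sourcetype"] for event in events)
top5 = counts.most_common(5)

# Counting the already-aggregated rows again sees one row per sourcetype,
# so every sourcetype ends up with a count of 1.
stats_rows = [{"sourcetype": st, "count": c} for st, c in counts.items()]
recount = Counter(row["sourcetype"] for row in stats_rows)
```

`top5` ranks the sourcetypes by their real event counts, whereas every value in `recount` is 1.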

36

MCQmedium

A user reports that a macro named `my_macro` is not expanding in a search. The macro is defined in a private app called 'App_A'. The user is running the search in a different app called 'App_B'. What is the most likely cause of the issue?

A.The macro has a syntax error that prevents expansion.

B.The user does not have permission to view the macro.

C.The macro name is case-sensitive and the user used wrong case.

D.The macro is not shared to the global context.

AnswerD

Macros are local to the app unless explicitly shared globally.

Why this answer

Macros are confined to the app where they are defined unless shared to the global context. If the macro is not shared, it will not be accessible from other apps. Option D is correct.

Option A (syntax error) would cause a different error. Option B (permissions) is related, but the primary issue is app context. Option C (case sensitivity) is not indicated by anything in the scenario.

37

MCQhard

A Splunk search uses 'transaction' with a large dataset and causes a 'max transaction' error. What is the most likely cause and best practice to avoid it?

A.The transaction command is used on non-indexed fields; use indexed fields instead.

B.The number of open transactions exceeds the limit; use fields to reduce cardinality or increase maxopentxn.

C.The maxspan value is too low; increase maxspan.

D.The maxevents value is too low; increase maxevents.

AnswerB

Correct: This resolves the max transaction error.

Why this answer

Option B is correct. The error indicates the number of open transactions exceeded the limit (maxopentxn). Reducing field cardinality or increasing maxopentxn helps.

Options A, C, and D address other issues.

38

Multi-Selecthard

Which THREE of the following are valid ways to extract a substring from a field named "full_name" that contains "Firstname Lastname" into separate fields?

Select 3 answers

A.extract field=full_name first=1 last=2

B.eval first=split(full_name," ")[0], last=split(full_name," ")[1]

C.rex field=full_name "^(?<first>\w+)\s+(?<last>\w+)$"

D.makemv delim=" " full_name | eval first=mvindex(full_name,0), last=mvindex(full_name,1)

E.regex field=full_name "(?<first>[^ ]+) (?<last>[^ ]+)"

AnswersB, C, D

Splits the field into an array and indexes elements.

Why this answer

Option B is correct because the `split` function in Splunk's `eval` command returns a multivalue field from a string based on a delimiter, and array indexing with `[0]` and `[1]` extracts the first and second elements respectively. This directly splits "Firstname Lastname" into two separate fields named `first` and `last`.

Exam trap

Splunk often tests the distinction between commands that extract fields (like `rex` and `eval` with `split`) versus commands that only filter or transform data without creating new fields (like `regex` and `extract` with incorrect syntax).

39

MCQmedium

A saved search is configured to run every hour and generate a summary index. The original search returns data that is then summarized. Which of the following best describes the purpose of summary indexing?

A.To reduce disk space usage by compressing raw data

B.To create real-time alerts based on historical data

C.To normalize data to the CIM

D.To speed up searches by pre-aggregating data into smaller datasets

AnswerD

Correct: This is the primary purpose.

Why this answer

Option D is correct: summary indexing precomputes statistics from a search and stores them in a summary index, which can then be searched faster. Option A is not accurate; summary indexing doesn't compress raw data. Option B is about alerts.

Option C is about CIM.

40

MCQmedium

A user needs a report showing the number of distinct source IPs per sourcetype over the last hour. They run: `index=* earliest=-1h | stats dc(src_ip) by sourcetype`. The search runs slowly (2 minutes) and they want to speed it up. Which optimization is most effective?

A.Use `| top limit=100 sourcetype` to get top sourcetypes.

B.Use `| stats count by sourcetype, src_ip | stats count by sourcetype`.

C.Use `| chart count over sourcetype by src_ip`.

D.Use `| tstats dc(src_ip) where index=* earliest=-1h by sourcetype`.

AnswerD

tstats leverages summary data for faster retrieval.

Why this answer

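`tstats` reads indexed fields and accelerated data model summaries instead of raw events, which makes it much faster than scanning raw data; it does require `src_ip` to be available as an indexed field or through an accelerated data model. Option B uses nested stats, which still scans raw data. Option C does not produce the desired result.

Option A gives top sourcetypes, not distinct IPs.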

41

MCQeasy

What is the most likely cause of this error?

A.The macro does not have read permissions for the administrator's role.

B.The macro is missing the '|' pipe in front of the rest command.

C.The macro definition should use curly braces {} instead of brackets [].

D.The endpoint path in the macro is incorrect or contains a typo; the expected endpoint is '/services/authentication/users'.

AnswerD

A 'not found' error points to a wrong or misspelled endpoint path; the expected endpoint is '/services/authentication/users'.

Why this answer

Option D is correct because the error reports that the URL was not found, which suggests that the endpoint path passed to `| rest` is wrong. Option B is a possible cause but less direct: a space before the pipe in the macro definition can be parsed incorrectly, and a leading pipe inside a macro definition is generally not recommended, but neither would explain a URL-not-found error.

Option C is incorrect because braces are not needed. Option A is incorrect because permissions are not what the error describes.

42

MCQmedium

An analyst wants to correlate events from multiple sourcetypes that have different timestamps but share a common reference ID. The events are ingested with some delay. Which parameter is crucial to ensure the transaction captures all related events despite ingestion delay?

A.maxpause

B.maxevents

C.fields _indextime

D.maxspan

AnswerD

Correct: a large maxspan gives time for delayed events to arrive.

Why this answer

Option D is correct because a large maxspan accommodates delays in event arrival. Option A (maxpause) would not capture events if there is a large gap. Option B (maxevents) does not affect time.

Option C (fields _indextime) is irrelevant.

43

MCQmedium

A search returns raw events with a field 'response_time'. The analyst wants to calculate the average response time excluding any outliers that are more than 3 standard deviations from the mean. Which SPL approach is most efficient?

A.Use | eventstats avg, stdev(response_time) then | where response_time<=avg+3*stdev and response_time>=avg-3*stdev then | stats avg(response_time)

B.Use | top response_time

C.Use | stats avg(response_time) and then filter with where

D.Use | outlier action=remove

AnswerA

Efficient one-pass calculation with filtering

Why this answer

Option A is correct because it uses `eventstats` to compute the global average and standard deviation of `response_time` across all events, then filters out outliers (values more than 3 standard deviations from the mean) with a `where` clause, and finally calculates the clean average with `stats avg(response_time)`. This approach is efficient because `eventstats` adds the aggregate values to each event without reducing the dataset, allowing a single pass through the data for filtering and aggregation.

Exam trap

Splunk often tests the distinction between `eventstats` and `stats`, where candidates mistakenly use `stats` first and then try to filter, not realizing that `stats` collapses events and loses the ability to apply per-event conditions.

How to eliminate wrong answers

Option B is wrong because `top` returns the most frequent values of a field, not statistical measures like average or standard deviation, and does not address outlier removal. Option C is wrong because using `stats avg(response_time)` first collapses the data into a single value, making it impossible to filter individual events by standard deviation; the `where` clause would have no events to filter. Option D is wrong because the `outlier` command bases its threshold on the interquartile range rather than on the mean and standard deviation, so `outlier action=remove` does not implement the requirement of excluding values more than 3 standard deviations from the mean.
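The three steps in Option A (attach the global mean and standard deviation to every event, filter, then average what is left) can be sketched in Python. This is a minimal illustration with made-up values; `avg_without_outliers` is a hypothetical helper, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev


def avg_without_outliers(values, k=3.0):
    """Average of the values that lie within k standard deviations of the mean."""
    # Step 1: global statistics over all events (the eventstats step).
    avg = mean(values)
    sd = stdev(values)

    # Step 2: keep only the events inside the band (the where step).
    kept = [v for v in values if avg - k * sd <= v <= avg + k * sd]

    # Step 3: aggregate the surviving events (the final stats step).
    return mean(kept)


response_times = [100.0] * 20 + [10000.0]
clean_avg = avg_without_outliers(response_times)
```

With twenty responses of 100 ms and a single 10000 ms spike, the spike falls outside the 3-standard-deviation band and the cleaned average is 100.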

44

Multi-Selectmedium

Which THREE of the following are components of the Splunk Common Information Model (CIM)? (choose three)

Select 3 answers

A.Application State

B.Endpoint

C.Authentication

D.Change Analysis

E.Network Traffic

AnswersB, C, E

The Endpoint data model is part of CIM.

Why this answer

Option B (Endpoint) is correct because the Splunk Common Information Model (CIM) includes the Endpoint data model, which normalizes data from endpoint security solutions such as antivirus, EDR, and host-based intrusion detection. This data model covers processes, file system changes, registry modifications, and other host-level activities, making it a core component of the CIM.

Exam trap

The trap here is that candidates may confuse 'Change Analysis' with the CIM's 'Change' data model, or assume 'Application State' is a valid CIM component because it sounds like a logical category, but the CIM only includes specific named data models like 'Authentication', 'Endpoint', and 'Network Traffic'.

45

MCQmedium

A large transaction command is causing the search to run out of memory. Which approach best reduces memory usage while maintaining the transaction logic?

A.Increase the maxeventtokens setting.

B.Use the fields option to include only necessary fields.

C.Replace transaction with stats to aggregate.

D.Use timeline to store transactions.

AnswerB

Limiting fields reduces the data per event, lowering memory consumption.

Why this answer

Option B is correct because using the fields option limits the fields carried in each event of the transaction, reducing memory. Option A (increasing maxeventtokens) would increase memory usage. Option C (using stats) changes the correlation approach.

Option D (timeline) is irrelevant.

46

Multi-Selectmedium

Which THREE of the following are correct about the transaction command's default behavior?

Select 3 answers

A.Transaction groups events by host, source, and sourcetype by default.

B.Transaction does not require startswith or endswith to be specified.

C.Transaction can evict partial transactions if maxpause is exceeded.

D.Transaction requires all events to come from the same host.

E.Transaction always includes all evicted events in the results.

AnswersA, B, C

Default grouping fields are host, source, and sourcetype.

Why this answer

Options A, B, and C are correct. Transaction by default groups by host, source, and sourcetype; it does not require startswith/endswith; and it can evict partial transactions if maxpause is exceeded. Option D is false because events can span multiple hosts.

Option E is false because evicted transactions are output only when keepevicted=true.

47

MCQmedium

A search uses `transaction sessionId` to correlate events. However, the transaction command is consuming too much memory and the search fails. Which approach can reduce memory usage while still approximating the transaction grouping?

A.Add `maxevents=100` to the transaction

B.Use `dedup sessionId`

C.Use `stats values(_raw) by sessionId`

D.Increase the search job memory limit

AnswerC

stats is lighter and can group events by a common field without the overhead of transaction.

Why this answer

Using `stats values(_raw) by sessionId` aggregates raw events into a multivalue field, which is more memory-efficient than transaction because it does not try to compute duration or keep all event metadata.
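As a rough Python sketch of what this grouping produces: each session keeps only the distinct raw values, sorted, with no duration or ordering metadata. The helper `values_by` and the sample events are hypothetical.

```python
from collections import defaultdict


def values_by(events, key, field):
    """Group events by `key` and keep the distinct, sorted values of `field`."""
    grouped = defaultdict(set)
    for event in events:
        grouped[event[key]].add(event[field])
    # One entry per group, holding a sorted list of unique values.
    return {group: sorted(values) for group, values in grouped.items()}


events = [
    {"sessionId": "s1", "_raw": "GET /cart"},
    {"sessionId": "s1", "_raw": "GET /"},
    {"sessionId": "s2", "_raw": "GET /"},
    {"sessionId": "s1", "_raw": "GET /"},
]
sessions = values_by(events, "sessionId", "_raw")
```

The duplicate `GET /` event in session `s1` appears only once in the result, which is part of why this grouping is an approximation of `transaction` rather than a replacement for it.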

48

MCQhard

An analyst needs to identify events where the field `response_time` is more than 2 standard deviations above the average response_time for the same `host`. Which approach should be used?

A.Use `eventstats avg(response_time) as avg, stdev(response_time) as stdev` then `where response_time > avg+2*stdev`

B.Use `streamstats avg(response_time) as avg, stdev(response_time) as stdev by host` then `where response_time > avg+2*stdev`

C.Use `stats avg(response_time) as avg, stdev(response_time) as stdev by host` then `where response_time > avg+2*stdev`

D.Use `eventstats avg(response_time) as avg, stdev(response_time) as stdev by host` then `where response_time > avg+2*stdev`

AnswerD

eventstats adds per-host avg and stdev to each event, allowing the comparison.

Why this answer

Option D is correct because `eventstats` with a `by host` clause computes the average and standard deviation of `response_time` for each host across the entire result set, then appends those statistics to every event. This allows the subsequent `where` clause to compare each event's `response_time` against the host-specific threshold `avg+2*stdev`, correctly identifying outliers relative to the same host's distribution.

Exam trap

Splunk often tests the distinction between `eventstats` (which adds aggregate values to each event) and `stats` (which collapses events into a summary), and between `eventstats` and `streamstats` (which computes running vs. global statistics), to see if candidates understand which command preserves raw events for per-event comparisons.

How to eliminate wrong answers

Option A is wrong because `eventstats` without a `by` clause computes global statistics across all hosts, not per host, so the threshold would be based on the overall average and standard deviation, not the host-specific values required. Option B is wrong because `streamstats` computes running (cumulative) statistics over the event stream, not over the entire dataset; this would cause the average and standard deviation to change with each event, producing incorrect thresholds that depend on event order. Option C is wrong because `stats` aggregates the data into a summary table with one row per host, discarding the original events; the `where` clause would then have no individual `response_time` to compare against the threshold.
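The per-host comparison can be sketched in Python: compute each host's mean and standard deviation over the whole result set, then test every event against its own host's threshold. The helper `flag_slow_events` and the sample data are hypothetical, and the sample standard deviation is assumed.

```python
from collections import defaultdict
from statistics import mean, stdev


def flag_slow_events(events, k=2.0):
    """Return events whose response_time exceeds their host's avg + k * stdev."""
    # Collect response times per host (the "by host" grouping).
    by_host = defaultdict(list)
    for event in events:
        by_host[event["host"]].append(event["response_time"])

    # One threshold per host; hosts with a single event have no stdev.
    thresholds = {
        host: mean(times) + k * stdev(times)
        for host, times in by_host.items()
        if len(times) > 1
    }

    # Every original event is kept available for the per-event comparison.
    return [
        event for event in events
        if event["host"] in thresholds
        and event["response_time"] > thresholds[event["host"]]
    ]


events = (
    [{"host": "web1", "response_time": 10} for _ in range(9)]
    + [{"host": "web1", "response_time": 100}]
    + [{"host": "db1", "response_time": t} for t in (1000, 1010, 990, 1005, 995)]
)
flagged = flag_slow_events(events)
```

Only the 100 ms event on `web1` is flagged. The `db1` events are all far slower in absolute terms, but none is unusual for that host, which is the point of grouping by host.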

49

MCQmedium

A team is using the transaction command to group web server access logs into user sessions. They notice some sessions are missing because the transaction command defaults to combining events with identical field values if they occur within a default time window. What is the default maxspan value for the transaction command?

A.1 minute

B.-1 (no default limit)

C.30 seconds

D.5 minutes

AnswerB

Correct: The default maxspan is -1, meaning no time limit.

Why this answer

Option B is correct. The default maxspan is -1 (unlimited). Options A, C, and D are common misconceptions but incorrect.

50

Multi-Selectmedium

Which THREE strategies can help reduce memory usage when using the transaction command? (Select exactly 3 correct answers.)

Select 3 answers

A.Filter events before the transaction command.

B.Reduce maxspan and maxpause.

C.Use fields to limit fields before transaction.

D.Use keepevicted=true.

E.Increase maxopentxn.

AnswersA, B, C

Correct: reducing input events lowers memory usage.

Why this answer

Reducing maxspan and maxpause limits the time window, thus fewer open transactions. Filtering events early and using the fields command to limit fields reduce data volume. Increasing maxopentxn and keepevicted=true increase memory usage.

51

MCQmedium

A security analyst needs to correlate login events with subsequent actions from the same user within 30 minutes. They need to ensure that only one login per user session is considered, and actions after login are attached. Which command is most appropriate?

A.stats values(user) by _time

B.transaction user maxspan=30m

C.append [search action] | sort user _time

D.join user [search login] | timechart

AnswerB

Groups events by user within a time window.

Why this answer

Option B is correct: 'transaction user maxspan=30m' groups all events from the same user within 30 minutes into a single transaction. Option A does not group; C and D are inefficient for this correlation.

52

MCQeasy

The security operations center (SOC) team at a medium-sized enterprise uses Splunk to investigate potential threats. They maintain a CSV lookup file named 'threat_intel.csv' that contains a list of known malicious IP addresses along with a threat score. The lookup is configured in transforms.conf as: [threat_intel] filename = threat_intel.csv match_type = WILDCARD(ip) They frequently run the following search to enrich firewall events with threat scores: index=firewall sourcetype=firewall_logs | lookup threat_intel src_ip OUTPUT threat_score | where threat_score > 5 Recently, analysts noticed that some IP addresses known to be present in the lookup file are not being matched in search results. They have verified that the lookup file is correctly formatted and contains those IPs, and the transforms.conf has not been altered. They also confirmed that the events contain the field src_ip with the correct IP addresses. Which of the following is the most likely cause of the missing matches?

A.The IP addresses in the lookup file are stored in a different case (e.g., uppercase) than in the events (lowercase).

B.The search is limited to a time range that excludes events with those IP addresses.

C.The lookup file contains duplicate entries for some IPs, causing conflicts.

D.The lookup command requires the input_fields parameter to specify which field to use for matching.

AnswerA

Lookups are case-sensitive by default; mismatched case prevents matching. This is a common issue.

Why this answer

The most likely cause is that the IP addresses in the lookup file and the events have different cases (e.g., uppercase vs. lowercase). Splunk lookups are case-sensitive by default unless `case_sensitive_match = false` is set in the lookup definition. Setting match_type to WILDCARD does not change this, so case differences will prevent matching.

Option A is correct.

53

Multi-Selecthard

Which THREE of the following are best practices when using lookups in Splunk?

Select 3 answers

A.Use the lookup command instead of inputlookup when possible to reduce memory usage

B.Use automatic lookups to enrich data at search time without manual commands

C.Store lookup tables in KV Store when the table has more than 1 million rows

D.Always use KV Store lookups for faster performance compared to CSV lookups

E.Keep lookup file sizes under 500 MB to avoid performance degradation

AnswersA, B, E

lookup command streams data efficiently.

Why this answer

Option A is correct because using the `lookup` command with the `local=t` argument (or when the lookup table is small enough to be loaded into memory) can reduce memory usage compared to `inputlookup`, which always loads the entire lookup file into memory. The `lookup` command can stream results and only loads necessary fields, making it more efficient for large datasets. This is a best practice to avoid out-of-memory errors in distributed search environments.

Exam trap

Splunk often tests the misconception that KV Store is always superior to CSV lookups, but the trap is that KV Store has higher latency for static data and is only recommended for dynamic, frequently updated lookups or when the table size exceeds CSV memory limits.

54

MCQeasy

Which command is used to convert a multi-value field into individual events?

A.mvexpand

B.eval split

C.makemv

D.fields

AnswerA

Correctly expands multi-value fields into separate events.

Why this answer

Option A is correct because mvexpand expands multi-value fields into separate events. Option B (eval split) creates multi-value fields. Option C (makemv) also creates multi-value fields.

Option D (fields) selects fields.
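The expansion can be sketched in Python: one output event per value of the multi-value field, with the other fields copied unchanged. The function below is a hypothetical illustration of the behaviour, not Splunk's implementation.

```python
def mvexpand(events, field):
    """Emit one event per value of a multi-value field."""
    expanded = []
    for event in events:
        values = event.get(field)
        if isinstance(values, list):
            # Copy the event once per value, replacing the list with that value.
            expanded.extend({**event, field: value} for value in values)
        else:
            # Single-value or missing field: pass the event through unchanged.
            expanded.append(event)
    return expanded


events = [
    {"user": "a", "role": ["admin", "dev"]},
    {"user": "b", "role": "ops"},
]
expanded = mvexpand(events, "role")
```

User `a` turns into two events, one per role, while user `b` is left as a single event.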

55

MCQhard

A search returns events with fields 'user', 'duration', and 'status'. The analyst wants to find users whose average duration exceeds 100 and who have more than 5 events. Which search is correct?

A.`... | where avg(duration)>100 | stats count by user | where count>5`

B.`... | top user limit=0 | where avg(duration)>100`

C.`... | stats avg(duration) as avg_dur, count as cnt by user | where avg_dur>100 and cnt>5`

D.`... | eventstats avg(duration) as avg_dur, count as cnt by user | where avg_dur>100 and cnt>5`

E.`... | stats avg(duration) as avg_dur, count as cnt by user | having avg_dur>100 and cnt>5`

AnswerC

Correct: stats reduces to one row per user, then where filters.

Why this answer

Option C is correct because it uses the `stats` command to compute the average duration and count per user in a single pass, then filters with `where` to enforce both conditions: average duration > 100 and event count > 5. This is the standard pattern for per-user aggregation followed by post-aggregation filtering in Splunk.

Exam trap

Splunk often tests the difference between `stats` and `eventstats`, where candidates mistakenly choose `eventstats` thinking it filters users, but it actually keeps all events and applies the `where` condition per event, not per user.

How to eliminate wrong answers

Option A is wrong because `where avg(duration)>100` is applied before any grouping, so there is no per-user aggregate to filter on, causing an error or incorrect results. Option B is wrong because `top user limit=0` returns the most frequent users but does not compute average duration or allow filtering on it. Option D is wrong because `eventstats` adds the aggregate values to each event without collapsing rows, so the `where` clause is evaluated per event and the output is every event belonging to a qualifying user, not one row per user. Option E is wrong because `having` is a SQL clause, not an SPL command.
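
The difference between options C and D can be sketched in plain Python. This is only a model of the row-level behavior described above, not Splunk's implementation; the sample events and the helper names (`stats_by_user`, `eventstats_by_user`, `where`) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample events; field names follow the question.
events = (
    [{"user": "alice", "duration": 150}] * 6   # avg 150, 6 events: qualifies
    + [{"user": "bob", "duration": 300}] * 2   # avg 300, 2 events: too few events
    + [{"user": "carol", "duration": 50}] * 7  # avg 50, 7 events: average too low
)

def stats_by_user(rows):
    """Model of `stats avg(duration) as avg_dur, count as cnt by user`: one row per user."""
    durations = defaultdict(list)
    for row in rows:
        durations[row["user"]].append(row["duration"])
    return [
        {"user": user, "avg_dur": sum(vals) / len(vals), "cnt": len(vals)}
        for user, vals in sorted(durations.items())
    ]

def eventstats_by_user(rows):
    """Model of `eventstats ... by user`: every event is kept, aggregates are copied onto it."""
    per_user = {r["user"]: r for r in stats_by_user(rows)}
    return [
        {**row, "avg_dur": per_user[row["user"]]["avg_dur"], "cnt": per_user[row["user"]]["cnt"]}
        for row in rows
    ]

def where(rows):
    """Model of `where avg_dur>100 and cnt>5`, evaluated row by row."""
    return [r for r in rows if r["avg_dur"] > 100 and r["cnt"] > 5]

stats_result = where(stats_by_user(events))            # one row: alice
eventstats_result = where(eventstats_by_user(events))  # all six alice events

print(len(stats_result), len(eventstats_result))
```

In this model both pipelines select the same user, but only the `stats` version returns a single row per user, which is what the question asks for.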

56

MCQhard

What is the most likely cause of the error?

A.The macro definition uses positional arguments but the call uses named arguments.

B.The macro name in the definition contains an invalid character (backtick).

C.The macro call is missing a closing parenthesis.

D.The macro call uses named arguments instead of positional arguments.

AnswerB

Backtick is not allowed in macro names; it causes parsing errors.

Why this answer

The error is caused by the backtick character in the macro definition name. In Splunk, macro names must consist only of alphanumeric characters and underscores; backticks are invalid and will cause a parsing error when the macro is defined or called.

Exam trap

Splunk often tests the specific rule that macro names must be alphanumeric with underscores only, and the trap here is that candidates focus on argument syntax (positional vs named) instead of recognizing the invalid character in the name.

How to eliminate wrong answers

Option A is wrong because the error is not about argument mismatch; the backtick in the name prevents the macro from being parsed at all. Option C is wrong because a missing closing parenthesis would produce a syntax error, but the backtick is the more likely cause given the name issue. Option D is wrong because the error is not about named vs positional arguments; the invalid character in the name is the root cause.

57

MCQeasy

A security analyst wants to create a macro that extracts IP addresses from a field named `src_ip` and returns a count of unique IPs per source. Which macro definition accomplishes this?

A.| stats count(src_ip) as unique_ips

B.| stats distinct_count(src_ip) as unique_ips

C.| stats unique(src_ip) as unique_ips

D.| stats dc(src_ip) as unique_ips

AnswerD

`dc` (distinct count) counts unique values.

Why this answer

Option D is correct because `dc(src_ip)` is the distinct-count function of the `stats` command, which returns the number of unique values in the `src_ip` field. This macro definition fulfills the requirement to count unique IPs, as `dc` is the standard abbreviation for distinct count in Splunk's `stats` command.

Exam trap

Splunk often tests the distinction between `count` and `dc`: candidates mistakenly choose `count`, which includes duplicates, or an invented function such as `unique`. Note that `distinct_count` is the long form of `dc` in Splunk's `stats` command, so the two are interchangeable; the answer key expects the abbreviated form.

How to eliminate wrong answers

Option A is wrong because `count(src_ip)` counts all occurrences of `src_ip`, including duplicates, not unique IPs. Option B is marked wrong by the answer key, but `distinct_count(src_ip)` is the long form of `dc(src_ip)` and returns the same result, so treat D as the expected answer rather than the only working one. Option C is wrong because `unique` is not a valid aggregation function in Splunk's `stats` command; it would cause a syntax error.
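
To make the count versus distinct-count distinction concrete, here is a minimal Python sketch. It only models what the two aggregations return when grouped per source; the `source` values and sample IPs are hypothetical.

```python
from collections import defaultdict

# Hypothetical events; `source` and the IP values are made up for illustration.
events = [
    {"source": "fw1", "src_ip": "10.0.0.1"},
    {"source": "fw1", "src_ip": "10.0.0.1"},
    {"source": "fw1", "src_ip": "10.0.0.2"},
    {"source": "fw2", "src_ip": "10.0.0.3"},
    {"source": "fw2", "src_ip": "10.0.0.3"},
]

ips_by_source = defaultdict(list)
for event in events:
    ips_by_source[event["source"]].append(event["src_ip"])

# Model of count(src_ip) by source: duplicates are included.
count_by_source = {src: len(ips) for src, ips in ips_by_source.items()}

# Model of dc(src_ip) by source: only unique values are counted.
dc_by_source = {src: len(set(ips)) for src, ips in ips_by_source.items()}

print(count_by_source)
print(dc_by_source)
```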

58

Multi-Selectmedium

A user needs to find events where a user had a failed login followed by a successful login within 10 minutes, and then list the total number of such occurrences per user. Which THREE steps are necessary? (Select three.)

Select 3 answers

A.Use the eval command to set a field for failure status

B.Use the stats command to count by user

C.Use the where command to filter transactions with both failure and success

D.Use the transaction command with maxspan=10m

E.Use the transaction command with startswith and endswith

AnswersB, D, E

Aggregates transaction counts per user.

Why this answer

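Options B, D, and E are correct. The `transaction` command with `maxspan=10m` (D) limits each group to a 10-minute window, and `startswith`/`endswith` (E) define the transaction boundaries as a failed login followed by a successful one. Then `stats` (B) counts the resulting transactions per user. Option C is not needed because `startswith` and `endswith` already enforce the failure-then-success pattern. Option A is not necessary because the status field already exists in the events.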

59

Multi-Selectmedium

Which TWO configurations are required to create a geospatial visualization of server locations?

Select 2 answers

A.A sourcetype that includes country codes.

B.An index that contains geographic data.

C.A mappings.json file in the app directory.

D.A lookup table containing latitude and longitude fields.

E.The use of the geostats command.

AnswersC, D

mappings.json defines geographic shapes for choropleth or region maps.

Why this answer

Options C and D are correct. A lookup table with latitude and longitude fields (D) provides the location data, and a mappings.json file (C) defines the geographic shapes for the visualization. Options A, B, and E are not strictly required; they are optional or used for other purposes.

60

MCQhard

A team is designing a dashboard to monitor real-time server CPU utilization. They want to update every 10 seconds and use a gauge visualization. What is the best search mode to use for real-time performance?

A.Smart mode

B.Real-time mode

C.Fast mode

D.Verbose mode

AnswerC

Fast mode minimizes field extractions, improving real-time search performance.

Why this answer

Fast mode reduces computational overhead by limiting field extraction, which is ideal for real-time dashboards. Smart mode may do more work, Verbose mode returns all fields, and Real-time is not a search mode.

61

Multi-Selecthard

Which THREE of the following are valid ways to count the number of events per minute for a given sourcetype?

Select 3 answers

A.index=main sourcetype=web | stats count by date_minute

B.index=main sourcetype=web | streamstats count window=1m | where count>0

C.index=main sourcetype=web | eval minute = strftime(_time, "%Y-%m-%d %H:%M") | stats count by minute

D.index=main sourcetype=web | bucket _time span=1m | stats count by _time

E.index=main sourcetype=web | timechart count span=1m

AnswersC, D, E

Creates a unique minute string and groups by it.

Why this answer

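Options C, D, and E are correct. C creates a minute-level string with `strftime` and groups by it. D uses `bucket` to round `_time` to the minute and then counts with `stats`. E uses `timechart` with `span=1m`. A uses `date_minute`, which captures only the minute portion (0 to 59), not the full timestamp, so the same minute from different hours is merged. B uses `streamstats`, which produces a running count rather than a per-minute count; its `window` option also takes a number of events, not a time span.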
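
The problem with grouping by the minute portion alone can be shown with a short Python sketch. It models the three grouping keys (rounded timestamp, formatted minute string, minute-of-hour); the timestamps are hypothetical and the code is not how Splunk computes these fields.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical event timestamps spanning two different hours.
timestamps = [
    datetime(2024, 1, 1, 10, 5, 12, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 10, 5, 48, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 11, 5, 3, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 11, 6, 30, tzinfo=timezone.utc),
]

# Model of `bucket _time span=1m | stats count by _time`: round down to the minute.
by_bucket = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)

# Model of `eval minute=strftime(_time, "%Y-%m-%d %H:%M") | stats count by minute`.
by_string = Counter(ts.strftime("%Y-%m-%d %H:%M") for ts in timestamps)

# Model of `stats count by date_minute`: only the minute-of-hour (0 to 59) is kept.
by_date_minute = Counter(ts.minute for ts in timestamps)

print(len(by_bucket), len(by_string), dict(by_date_minute))
```

Here 10:05 and 11:05 stay separate under the first two keys but collapse into a single group under the minute-of-hour key.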

62

MCQmedium

Refer to the exhibit. An analyst executes the following search: `| filter_status(status_code=500)`. What will be the result?

A.The macro will run successfully, but it will use the literal string "status_code=500" as the argument value.

B.The macro will fail because the argument definition uses $arg1$ but the macro was called with "status_code=500".

C.The macro will run successfully, returning count of events with status=500.

D.The macro will fail because named arguments are not supported; Splunk macro arguments are positional.

AnswerD

Correct. Splunk macros only accept arguments positionally; named arguments cause an error.

Why this answer

Option D is correct because Splunk macros do not support named arguments; they use positional arguments defined by $arg1$, $arg2$, etc. In the search `| filter_status(status_code=500)`, the argument is passed as a named key-value pair, but the macro definition expects a positional argument. This mismatch causes the macro to fail, as Splunk cannot resolve the named argument to the positional placeholder.

Exam trap

Splunk often tests the distinction between positional and named arguments in Splunk macros, trapping candidates who assume macros support named parameters like commands or who think the macro will simply treat the input as a literal string.

How to eliminate wrong answers

Option A is wrong because the macro will not run successfully; Splunk will not treat 'status_code=500' as a literal string but will attempt to match it to a positional argument, leading to failure. Option B is wrong because the failure is not due to the argument definition using $arg1$ while the call uses 'status_code=500' — that is the surface symptom, but the root cause is that Splunk macros require positional arguments, not named ones. Option C is wrong because the macro will not return a count of events with status=500; it will fail to execute due to the named argument syntax.

63

MCQeasy

An analyst runs `transaction user_id` to correlate events from a web server. The resulting transaction events have a field 'duration' that shows the time between the first and last event. However, some transactions span over 30 minutes. What transaction option should be added to limit the maximum time between the first and last event?

A.maxspan=30m

B.maxpause=30m

C.maxevents=30

D.contime=30m

AnswerA

Correctly limits the transaction span to 30 minutes.

Why this answer

The maxspan option sets the maximum time span from the first event to the last event in a transaction.

64

MCQmedium

Refer to the exhibit. A security analyst runs this search to group SSH login events into sessions based on a session_id that is extracted only from 'Accepted publickey' events. However, the resulting transactions contain only the 'Accepted publickey' event and none of the subsequent commands or logouts. What is the most likely cause?

A.The maxpause=5m is too short, causing the transaction to close before other events occur.

B.The session_id field is only populated for the 'Accepted publickey' event, so other events have a different or null session_id and do not join the transaction.

C.The transaction command requires that all events have a non-null session_id to be grouped.

D.The sourcetype filter is too restrictive.

AnswerB

Only the start event gets a session_id; other events have null, so they are not grouped.

Why this answer

Option B is correct because the `transaction` command groups events by the `session_id` field. If `session_id` is only extracted from 'Accepted publickey' events (e.g., via a `rex` or `eval` command), subsequent commands and logout events will have a null or different `session_id`. Since `transaction` requires all events in the group to share the same `session_id` value, those other events cannot join the transaction, resulting in a transaction containing only the single 'Accepted publickey' event.

Exam trap

The trap here is that candidates often assume `maxpause` or timing is the culprit, but the real issue is that the `transaction` command requires all events in the group to share the same value for the specified field(s), and if the field is missing or null on other events, they cannot be correlated.

How to eliminate wrong answers

Option A is wrong because `maxpause=5m` defines the maximum time between events in the same transaction; if other events occur within 5 minutes, they would still be included if they shared the same `session_id`. The issue is not timing but field availability. Option C is wrong because the `transaction` command does not require all events to have a non-null `session_id`; it groups events by the specified field(s), and events with a null `session_id` simply will not match the non-null value of the 'Accepted publickey' event. Option D is wrong because the sourcetype filter is not mentioned in the exhibit or question as being overly restrictive; the problem is specifically about the `session_id` field not being populated on other events, not about sourcetype filtering.

65

MCQmedium

A network operations team monitors firewall logs using Splunk. They need to group events from the same TCP session, identified by 'src_ip', 'dst_ip', and 'src_port'. The logs contain events for 'session_start', 'data_transfer', and 'session_end' actions. They currently use `transaction src_ip dst_ip src_port startswith=action=session_start endswith=action=session_end`. However, many transactions are incomplete because some sessions do not have a 'session_end' event due to firewall timeouts. The team wants to include these incomplete sessions as well, but still group them around a start event. What should they modify?

A.Add `maxspan=30m` and keep endswith

B.Remove endswith and add maxspan=30m

C.Change startswith to `action=session_start OR action=session_end`

D.Use `transaction src_ip dst_ip src_port maxspan=30m` without startswith or endswith

AnswerB

Startswith defines start; maxspan closes the transaction automatically after 30 minutes if no end.

Why this answer

To include incomplete sessions, use `transaction src_ip dst_ip src_port startswith=action=session_start maxspan=30m` without endswith. This will create a transaction starting with session_start and ending after maxspan or if another start event is encountered.

66

MCQhard

A search uses 'transaction' to group events by session, but the results show too many transactions with only one event. What is the best way to filter out single-event transactions?

A.| transaction ... | where eventcount > 1

B.Add maxspan=5m to the transaction command

C.| transaction maxevents=2 ...

D.| transaction ... | where eventcount=2

AnswerA

eventcount is a default field added by transaction; filtering >1 removes single-event transactions.

Why this answer

Option A is correct because the `transaction` command groups events into transactions, and appending `| where eventcount > 1` filters out any transaction that consists of only a single event. This directly addresses the requirement to remove single-event transactions, as `eventcount` is a default field added by `transaction` that counts the number of events in each transaction.

Exam trap

Splunk often tests the distinction between filtering after `transaction` versus using parameters like `maxspan` or `maxevents`, where candidates mistakenly think time or count limits inherently exclude single-event transactions, but those parameters only constrain grouping, not post-group filtering.

How to eliminate wrong answers

Option B is wrong because `maxspan=5m` limits the maximum time span of a transaction but does not filter out single-event transactions; a single event can still occur within a 5-minute window. Option C is wrong because `maxevents=2` caps the maximum number of events in a transaction at 2, but it does not exclude transactions with exactly 1 event; it only prevents more than 2 events. Option D is wrong because `where eventcount=2` would keep only transactions with exactly 2 events, not all transactions with more than 1 event, thus incorrectly discarding transactions with 3 or more events.
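
The effect of the two `where` filters can be sketched in Python. This is a deliberately simplified model of `transaction` that groups on one field only and ignores `maxspan`, `maxpause`, `startswith`, and `endswith`; the events and the helper name `transactions_by_session` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical events as (session_id, epoch seconds).
events = [("a", 0), ("b", 10), ("a", 30), ("c", 5), ("c", 20), ("a", 90)]

def transactions_by_session(rows):
    """Simplified model of `transaction session_id`: group on the field only."""
    times = defaultdict(list)
    for session_id, t in rows:
        times[session_id].append(t)
    return [
        {"session_id": sid, "eventcount": len(ts), "duration": max(ts) - min(ts)}
        for sid, ts in sorted(times.items())
    ]

all_txns = transactions_by_session(events)

# Model of `| where eventcount > 1`: drops only single-event transactions.
multi_event = [t for t in all_txns if t["eventcount"] > 1]

# Model of `| where eventcount=2`: also drops the three-event transaction.
exactly_two = [t for t in all_txns if t["eventcount"] == 2]

print([t["session_id"] for t in multi_event])
print([t["session_id"] for t in exactly_two])
```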

67

Multi-Selecteasy

Which THREE statements about the `transaction` command are true?

Select 3 answers

A.It can correlate events from different sourcetypes

B.The maxevents option limits the number of unique field values per transaction

C.It sorts events within each transaction by _time

D.It can correlate events across multiple indexes

E.Transaction always produces summary indexing output

AnswersA, C, D

Transaction groups events based on shared field values regardless of sourcetype.

Why this answer

Options A, C, and D are correct: `transaction` can group events from different sourcetypes (A), it sorts the events within each transaction by `_time` (C), and it can correlate events across multiple indexes (D). Option B is false because `maxevents` limits the number of events per transaction, not the number of unique field values. Option E is false because `transaction` does not produce summary indexing output by default.

68

MCQeasy

Which command adds the overall average of a field to each event in the results?

A.streamstats avg(latency) as avg_latency

B.timechart avg(latency) as avg_latency

C.stats avg(latency) as avg_latency

D.eventstats avg(latency) as avg_latency

AnswerD

`eventstats` adds the average as a new field to each event.

Why this answer

The `eventstats` command computes aggregate statistics (like `avg(latency)`) over the entire result set and adds the result as a new field to every event, preserving all original events. This matches the requirement to add the overall average to each event. In contrast, `stats` collapses events into a single summary row, `streamstats` computes a running average per event, and `timechart` produces a time-based chart, none of which add the overall average to every original event.

Exam trap

Splunk often tests the distinction between `eventstats` and `stats` — the trap here is that candidates confuse `stats` (which collapses events) with `eventstats` (which adds the aggregate to each event), leading them to incorrectly choose `stats` because they think it computes the average without realizing it removes the original events.

How to eliminate wrong answers

Option A is wrong because `streamstats avg(latency) as avg_latency` computes a running (cumulative) average over the events in order, not the overall average of the entire field, and it adds a per-event running value, not the single global average. Option B is wrong because `timechart avg(latency) as avg_latency` groups events by time buckets and returns a time-series chart with one average per bucket, discarding the original events and not adding a field to each event. Option C is wrong because `stats avg(latency) as avg_latency` aggregates all events into a single summary row containing only the average value, removing all original events and fields.
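
The three behaviors compared above can be modeled in a few lines of Python. This is a sketch of the output shapes only, under the assumption of four hypothetical latency values; it is not Splunk's implementation.

```python
# Hypothetical latency values, one per event, in search-result order.
latencies = [10, 20, 30, 40]

# Model of `stats avg(latency) as avg_latency`: a single summary row.
stats_rows = [{"avg_latency": sum(latencies) / len(latencies)}]

# Model of `eventstats avg(latency) as avg_latency`: every event kept,
# each carrying the same overall average.
overall = sum(latencies) / len(latencies)
eventstats_rows = [{"latency": v, "avg_latency": overall} for v in latencies]

# Model of `streamstats avg(latency) as avg_latency`: a running average
# over the events seen so far.
streamstats_rows = []
running_total = 0
for i, v in enumerate(latencies, start=1):
    running_total += v
    streamstats_rows.append({"latency": v, "avg_latency": running_total / i})

print(stats_rows)
print([r["avg_latency"] for r in eventstats_rows])
print([r["avg_latency"] for r in streamstats_rows])
```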

69

MCQhard

An administrator defines a macro that calls another macro. Both macros are defined in the same app. The first macro works correctly, but when executed, it triggers an error: 'Recursive macro call detected'. What is the most likely cause?

A.The second macro is not shared to the global context.

B.The second macro calls the first macro, creating a circular reference.

C.The first macro has a syntax error that only appears when combined.

D.The first macro passes incorrect arguments to the second macro.

AnswerB

Splunk macros cannot be recursive; circular references cause this error.

Why this answer

Splunk detects and prevents recursive macro calls (a macro that directly or indirectly calls itself). The error indicates that the two macros form a circular reference, so Option B is correct. Option A (sharing or permissions) is not relevant because both macros are defined in the same app. Option C (a syntax error) and Option D (an argument mismatch) would each give a different error.

70

Multi-Selecthard

Which THREE of the following are features of the `timechart` command?

Select 3 answers

A.It automatically creates a time-based chart with a default span.

B.It can be used with the `by` clause to split into multiple series.

C.It can aggregate data using functions like count, sum, avg.

D.It can output results to a lookup file.

E.It requires the `span` option to be specified.

AnswersA, B, C

timechart bins events over time and produces a timechart visualization.

Why this answer

A, B, and C are correct. `timechart` automatically creates a time-based chart with a default span (A), can split into multiple series with the `by` clause (B), and supports aggregation functions such as count, sum, and avg (C). D is incorrect because `timechart` does not output to a lookup file. E is incorrect because `span` is optional.

71

Multi-Selecthard

Which THREE of the following are correct characteristics of the transaction command? (Choose three.)

Select 3 answers

A.It groups related events based on common field values.

B.It can group events from different indexes.

C.It can use maxspan to set the maximum total duration of a transaction.

D.It can use maxpause to set the maximum time between events in a transaction.

E.By default, it retains all original fields from all events in the transaction.

AnswersA, C, D

Transaction groups events that share a common field, like session ID.

Why this answer

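Option A is correct because the `transaction` command groups related events that share common field values, such as a session ID or user ID, to form a single transaction. This is a core function of the command, allowing you to correlate events across a dataset based on matching field content. Options C and D are also correct: `maxspan` caps the total time between the first and last event in a transaction, and `maxpause` caps the gap between consecutive events.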

Exam trap

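The trap here is option E: candidates often assume `transaction` preserves every event's field values as-is, but by default (`mvlist=false`) each field in the transaction holds only the unique values collected from its events. Option B is excluded by the answer key even though `transaction` operates on whatever events the base search returns, which can come from more than one index (as question 67 states), so do not read it as a claim that the command is index-scoped.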

72

Multi-Selecthard

Which THREE of the following commands can produce a time-based chart (timechart or chart with time buckets)? (Choose three.)

Select 3 answers

A.`chart count over _time bins=24`

B.`stats count by _time span=1h`

C.`timechart span=1h count`

D.`top _time`

E.`chart count by _time span=1d`

AnswersA, C, E

Correct: chart with bins over _time creates a time-based chart.

Why this answer

Option A is correct because the `chart` command with `over _time bins=24` explicitly creates a time-based chart by splitting the time range into 24 equal bins, each representing a time bucket, and then counts events per bucket. This produces a chart that can be visualized over time, similar to a timechart.

Exam trap

Splunk often tests the distinction between `stats` and `chart`/`timechart`, where candidates mistakenly think `stats count by _time` can produce a time-based chart, but `stats` only returns tabular data and does not support time-based charting without the `chart` or `timechart` command.

73

MCQhard

Refer to the exhibit. What is the most likely cause of this error?

A.The lookup definition [geo_lookup] does not exist in transforms.conf

B.The lookup file geo.csv is not present in the lookups directory

C.The field src_ip is misspelled in the lookup table

D.The user does not have read permission on the lookup table

AnswerA

The 'inputlookup' command requires a valid lookup definition; absence causes this exact error.

Why this answer

Option A is correct because the error 'Could not find lookup table' indicates the lookup definition (transforms.conf stanza) is missing or incorrect. Option B is incorrect because if the file were missing but the definition existed, the error would mention 'file not found'. Option D is incorrect because permissions errors produce a different message. Option C is incorrect because the error is about the table, not the field.

74

Multi-Selectmedium

A Splunk administrator is creating a dashboard to visualize real-time network traffic data. The dashboard must include a lookup to enrich source IPs with location data. The lookup file contains 500,000 entries and is updated hourly. Which TWO optimization techniques should the administrator apply to ensure dashboard performance?

Select 2 answers

A.Use a KV store lookup instead of a CSV lookup.

B.Use the append command instead of lookup to add fields.

C.Set the lookup to batch_index_query=true in transforms.conf.

D.Use a subsearch to filter the lookup file before joining.

E.Set the lookup to use max_matches=1 to prevent multiple matches.

AnswersA, E

KV store supports indexing and concurrent access, improving performance for large lookups.

Why this answer

Option A is correct because KV Store lookups suit large, frequently updated datasets (like 500,000 entries updated hourly): records can be updated in place and fields can be accelerated (indexed), which reduces search-time overhead compared to CSV lookups that require file parsing and scanning. This makes KV Store a good fit for real-time dashboards where low latency is critical. Option E is correct because `max_matches=1` stops the lookup at the first matching row for each event, which avoids multi-value results and extra matching work.

Exam trap

Splunk often tests whether candidates understand `batch_index_query`: it applies to large file-based (CSV) lookups and is already enabled by default, so setting it to true changes nothing and does not address the hourly updates.

75

MCQeasy

An analyst wants to create a time series chart showing the count of errors per hour over the last 24 hours. The errors are logged with sourcetype=error_log. Which search achieves this?

A.index=main sourcetype=error_log | chart count over _time by hour

B.index=main sourcetype=error_log | bin _time span=1h | stats count by _time

C.index=main sourcetype=error_log | chart count by _time

D.index=main sourcetype=error_log | timechart count span=1h

AnswerD

Correctly produces hourly count time chart.

Why this answer

Option D is correct because `timechart count span=1h` automatically creates a time series chart with one-hour buckets over the last 24 hours, grouping events by `_time` and counting them per bucket. The `timechart` command is specifically designed for time-based aggregation and produces a chart with `_time` on the x-axis, which is exactly what the analyst needs.

Exam trap

The trap here is that candidates often confuse `chart` with `timechart`, thinking `chart count by _time` will produce a time series, but `chart` treats `_time` as a categorical field rather than a continuous time axis, leading to incorrect visualizations.

How to eliminate wrong answers

Option A is wrong because `chart count over _time by hour` splits the series by a field named `hour`, which these events are not stated to contain, and it never requests one-hour buckets. Option B is wrong because `bin _time span=1h | stats count by _time` returns a statistics table rather than a time series chart; the hourly counts are computed, but unlike `timechart`, hours with no events are simply absent instead of appearing as zero. Option C is wrong because `chart count by _time` specifies no one-hour span, so the result is not guaranteed to be an hourly aggregation.
