A Splunk environment ingests 10 TB per day. A user runs a search to count events per sourcetype over the last 7 days: `index=* earliest=-7d | timechart count by sourcetype`. The search returns partial results and eventually times out. The user needs to obtain the complete results efficiently. What is the best course of action?
`tstats` reads indexed metadata (tsidx files) instead of raw events, so it is much faster at large data volumes.
Why this answer
Option C is correct because `tstats` runs on indexed metadata (tsidx files) rather than raw events, making it far more efficient for counting events over large time ranges. Both `sourcetype` and `_time` are index-time fields, so they are available to `tstats` without any search-time extraction. By specifying `by _time span=1d, sourcetype`, you get daily counts per sourcetype without reading the raw events, avoiding the timeout that occurs with a raw search over 10 TB/day for 7 days.
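As a rough illustration of the approach described above, a `tstats` search of this shape could look like the following. This is a sketch rather than the exact text of Option C: the `earliest=-7d` range is carried over from the original search, and the trailing `xyseries` step is an assumption added here only to pivot the output into the one-column-per-sourcetype layout that `timechart` would have produced.

```spl
| tstats count where index=* earliest=-7d by _time span=1d, sourcetype
| xyseries _time sourcetype count
```

The first line does the counting entirely from indexed metadata; the second line only reshapes the small result table, so it adds little cost.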
Exam trap
Splunk exams often test the distinction between raw event searches and metadata-based searches. The trap here is that candidates may not realize `tstats` can aggregate by sourcetype and time span without touching raw data, which leads them to choose inefficient raw-search options like A or D.
How to eliminate wrong answers
Option A is wrong because `bucket span=1d | stats count by _time sourcetype` still scans every raw event in the index, which is inefficient and will likely time out on roughly 70 TB of data (10 TB/day for 7 days). Option B is wrong because `sitime` is not a valid Splunk command; it appears to be a distractor, and sampling would not provide the complete results the user requires. Option D is wrong because breaking the search into 1-day intervals and combining them with `append` still scans raw events for each interval, so it has the same performance problems and timeout risk, plus the added overhead of running multiple searches.