This chapter covers Open Source Intelligence (OSINT) sources for threat intelligence, a critical skill for security analysts collecting information from publicly available sources. On the CS0-003 exam, OSINT sources appear in roughly 5-10% of questions, typically in the context of threat intelligence gathering, indicator enrichment, and reconnaissance techniques. Mastering OSINT sources helps you identify, validate, and prioritize threats without costly subscriptions.
Think of OSINT as a private investigator (PI) assembling a dossier on a suspect. The PI starts with the suspect's name (the initial indicator, like a domain or IP). They then visit public records (government databases, business registrations) to find addresses and associates—this is like querying WHOIS or DNS records. Next, they search newspapers and social media for recent activities—similar to scanning social media platforms and forums. The PI also checks the suspect's trash for discarded documents—analogous to searching paste sites for leaked credentials. Each piece of information is public, but the PI combines them to build a picture the suspect never intended to reveal. Just as a PI must verify sources and cross-reference, an analyst must corroborate OSINT data. The PI's final report is the threat intelligence product—actionable insights derived from open sources. Critically, the PI never breaks the law; they only use what's publicly available. Similarly, OSINT must be collected ethically and legally. The PI's skill lies in knowing where to look and how to connect dots—exactly the skill set of a threat intelligence analyst using OSINT.
What is OSINT and Why Does It Exist?
Open Source Intelligence (OSINT) refers to intelligence gathered from publicly available sources. In cybersecurity, OSINT is used to collect information about potential threats, attackers, and vulnerabilities. It exists because adversaries leave digital footprints—domain registrations, social media posts, public code repositories, and more. Analysts leverage OSINT to understand an attacker's infrastructure, tactics, and targets without engaging directly.
Categories of OSINT Sources
OSINT sources are typically categorized into six groups:
- Search Engines: Google, Bing, DuckDuckGo (general reconnaissance).
- Social Media: Twitter, LinkedIn, Facebook (profiling individuals or groups).
- Government/Public Records: WHOIS, SEC filings, business registries (domain ownership and corporate relationships).
- Technical Sources: DNS records, SSL certificates, Shodan (infrastructure mapping).
- Paste Sites & Dark Web: Pastebin, IntelX, Ahmia (leaked credentials or early breach data).
- Threat Intelligence Platforms: AlienVault OTX, MISP, VirusTotal (aggregated indicators).
How OSINT Works Internally: The Collection Process
The OSINT process follows a cycle: planning, collection, processing, analysis, and dissemination. At the technical level, collection often involves APIs, web scraping, or manual queries.
WHOIS Lookups: When you query WHOIS for a domain, the registrar's WHOIS server returns registration details (owner, creation date, nameservers). Many registrars now redact personal data due to GDPR, but organization fields may still be visible.
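As a rough illustration of working with WHOIS output, the sketch below splits "Key: Value" lines into a dictionary and flags redacted values. The sample text, the `parse_whois` and `is_redacted` names, and the "Example Corp" organization are all hypothetical; real WHOIS output varies by registry and registrar, so treat this as a starting point rather than a complete parser.

```python
def parse_whois(raw: str) -> dict[str, list[str]]:
    """Collect 'Key: Value' lines from WHOIS text into a dict of lists."""
    fields: dict[str, list[str]] = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and the comment/footer lines many servers emit
        if not line or line.startswith(("%", "#", ">>>")):
            continue
        # Split on the first colon only, so timestamps keep their colons
        key, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def is_redacted(value: str) -> bool:
    """Return True when a field value looks privacy-redacted."""
    return "redacted" in value.lower()


# Hypothetical sample output, for illustration only
SAMPLE = """\
Domain Name: EXAMPLE.COM
Creation Date: 2020-01-15T04:00:00Z
Registrant Name: REDACTED FOR PRIVACY
Registrant Organization: Example Corp
Name Server: NS1.EXAMPLE.NET
Name Server: NS2.EXAMPLE.NET
"""

record = parse_whois(SAMPLE)
print(record["Registrant Organization"])
print(is_redacted(record["Registrant Name"][0]))
```

Repeated keys such as Name Server are kept as lists, which is why every value is stored in a list.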
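DNS Enumeration: Tools like dig or nslookup query DNS servers for A, AAAA, MX, TXT, and CNAME records. For example, dig example.com ANY asks for every record type, although many servers now refuse or minimize ANY responses (RFC 8482), so querying each type separately is more dependable. TXT records often contain SPF, DKIM, or verification strings that reveal email infrastructure.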
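To show how a TXT record can expose email infrastructure, here is a minimal sketch that pulls the include, ip4, and ip6 values out of an SPF string. The function name `spf_mechanisms` and the sample record (including `_spf.mailprovider.example`) are made up for illustration, and the parser ignores other SPF features such as macros and modifiers.

```python
def spf_mechanisms(txt_record: str) -> dict[str, list[str]]:
    """Split an SPF TXT record into its include, ip4, and ip6 values."""
    found: dict[str, list[str]] = {"include": [], "ip4": [], "ip6": []}
    parts = txt_record.strip().strip('"').split()
    if not parts or parts[0].lower() != "v=spf1":
        return found  # not an SPF record
    for term in parts[1:]:
        # Drop an optional qualifier (+, -, ~, ?) in front of the mechanism
        term = term.lstrip("+-~?")
        # Split on the first colon so IPv6 addresses stay intact
        name, sep, value = term.partition(":")
        if sep and name.lower() in found:
            found[name.lower()].append(value)
    return found


# Hypothetical record, as it might appear in dig output
sample = '"v=spf1 include:_spf.mailprovider.example ip4:192.0.2.0/24 ip6:2001:db8::/32 ~all"'
print(spf_mechanisms(sample))
```

Each include value names a third party allowed to send mail for the domain, which is often a quick way to identify the mail provider in use.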
SSL Certificate Transparency (CT) Logs: Publicly trusted certificate authorities record the certificates they issue in public CT logs, so you can search by domain to find certificates, including those for subdomains. Tools like crt.sh provide a web interface to query CT logs. For example, https://crt.sh/?q=%25.example.com (where %25 is a URL-encoded % wildcard) lists logged certificates for names under example.com.
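The sketch below shows one way to turn crt.sh JSON output into a deduplicated subdomain list. It assumes each entry carries a `name_value` field holding one or more newline-separated names, which matches crt.sh output at the time of writing but is not guaranteed; the function name and the sample data are hypothetical, and the JSON is supplied as a string so the example runs offline.

```python
import json


def subdomains_from_crtsh(json_text: str, domain: str) -> list[str]:
    """Extract unique hostnames under `domain` from crt.sh-style JSON."""
    names: set[str] = set()
    for entry in json.loads(json_text):
        # name_value may hold several names separated by newlines
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            # Wildcard entries still reveal the parent label
            if name.startswith("*."):
                name = name[2:]
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


# Hypothetical response body, for illustration only
SAMPLE = json.dumps([
    {"name_value": "www.example.com\nexample.com"},
    {"name_value": "*.dev.example.com"},
    {"name_value": "mail.example.com"},
    {"name_value": "WWW.example.com"},
])

print(subdomains_from_crtsh(SAMPLE, "example.com"))
```

Lowercasing and using a set removes the duplicates that appear when the same name is present on many renewed certificates.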
Shodan: Shodan indexes banners from internet-connected devices. A query like port:22 country:US returns SSH servers in the US. This reveals exposed services that adversaries might target.
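Shodan queries are space-separated filter:value terms, so they are easy to assemble programmatically. The helper below is a small sketch (the name `shodan_query` is hypothetical) that builds a query string and quotes values containing spaces; it does not contact Shodan.

```python
def shodan_query(filters: dict[str, object]) -> str:
    """Build a Shodan search string from filter names and values."""
    terms = []
    for name, value in filters.items():
        text = str(value)
        # Quote multi-word values so they remain a single term
        if " " in text:
            text = f'"{text}"'
        terms.append(f"{name}:{text}")
    return " ".join(terms)


print(shodan_query({"port": 22, "country": "US"}))
print(shodan_query({"org": "Example Corp", "port": 443}))
```

The resulting string can be pasted into the Shodan web interface or passed to the shodan search command.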
Social Media Scraping: APIs from Twitter or LinkedIn can be used to collect posts, followers, and connections. However, terms of service must be respected; unauthorized scraping may violate laws.
Key Components, Values, and Defaults
WHOIS Query Tools: whois command-line tool (Linux) or web interfaces. Default timeout is 30 seconds. Common fields: Registrant Organization, Registrant Country, Creation Date, Expiration Date, Name Servers.
DNS Record Types: A (IPv4), AAAA (IPv6), MX (mail exchange), TXT (text), CNAME (alias), NS (nameserver). TTL values are in seconds and are set by the zone administrator; 3600 (1 hour) is a common value.
SSL Certificate Fields: Subject Common Name (CN), Subject Alternative Names (SANs), Issuer, Validity period (not before/not after), SHA-256 fingerprint.
Shodan Filters: port, country, org, hostname, ssl (e.g., ssl.cert.subject.cn:example.com).
VirusTotal: Provides file hash lookup, URL scan, domain report. API key required for automated queries. Rate limit is 4 requests per minute for free tier.
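Automated lookups need to stay under the free-tier limit of 4 requests per minute noted above. The following is a minimal sliding-window throttle, offered as a sketch: the `RateLimiter` class is hypothetical, and the clock and sleep functions are injectable only so the behaviour can be checked without waiting.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` in any window of `period` seconds."""

    def __init__(self, max_calls=4, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls

    def wait(self) -> float:
        """Block until another call is allowed; return seconds waited."""
        now = self.clock()
        # Forget calls that have aged out of the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        waited = 0.0
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call leaves the window
            waited = self.period - (now - self.calls[0])
            self.sleep(waited)
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
        return waited
```

Calling `limiter.wait()` before each API request keeps a batch job within the limit; the fifth call in a burst pauses until the first one is a minute old.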
Configuration and Verification Commands
WHOIS lookup:
whois example.com
DNS enumeration with dig:
dig example.com ANY @8.8.8.8
Query CT logs via crt.sh (using curl):
curl -s 'https://crt.sh/?q=example.com&output=json' | jq .
Shodan search (requires API key):
shodan search 'port:22 country:US'
VirusTotal domain report (API key required):
curl -s --request GET --url 'https://www.virustotal.com/api/v3/domains/example.com' --header 'x-apikey: YOUR_API_KEY'
Interaction with Related Technologies
OSINT often feeds into Threat Intelligence Platforms (TIPs). For example, an indicator from OSINT (like a malicious IP) can be ingested into MISP (Malware Information Sharing Platform) via API. The TIP then correlates it with internal logs. OSINT also integrates with Security Information and Event Management (SIEM) systems: analysts search OSINT sources for context on alerts. For instance, a suspicious IP appears in a firewall log; an OSINT lookup reveals it's associated with a known C2 server.
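The firewall-log example can be sketched as a simple join between alerts and an indicator table. Everything here is illustrative: the field names (`src_ip`, `osint`), the `enrich_alerts` function, and the sample indicator are assumptions, and a real TIP or SIEM would do this through its own API or lookup tables.

```python
def enrich_alerts(alerts: list[dict], indicators: dict[str, dict]) -> list[dict]:
    """Attach OSINT context to each alert whose source IP is a known indicator."""
    enriched = []
    for alert in alerts:
        # None means no OSINT source had anything on this address
        context = indicators.get(alert["src_ip"])
        enriched.append({**alert, "osint": context})
    return enriched


# Hypothetical indicator set, e.g. exported from a TIP
indicators = {"192.0.2.1": {"label": "known C2 server", "source": "MISP"}}

# Hypothetical firewall log entries
alerts = [
    {"src_ip": "192.0.2.1", "action": "allow"},
    {"src_ip": "198.51.100.7", "action": "allow"},
]

for row in enrich_alerts(alerts, indicators):
    print(row)
```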
Automation and Tools
theHarvester: Automates email, subdomain, and IP enumeration from search engines, PGP key servers, and Shodan. Example: theHarvester -d example.com -b google.
Recon-ng: Modular reconnaissance framework with modules for various OSINT sources. Use marketplace install to add modules.
Maltego: Graphical link analysis tool with transforms for OSINT. It visualizes relationships between domains, IPs, and people.
SpiderFoot: Automated OSINT scanner with over 200 modules. Can be run as a web service or CLI. Example: python3 sf.py -s example.com -u all (the -u option selects modules by use case).
Ethical and Legal Considerations
OSINT must be collected ethically. Unauthorized scraping that violates terms of service or access restrictions (e.g., bypassing login walls) may be illegal. In many jurisdictions, even public data collection can be regulated (e.g., GDPR). Always review platform policies. For the exam, know that OSINT is legal as long as it does not involve unauthorized access or circumvention of controls.
Define Intelligence Requirements
Before collecting OSINT, clearly define what you need. Are you investigating a phishing domain? Profiling a threat actor? Mapping an organization's attack surface? Requirements drive source selection. For example, if you need to identify all subdomains of a target, you'll query CT logs, DNS zone transfers (if possible), and search engines. If you're after leaked credentials, you'll check paste sites and dark web forums. Document requirements to avoid scope creep and ensure focused collection.
Select and Query OSINT Sources
Based on requirements, choose appropriate sources. For domain intelligence, query WHOIS, DNS, CT logs, and Shodan. For person intelligence, search social media, professional networks, and public records. Use automated tools like theHarvester or Recon-ng to query multiple sources simultaneously. For example, to enumerate emails: `theHarvester -d example.com -b google,linkedin`. Each query returns raw data (e.g., WHOIS output, DNS records). Record the source and timestamp for provenance.
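One way to keep source selection and provenance explicit is to encode both in small data structures. The sketch below mirrors the domain and person examples above; the `plan_collection` function, the `Observation` class, and the source labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Requirement type -> sources to query, following the examples in the text
SOURCES_BY_REQUIREMENT = {
    "domain": ["whois", "dns", "ct_logs", "shodan"],
    "person": ["social_media", "professional_networks", "public_records"],
}


def plan_collection(requirement: str) -> list[str]:
    """Return the sources to query for a requirement type (empty if unknown)."""
    return list(SOURCES_BY_REQUIREMENT.get(requirement, []))


@dataclass(frozen=True)
class Observation:
    """One raw result, stamped with where and when it was collected."""
    indicator: str
    source: str
    raw: str
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


obs = Observation("example.com", "whois", "Registrant Organization: Example Corp")
print(plan_collection("domain"))
print(obs.source, obs.collected_at.isoformat())
```

Stamping the timestamp at creation time, in UTC, means provenance is captured even when the analyst forgets to record it.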
Process and Normalize Collected Data
Raw OSINT data is often unstructured. Parse WHOIS output to extract fields like registrant org, creation date. Convert DNS records into structured format (domain, record type, value). Deduplicate entries—multiple sources may return the same IP. For example, if both Shodan and VirusTotal list the same IP, merge the data. Use tools like `jq` for JSON parsing or Python scripts. Normalize timestamps to UTC. This step prepares data for analysis.
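The merge and normalization steps might look like the sketch below, which combines records for the same IP and converts timestamps to UTC. The record layout (`ip`, `source`, `seen`) and the sample values are assumptions for illustration, and treating a timestamp without an offset as UTC is a simplification you may not want in practice.

```python
from datetime import datetime, timezone


def to_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp and convert it to UTC."""
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is already UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def merge_by_ip(records: list[dict]) -> list[dict]:
    """Merge records that share an IP, keeping all sources and the latest sighting."""
    merged: dict[str, dict] = {}
    for rec in records:
        entry = merged.setdefault(
            rec["ip"], {"ip": rec["ip"], "sources": [], "last_seen": None}
        )
        if rec["source"] not in entry["sources"]:
            entry["sources"].append(rec["source"])
        seen = to_utc(rec["seen"])
        if entry["last_seen"] is None or seen > entry["last_seen"]:
            entry["last_seen"] = seen
    # Emit timestamps as ISO strings once all comparisons are done
    return [
        {**e, "last_seen": e["last_seen"].isoformat()} for e in merged.values()
    ]


# Hypothetical records from two sources
records = [
    {"ip": "192.0.2.1", "source": "shodan", "seen": "2024-05-01T12:00:00+02:00"},
    {"ip": "192.0.2.1", "source": "virustotal", "seen": "2024-05-01T11:30:00+00:00"},
    {"ip": "198.51.100.7", "source": "shodan", "seen": "2024-04-30T23:00:00-05:00"},
]

for row in merge_by_ip(records):
    print(row)
```

Comparing parsed datetimes rather than raw strings avoids ordering mistakes when sources report in different time zones.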
Correlate and Analyze for Patterns
Cross-reference indicators across sources to build a coherent picture. For instance, a domain's WHOIS registrant email may appear on a paste site with a password dump, linking the domain to a known breach. Use link analysis tools like Maltego to visualize connections. Look for patterns: same IP hosting multiple malicious domains, shared SSL certificates, or common email addresses. This step transforms raw data into intelligence.
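One of the patterns mentioned, a single IP hosting several suspicious domains, reduces to grouping resolutions by address. This is a minimal sketch with made-up data; the `shared_infrastructure` name and the `min_domains` threshold are illustrative choices, and shared hosting can produce the same pattern for unrelated sites.

```python
from collections import defaultdict


def shared_infrastructure(resolutions, min_domains=2):
    """Group (domain, ip) pairs by IP and keep IPs serving several domains."""
    by_ip = defaultdict(set)
    for domain, ip in resolutions:
        by_ip[ip].add(domain)
    return {
        ip: sorted(domains)
        for ip, domains in by_ip.items()
        if len(domains) >= min_domains
    }


# Hypothetical DNS resolutions gathered during collection
resolutions = [
    ("malicious-login.com", "192.0.2.1"),
    ("secure-update.example", "192.0.2.1"),
    ("benign.example", "198.51.100.7"),
]

print(shared_infrastructure(resolutions))
```

The same grouping works for shared certificate fingerprints or registrant emails by swapping the second element of each pair.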
Produce and Disseminate Intelligence
Document findings in a structured report. Include indicators (IPs, domains, hashes), confidence levels (based on source reliability and corroboration), and recommended actions. For example, 'IP 192.0.2.1 is a likely C2 server (confidence: high) due to association with three known malware samples. Block at firewall.' Share with relevant teams via TIP or SIEM. Update intelligence regularly as OSINT sources change.
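A structured report entry can be as simple as a dictionary serialized to JSON. In the sketch below the confidence thresholds (three or more corroborating sources is high, two is medium) are hypothetical, and the function only models corroboration; source reliability would need its own weighting.

```python
import json


def confidence(source_count: int) -> str:
    """Map the number of corroborating sources to a confidence label."""
    # Hypothetical thresholds, chosen only for illustration
    if source_count >= 3:
        return "high"
    if source_count == 2:
        return "medium"
    return "low"


def build_report(indicator: str, kind: str, sources: list[str], action: str) -> dict:
    """Assemble one report entry with deduplicated sources."""
    unique = sorted(set(sources))
    return {
        "indicator": indicator,
        "type": kind,
        "sources": unique,
        "confidence": confidence(len(unique)),
        "recommended_action": action,
    }


entry = build_report(
    "192.0.2.1", "ip", ["virustotal", "otx", "misp", "otx"], "Block at firewall"
)
print(json.dumps(entry, indent=2))
```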
Enterprise Scenario 1: Phishing Campaign Investigation
A financial institution detects a phishing email targeting employees. The email contains a link to malicious-login.com. The SOC analyst performs OSINT: WHOIS lookup shows the domain was registered 2 days ago with a privacy service. DNS query reveals MX record pointing to a free email provider. CT log search shows the domain has an SSL certificate issued by Let's Encrypt with Subject Alternative Names including admin.malicious-login.com and mail.malicious-login.com. Shodan scan shows the IP hosting the site has port 443 open and runs nginx. Pastebin search reveals the same IP in a list of 'phishing kits'. The analyst correlates this: the domain is likely part of a broader phishing campaign. The intelligence is fed into the SIEM as a new indicator. The firewall blocks the IP, and email filters block the domain. Common pitfall: relying solely on WHOIS without checking CT logs, missing subdomains.
Enterprise Scenario 2: Attack Surface Mapping for Merger
A company is acquiring a smaller tech firm. As part of due diligence, the security team maps the target's external attack surface using OSINT. They use Shodan to find all internet-facing devices associated with the target's IP ranges. They discover an unpatched FTP server (port 21) and an exposed Jenkins instance (port 8080) with default credentials. DNS enumeration reveals subdomains like dev.target.com and test.target.com that host development applications. LinkedIn search identifies employees with 'system administrator' titles, potentially high-value targets for social engineering. The OSINT findings are presented to the CISO, who mandates that the target secure these exposures before merger. Without OSINT, these vulnerabilities would remain hidden until post-merger integration, increasing risk.
Enterprise Scenario 3: Threat Actor Profiling
A government agency tracks a threat actor group known for targeting critical infrastructure. Analysts use OSINT to profile the group. They monitor Telegram channels where the group claims responsibility for attacks. They collect screenshots of malware samples posted on Twitter. WHOIS lookups on domains used in attacks reveal patterns: domains registered with a specific email provider (ProtonMail) and hosted on a particular VPS provider. Shodan shows the group's C2 servers use custom HTTP headers. This intelligence helps attribute future attacks to the same group. The challenge is volume: OSINT generates massive data; automated pipelines are essential to filter noise.
What CS0-003 Tests on OSINT Sources (Objective 1.1)
The exam focuses on identifying appropriate OSINT sources for different intelligence needs. You must know which source provides what type of data. Common questions: 'Which OSINT source would you use to find subdomains of a domain?' (Answer: Certificate Transparency logs). 'Which source reveals exposed services on a target IP?' (Answer: Shodan). 'Where would you look for leaked credentials?' (Answer: Pastebin or dark web forums).
Top Wrong Answers and Why
Choosing WHOIS for subdomain discovery: WHOIS provides domain registration info, not subdomains. Candidates often confuse WHOIS with DNS. The correct source for subdomains is CT logs or DNS brute-forcing.
Selecting social media for technical infrastructure: Social media is for person profiling, not service discovery. Candidates may think LinkedIn reveals technical details, but Shodan or DNS is more appropriate.
Using VirusTotal for real-time data: VirusTotal aggregates historical scans; it's not real-time. Candidates might think it shows current status, but each result is a snapshot from the last analysis. For current status, request a fresh analysis or use a live sandbox; Shodan data is also periodic rather than live.
Confusing paste sites with dark web: Paste sites (e.g., Pastebin) are clearnet; dark web requires Tor. The exam may ask where to find 'recently leaked data'—paste sites are more accessible.
Specific Numbers, Values, and Terms
WHOIS fields: Registrant Organization, Creation Date, Expiration Date, Name Servers.
DNS record types: A, AAAA, MX, TXT, CNAME, NS.
CT logs: crt.sh is the most common tool.
Shodan filters: port, country, org, hostname.
VirusTotal: API key required; rate limit 4 req/min (free).
theHarvester: -d domain -b source.
Recon-ng: workspaces, modules.
Edge Cases and Exceptions
WHOIS privacy: GDPR causes many registrations to show 'REDACTED FOR PRIVACY'. The exam may test that you can still get org name if it's a business domain.
DNS ANY queries: Some DNS servers block ANY queries; use specific record types.
CT log delays: Certificates may take up to 24 hours to appear in CT logs.
Shodan refresh rate: Shodan scans the internet periodically; data may be days old.
How to Eliminate Wrong Answers
Match the intelligence need to the source's primary function. If the question asks for 'email addresses associated with a domain', eliminate sources that don't typically return emails (e.g., Shodan; WHOIS contact emails are usually redacted). Use process of elimination: if two sources could work, choose the most direct one. For example, for 'subdomains', CT logs are more reliable than DNS brute-forcing because they record names from certificates that were actually issued instead of guessing from a wordlist.
OSINT sources include WHOIS, DNS, CT logs, Shodan, paste sites, social media, and threat intelligence platforms.
For subdomain discovery, use Certificate Transparency logs (crt.sh) rather than WHOIS.
Shodan is the go-to source for identifying exposed services and devices on the internet.
WHOIS may show 'REDACTED FOR PRIVACY' due to GDPR; look for organization fields for business domains.
VirusTotal provides historical scan data, not real-time status; API rate limit is 4 requests per minute (free tier).
theHarvester automates email and subdomain enumeration from search engines and other sources.
Always verify OSINT data from multiple sources to ensure accuracy and avoid false positives.
OSINT collection must comply with legal and ethical boundaries; unauthorized access is not OSINT.
These come up on the exam all the time. Here's how to tell them apart.
WHOIS Lookups
Provides domain registration details (owner, dates, nameservers).
Data may be privacy-redacted for individuals.
Useful for identifying domain ownership and age.
Queried via whois command or web services.
Not effective for discovering subdomains.
Certificate Transparency Logs
Lists all SSL certificates issued for a domain, including subdomains.
Data is public and not redacted (certificate fields).
Useful for discovering subdomains and certificate issuers.
Queried via crt.sh or API.
Does not reveal domain ownership or registration dates.
Mistake
OSINT is always legal and risk-free.
Correct
OSINT is legal only if collected without unauthorized access. Scraping sites against ToS or accessing private areas (e.g., password-protected forums) may violate laws like CFAA. Always check terms of service and respect robots.txt.
Mistake
WHOIS always reveals the domain owner's name.
Correct
Due to GDPR and similar regulations, many WHOIS records show 'REDACTED FOR PRIVACY' or proxy information. Only the registrant's organization may be visible for business domains. Personal data is often hidden.
Mistake
VirusTotal provides real-time threat data.
Correct
VirusTotal is a historical database. It shows results from past scans, not current status. A file may be flagged as malicious now even though it appeared clean when scanned months ago. For current status, request a fresh analysis or use a live sandbox; Shodan data is also periodic, not real-time.
Mistake
All paste sites are indexed by Google.
Correct
Many paste sites (e.g., Pastebin) allow users to create 'unlisted' pastes not indexed by search engines. These can still be found via the site's own search or API. The exam may test that not all data is Google-searchable.
Mistake
Shodan shows all internet-connected devices.
Correct
Shodan only indexes devices that respond to its probes on the ports it scans. Devices on unusual ports or behind firewalls may not appear. Also, Shodan crawls periodically, so a given record may be days or weeks old.
Certificate Transparency (CT) logs are the best source for subdomain discovery. Each certificate lists its Subject Alternative Names (SANs), which often include subdomains. Use crt.sh to query: `https://crt.sh/?q=%25.example.com`. This returns logged certificates for names under example.com. DNS enumeration can also reveal subdomains but is less comprehensive, and WHOIS does not list subdomains at all. On the exam, CT logs are the expected answer for subdomain enumeration.
Check paste sites like Pastebin, Ghostbin, or dark web paste sites (via Tor). Use search operators: `site:pastebin.com example.com` or use tools like PasteHunter. Also, check breach databases like Have I Been Pwned (HIBP) for domain-related breaches. On the exam, paste sites are the primary source for credential leaks. Remember that unlisted pastes may not appear in search engines.
OSINT is a subset of passive reconnaissance. Passive reconnaissance involves collecting information without directly interacting with the target (e.g., no scanning or probing). OSINT uses publicly available sources, which is passive. Active reconnaissance (e.g., port scanning) involves direct interaction and may be detected. On the exam, OSINT is always passive; any active technique is not OSINT.
Yes, OSINT is crucial for tracking threat actors. Analysts monitor social media, forums, and Telegram channels for group communications. They also analyze infrastructure (domains, IPs, certificates) used in attacks. Tools like Maltego help visualize relationships. However, attribution must be careful—OSINT can be misleading if actors use false flags. The exam tests that OSINT aids but does not guarantee attribution.
Shodan is a search engine for internet-connected devices. It indexes banners from services like SSH, HTTP, and FTP. Analysts use Shodan to find exposed services, vulnerable devices, or C2 servers. For example, `ssl.cert.subject.cn:example.com` finds devices with a certificate for example.com. Shodan is a key source for technical infrastructure OSINT. On the exam, Shodan is the answer for 'internet-facing device discovery'.
Cross-reference data from multiple sources. For example, if WHOIS says a domain is owned by 'Example Corp', check the domain's website for a matching company name. Use DNS to verify IP addresses. Corroborate with threat intelligence platforms. The exam emphasizes that single-source OSINT is unreliable; always validate. Also, check timestamps—outdated data can lead to false conclusions.
No, OSINT is legal as long as you only collect publicly available information without bypassing access controls. However, scraping websites against their terms of service may violate laws in some jurisdictions. The exam expects you to know that OSINT is passive and legal, but always respect robots.txt and terms of use. Unauthorized access (e.g., using stolen credentials) is not OSINT.