Google Dorking, also known as Google Hacking, is a powerful OSINT (Open Source Intelligence) technique that uses advanced search operators to uncover sensitive information indexed by Google. This chapter covers how to construct and execute Google dorks, the types of information you can discover, and how to protect against such reconnaissance. On the PT0-002 exam, approximately 5-10% of questions touch on OSINT techniques, with Google Dorking being a key component of Objective 2.1 (passive reconnaissance).
Think of Google as a massive public library that has cataloged every book (web page) on its shelves. A normal user types a few words into the main search box, like asking a librarian for books about 'cats,' and the librarian brings back a pile of books that mention cats. Google Dorking is like having a set of index cards that let you search by specific catalog fields: you can ask for books from one publisher (site:example.com), books with a specific word in the title (intitle:password), or books that contain a certain word in the body (intext:confidential). The librarian doesn't just look at the cover; she flips through the index cards for exact matches. For example, 'filetype:pdf' is like asking for only paperback books, 'intitle:login' is like asking for books with 'login' in the title, and 'inurl:admin' is like asking for books whose call number contains 'admin.' The power is that you can combine these index card searches to find books that no one would ever find by browsing the shelves, such as a paperback from one publisher with 'password' in the title. That's Google Dorking: using advanced operators to query specific fields of the index, not just the full text.
What is Google Dorking?
Google Dorking is the practice of using advanced search operators in Google Search to find specific strings, file types, or configurations in the index that often expose sensitive information. The term was popularized by Johnny Long in his book 'Google Hacking for Penetration Testers.' The underlying mechanism relies on Google's crawler indexing the content of web pages, including metadata, URLs, and file contents, and then allowing users to query that index with operators that go beyond simple keyword matching.
How Google's Index Works
Google's web crawler (Googlebot) systematically downloads web pages, follows links, and stores the content in a massive index. For each page, it records:
The full text of the page
The page title (from the <title> tag)
The URL
The file type (e.g., pdf, doc, xls)
The page's links
Metadata like meta tags, headings, etc.
When you perform a search, Google's algorithms match your query against this index. Normal searches use keyword matching across the full text. Advanced operators, however, query specific fields of the index. For example:
- intitle: restricts results to pages where the search term appears in the title.
- inurl: restricts results to pages where the term appears in the URL.
- filetype: restricts results to a specific file type (e.g., pdf, xls, doc).
- site: restricts results to a specific domain or subdomain.
- intext: restricts results to pages where the term appears in the body text.
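- cache: shows the cached version of a page (Google retired its cached-page feature in 2024, so this operator is now mainly of historical and exam interest).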
- link: (deprecated but still works in some contexts) shows pages that link to a given URL.
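The idea that each operator filters on one field of an index record can be illustrated with a toy in-memory index. This is only a sketch: the `Page` record, the `FIELD_CHECKS` table, and the sample pages are hypothetical, and Google's real index structure is not public.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Page:
    """One hypothetical index record (not Google's real schema)."""
    url: str
    title: str
    text: str
    filetype: str


def _host(url):
    # Hostname of the URL, lower-cased; empty string if it cannot be parsed.
    return (urlsplit(url).hostname or "").lower()


def _in_site(page, domain):
    # site: matches the domain itself and any subdomain of it.
    host = _host(page.url)
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


# Each operator is a predicate over exactly one field of the record.
FIELD_CHECKS = {
    "intitle": lambda page, term: term.lower() in page.title.lower(),
    "inurl": lambda page, term: term.lower() in page.url.lower(),
    "intext": lambda page, term: term.lower() in page.text.lower(),
    "filetype": lambda page, term: page.filetype.lower() == term.lower(),
    "site": _in_site,
}


def search(pages, **filters):
    """Return the pages that satisfy every operator=term filter."""
    return [
        page for page in pages
        if all(FIELD_CHECKS[op](page, term) for op, term in filters.items())
    ]


PAGES = [
    Page("https://files.example.com/backup/", "Index of /backup",
         "db.sql config.php.bak", "html"),
    Page("https://www.example.com/login.php", "Sign in",
         "Enter your password", "php"),
    Page("https://other.org/report.pdf", "Annual report",
         "confidential figures", "pdf"),
]
```

Calling `search(PAGES, intitle="index of", site="example.com")` keeps only the directory listing, because both field checks must pass, which mirrors how combining operators narrows a real query.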
Key Operators and Their Usage on PT0-002
The exam expects you to know the following operators and their typical use cases:
intitle: - Finds pages with a specific word in the title. Example: intitle:"index of" often reveals directory listings.
inurl: - Finds pages with a specific word in the URL. Example: inurl:admin can find admin panels.
filetype: - Finds files of a specific extension. Example: filetype:sql may expose database dumps.
site: - Restricts search to a domain. Example: site:example.com limits results to that domain.
intext: - Finds pages with the term in the body. Example: intext:password finds pages containing the word 'password'.
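cache: - Shows Google's cached version of a page. Useful for viewing pages that are currently down or changed. Google retired cached pages in 2024, so expect to meet this operator on the exam more often than in practice.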
link: - (Deprecated but exam may reference) Finds pages linking to a URL.
related: - Finds similar pages.
info: - Shows information about a page.
define: - Shows definitions of a term.
allintitle: - Like intitle but for multiple words (all must be in title).
allinurl: - Like inurl for multiple words.
allintext: - Like intext for multiple words.
inanchor: - Finds pages where the term appears in anchor text (link text).
allinanchor: - For multiple words in anchor text.
numrange: - Finds numbers in a range (e.g., numrange:100-200).
daterange: - Finds pages within a date range (using Julian dates).
stocks: - Looks up stock information.
source: - Used in Google News to specify a source.
Constructing Google Dorks
A typical Google dork combines multiple operators to narrow down results. For example:
intitle:"index of" inurl:admin site:example.com
This dork looks for directory listings (pages with 'index of' in the title) that contain 'admin' in the URL, restricted to example.com.
Common dork patterns for penetration testing include:
Finding login pages: inurl:login site:target.com
Finding exposed configuration files: filetype:env intext:DB_PASSWORD
Finding backup files: filetype:bak or filetype:old
Finding database dumps: filetype:sql intext:"INSERT INTO" (the quotes keep both words bound to intext:)
Finding directory listings: intitle:"index of"
Finding exposed documents with sensitive terms: intext:"confidential" filetype:pdf
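Combining operators is plain string assembly, so the patterns above can be generated rather than typed by hand. The helper below is a minimal sketch; `build_dork` is a hypothetical name, and its only rule is to quote values that contain spaces so the whole phrase stays attached to its operator.

```python
def build_dork(*clauses):
    """Join (operator, value) pairs into one query string.

    Values containing a space are wrapped in double quotes so the
    whole phrase stays attached to its operator.
    """
    parts = []
    for operator, value in clauses:
        value = value.strip()
        already_quoted = value.startswith('"') and value.endswith('"')
        if " " in value and not already_quoted:
            value = f'"{value}"'
        parts.append(f"{operator}:{value}")
    return " ".join(parts)


# Rebuilds the directory-listing example from this section.
ADMIN_LISTING = build_dork(
    ("intitle", "index of"),
    ("inurl", "admin"),
    ("site", "example.com"),
)
```

Keeping each clause as a pair makes it easy to add or drop one operator while testing how far a dork can be narrowed.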
The Google Hacking Database (GHDB)
The GHDB is a repository of tested dorks maintained by Offensive Security (exploit-db.com). It categorizes dorks by type, such as:
Footprints: Find information about targets.
Files containing passwords.
Sensitive directories.
Web server detection.
Vulnerable files.
Error messages.
Network or vulnerability data.
The exam may ask you to identify dorks from the GHDB or to choose the correct operator for a given scenario.
Defending Against Google Dorking
As a penetration tester, you may also need to recommend mitigations. Defenses include:
Using robots.txt to ask crawlers not to crawl sensitive directories. However, robots.txt is a guideline, not an enforcement mechanism; malicious crawlers can ignore it, and because the file itself is public, it also advertises the paths you list.
Using authentication and access controls to prevent public access to sensitive pages.
Not storing sensitive information in publicly accessible web directories.
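Using noindex meta tags on pages that should not be indexed. The crawler must be able to fetch the page to see the tag, so do not also block that page in robots.txt.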
Regularly monitoring the GHDB for dorks that expose your organization's data.
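A defender can verify that a page actually carries a noindex directive by parsing its HTML. The sketch below uses only the standard library; `RobotsMetaParser` and `has_noindex` are hypothetical names, and it checks the meta tag only (it does not look at the `X-Robots-Tag` HTTP header).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"|"googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() not in ("robots", "googlebot"):
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html):
    """True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Running `has_noindex` over the pages of an admin or staging area is a quick way to spot ones that would still be indexable if a crawler reached them.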
Legal and Ethical Considerations
Google Dorking is a passive reconnaissance technique because you are only querying Google's public index, not directly interacting with the target. However, using dorks to access sensitive information (e.g., passwords) without authorization may violate laws like the Computer Fraud and Abuse Act (CFAA) in the US. On the PT0-002 exam, you should understand that Google Dorking is legal as long as you do not attempt to access or use the discovered data without permission. The exam focuses on the technical aspects, but ethical boundaries are implied.
Advanced Techniques
Using wildcards: * can be used as a placeholder. Example: "password * 123".
Using quotes for exact phrases: "exact phrase".
Using OR (or |) and implicit AND (a space) logic: password OR passwd. OR must be written in uppercase.
Excluding terms with minus: password -example.
Using parentheses for grouping: (password OR passwd) filetype:txt.
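The techniques above compose into one query string, which then has to be percent-encoded before it goes into a URL. The helpers below are a hypothetical sketch (`exact`, `any_of`, `exclude`, and `search_url` are illustrative names), and the `https://www.google.com/search?q=` form is assumed to be the search URL in use.

```python
from urllib.parse import urlencode


def exact(phrase):
    """Exact-phrase match: wrap the phrase in double quotes."""
    return f'"{phrase}"'


def any_of(*terms):
    """Grouped alternatives: (a OR b OR c)."""
    return "(" + " OR ".join(terms) + ")"


def exclude(term):
    """Exclusion: prefix the term with a minus sign."""
    return f"-{term}"


def search_url(query):
    """Percent-encode the query into a search URL (assumed URL form)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# The grouping example from the list above, plus an exclusion.
QUERY = " ".join([any_of("password", "passwd"), "filetype:txt", exclude("example")])
```

Encoding matters because quotes, parentheses, and colons are all significant to the search syntax and would be mangled if pasted into a URL unescaped.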
Interacting with Related Technologies
Google Dorking is often combined with other OSINT tools like Shodan (for internet-connected devices), Censys, and theHarvester. For example, you might use Google Dorking to find exposed admin panels and then use Shodan to find open ports on those servers. The exam may present scenarios where you need to choose the appropriate OSINT technique.
Identify Target Domain
Start by determining the target's primary domain name (e.g., example.com). This is the anchor for all subsequent dorks. Use the `site:` operator to scope your search to that domain. For example, `site:example.com` will return only pages indexed from that domain. This step is crucial because it eliminates noise from other sites. At this point, you might also identify subdomains using tools like Sublist3r or by using the `site:` operator with a wildcard (e.g., `site:*.example.com`). The exam expects you to know that `site:` is case-insensitive and includes all subdomains by default unless you specify a subdomain.
Enumerate Directory Listings
Directory listings occur when a web server has directory browsing enabled and no default index file (like index.html) is present. Google indexes these pages, and they often expose file structures. Use the dork `intitle:"index of" site:example.com` to find such pages. Alternatively, `inurl:"index of"` can also work. These listings can reveal backup files, configuration files, or other sensitive data. The exam may ask you to identify which operator finds directory listings. Remember that `intitle:"index of"` is the classic dork for this.
Find Login Pages
Login pages are common targets for brute-force attacks. Use dorks like `inurl:login site:example.com` or `inurl:admin site:example.com`. You can also search for specific CMS login pages, e.g., `inurl:wp-login site:example.com` for WordPress. The `inurl:` operator matches any part of the URL, so `inurl:login` will match /login, /login.php, /login.html, etc. The exam may test your ability to choose the correct operator to find login pages.
Discover Sensitive Files
Sensitive files like database dumps, configuration files, and password lists are often inadvertently exposed. Use `filetype:` to target specific extensions. For example: `filetype:sql site:example.com` for SQL files, `filetype:env site:example.com` for environment variables, and `filetype:bak site:example.com` for backup files. Combine with `intext:` to find files containing specific strings, e.g., `filetype:sql intext:"INSERT INTO" site:example.com`; the quotes keep both words bound to `intext:`. The exam may ask you to identify file types that typically contain sensitive data, such as .env, .bak, .sql, .log, .conf, .inc, .old, and .txt files holding passwords.
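When sweeping a target for many extensions, it helps to normalize the extension list first, since `filetype:` takes the bare extension with no leading dot. A minimal sketch, with `filetype_dorks` as a hypothetical helper name:

```python
def filetype_dorks(domain, extensions, keyword=None):
    """Build one filetype: dork per extension, scoped to a domain.

    Extensions are normalized to lower case with no leading dot,
    because filetype: expects "env", not ".env".
    """
    dorks = []
    for ext in extensions:
        ext = ext.strip().lstrip(".").lower()
        if not ext:
            continue  # skip empty entries such as "" or "."
        dork = f"filetype:{ext} site:{domain}"
        if keyword:
            dork += f' intext:"{keyword}"'
        dorks.append(dork)
    return dorks
```

For instance, `filetype_dorks("example.com", [".env", "SQL"])` yields one scoped query per extension, ready to be run one at a time within the engagement's scope.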
Extract Information from Error Messages
Error messages can reveal software versions, paths, and other technical details. Use dorks like `intext:"error" intext:"warning" site:example.com` or more specific ones like `intext:"SQL syntax" site:example.com` for SQL errors. Also, `intext:"PHP Error"` can reveal PHP version and paths. The exam may test your ability to identify dorks that find error messages. Note that many error pages are generated dynamically and never reach the index, but any page that Googlebot crawled while the error was being displayed can remain searchable.
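The same strings that make an error page findable can be used defensively, by checking your own responses for them before a crawler does. The sketch below is an assumption-laden illustration: `ERROR_SIGNATURES` and `find_error_leaks` are hypothetical names, and the two patterns are taken from the dorks above rather than from any complete signature list.

```python
import re

# Signature names are hypothetical; patterns mirror the dorks in the text.
ERROR_SIGNATURES = {
    "sql_error": re.compile(r"SQL syntax", re.IGNORECASE),
    "php_error": re.compile(r"PHP (?:Error|Warning)", re.IGNORECASE),
}


def find_error_leaks(body):
    """Return the sorted names of error signatures found in a response body."""
    return sorted(
        name for name, pattern in ERROR_SIGNATURES.items()
        if pattern.search(body)
    )
```

A non-empty result for a publicly reachable page suggests verbose error output is enabled and should be turned off in production.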
In an enterprise environment, Google Dorking is often used during the reconnaissance phase of a penetration test. For example, a pentester targeting a large financial institution might start by enumerating all subdomains using site:*.bank.com and then look for exposed admin panels with inurl:admin. They might find a test server at test.bank.com/admin that uses default credentials. In another scenario, a manufacturer might have exposed a .xls file containing employee names and salaries. The pentester would use filetype:xls site:manufacturer.com to find such files.
A common misconfiguration is leaving backup files (e.g., .bak, .old) in the web root. For instance, a PHP application might have a backup of config.php named config.php.bak. Google may index this file, revealing database credentials. The pentester would use filetype:bak site:target.com to find these.
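Defenders can look for the same leftovers from the inside, before Google finds them. The following sketch walks a local web root and flags risky file names; `RISKY_SUFFIXES`, `is_risky`, and `find_risky_files` are hypothetical names, and the suffix list is an assumption drawn from the extensions discussed in this chapter.

```python
from pathlib import Path

# Assumed list of suffixes worth flagging; extend it for your environment.
RISKY_SUFFIXES = {".bak", ".old", ".sql", ".env"}


def is_risky(filename):
    """True for names like config.php.bak, dump.sql, or a bare .env file."""
    name = filename.lower()
    # Path(".env").suffix is empty, so dotfiles are matched by full name.
    return name in RISKY_SUFFIXES or Path(name).suffix in RISKY_SUFFIXES


def find_risky_files(web_root):
    """Return risky files under web_root as sorted, root-relative paths."""
    root = Path(web_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and is_risky(path.name)
    )
```

Anything this reports is a candidate for removal from the web root, since a file that is not served cannot be indexed.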
Defenders can protect against Google Dorking by implementing proper access controls, using robots.txt to disallow crawling of sensitive directories, and regularly scanning the GHDB for dorks that expose their data. For example, a company might add the following to robots.txt:
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /backup/
However, robots.txt is only a suggestion; malicious crawlers ignore it. Therefore, it's better to use authentication and to avoid storing sensitive files in web-accessible directories.
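To see how a well-behaved crawler interprets those rules, Python's standard `urllib.robotparser` can evaluate them offline. This is a sketch: `is_crawl_allowed` is a hypothetical wrapper and the example.com URLs are placeholders. It shows only what a compliant crawler would decide; nothing here is enforced by the server.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /backup/
"""


def is_crawl_allowed(robots_txt, url, user_agent="*"):
    """Ask whether a compliant crawler may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `https://example.com/admin/login.php` is disallowed while `https://example.com/index.html` is allowed, yet both remain reachable by anyone who requests them directly.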
Performance considerations: Google Dorking is rate-limited by Google. If you run too many queries in a short time, Google may block your IP or require CAPTCHA. Pentesters often use proxies or API keys to avoid this. Also, the results are only as current as Google's last crawl. A file might have been removed but still appear in the index for days or weeks.
What goes wrong: A common mistake is using the wrong operator. For example, using inurl: when intitle: is needed, or forgetting to use quotes for exact phrases. Another mistake is not scoping the search with site:, resulting in irrelevant results. Pentesters also sometimes assume that queries are case-sensitive; both the operators and the search terms are case-insensitive, with the exception of the Boolean OR, which must be uppercase. Finally, relying solely on Google Dorking without combining it with other OSINT techniques can miss a lot of information.
The PT0-002 exam tests Google Dorking under Objective 2.1: Given a scenario, perform passive reconnaissance. You should be able to identify the correct operator for a given purpose, understand how to combine operators, and recognize common dorks from the GHDB.
Common wrong answers:
1. Choosing inurl: when intitle: is needed. For example, to find directory listings, the correct operator is intitle:"index of", not inurl:"index of". Many candidates confuse the two because both can sometimes work, but the classic dork uses intitle:.
2. Writing filetype: with malformed syntax or incorrect extensions. The exam may ask for a dork to find PDF files; the expected answer is filetype:pdf. Quoting the extension (filetype:"pdf") is unnecessary, and the extension is written without a leading dot (filetype:pdf, not filetype:.pdf).
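3. Thinking that site: includes only the exact domain. In reality, site:example.com includes all subdomains (e.g., www.example.com, mail.example.com). To drop a known subdomain from the results, subtract it explicitly, for example site:example.com -site:www.example.com; the older site:example.com -inurl:www pattern also appears in study material.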
4. Overlooking the cache: operator. The exam might ask how to view a cached version of a page; the answer is cache:URL.
Specific numbers and terms: Know that the GHDB is hosted at exploit-db.com. The operator link: is deprecated but may still appear. The daterange: operator uses Julian dates (e.g., 2454832). The numrange: operator finds numbers in a range (e.g., numrange:1000-2000).
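Julian dates for `daterange:` are easy to get wrong by hand, so a small conversion helps. The sketch below derives the integer Julian Day Number from Python's proleptic Gregorian ordinal; `to_julian_day` and `daterange` are hypothetical helper names, and `daterange:` itself is an undocumented operator whose behavior may vary.

```python
from datetime import date

# date.toordinal() counts days from 0001-01-01 = 1; adding this offset
# gives the integer Julian Day Number (2000-01-01 maps to 2451545).
_ORDINAL_TO_JDN = 1721425


def to_julian_day(day):
    """Integer Julian Day Number for a calendar date."""
    return day.toordinal() + _ORDINAL_TO_JDN


def daterange(start, end):
    """Format a daterange: clause from two calendar dates."""
    return f"daterange:{to_julian_day(start)}-{to_julian_day(end)}"
```

Under this conversion, the example value 2454832 corresponds to 31 December 2008.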
Edge cases: Google may ignore some operators if they are not well-formed. For example, intitle:keyword works, but intitle: "keyword" with a space after the colon may not. Also, some operators like link: are deprecated and may not work reliably. The exam may test that filetype: can be used with site: to narrow results.
How to eliminate wrong answers: Understand the underlying mechanism. If the question asks for a dork to find pages with a specific word in the title, the answer must use intitle:. If it asks for a specific file type, use filetype:. If it asks for pages on a specific domain, use site:. Always read the question carefully to determine which field of the index is being targeted.
Google Dorking uses advanced operators to query Google's index metadata.
The `site:` operator restricts results to a specific domain, including subdomains.
`intitle:"index of"` is the classic dork for directory listings.
`filetype:env` can expose environment variables with database credentials.
The Google Hacking Database (GHDB) at exploit-db.com contains pre-tested dorks.
Defenses include robots.txt, authentication, and noindex meta tags.
Operators are case-insensitive and can be combined with quotes, minus, and OR.
Google Dorking is passive reconnaissance and generally legal.
Common wrong exam answer: confusing `intitle:` with `inurl:`.
The `cache:` operator shows Google's cached version of a page.
These come up on the exam all the time. Here's how to tell them apart.
Google Dorking
Queries Google's index of web pages and files.
Finds sensitive information in web content (e.g., passwords in documents).
Uses operators like intitle, inurl, filetype.
Passive reconnaissance: does not interact with target.
Limited to what Google has crawled.
Shodan
Queries Shodan's index of internet-connected devices.
Finds open ports, services, and banners of devices.
Uses filters like port, country, org, product.
Can be passive or active (if you use Shodan's scan feature).
Includes devices that may not have web interfaces.
Mistake
Google Dorking is illegal.
Correct
Google Dorking itself is legal because you are querying Google's public index. However, using discovered information to access systems without authorization may be illegal. The PT0-002 exam focuses on the technical aspects, but you should know that passive reconnaissance is generally legal.
Mistake
The `site:` operator only returns the exact domain, not subdomains.
Correct
`site:example.com` returns results from example.com and all its subdomains (e.g., www.example.com, mail.example.com). To drop a known subdomain, subtract it with an additional operator such as `-site:www.example.com` or `-inurl:www`.
Mistake
Operators are case-sensitive.
Correct
Google's operators are case-insensitive. `intitle:` and `INTITLE:` work the same way. The search terms themselves are also case-insensitive.
Mistake
You can use the `filetype:` operator to search for any file extension.
Correct
`filetype:` works only for file extensions that Google recognizes and indexes. Common ones include pdf, doc, xls, ppt, txt, html, php, asp, jpg, gif, png, swf, css, js, xml, sql, bak, old, etc. Not all extensions are indexed.
Mistake
Google Dorking requires special tools.
Correct
Google Dorking can be performed directly in the Google search bar using operators. No special tools are required, though there are automated tools like GoogD0rker and SiteDigger that can help.
`intitle:` searches for the term in the HTML title tag of the page, while `inurl:` searches for the term in the URL itself. For example, `intitle:login` finds pages titled 'login', whereas `inurl:login` finds pages with 'login' in the URL, like /login.php. The exam often tests this distinction.
Use `filetype:sql intext:"INSERT INTO" site:target.com` to find SQL dump files containing INSERT statements; the quotes keep both words bound to `intext:`. You can also use `filetype:sql intext:password` or simply `filetype:sql` to find any SQL files. The GHDB has many examples.
Yes, use `inurl:login site:target.com` or `inurl:admin site:target.com`. For specific CMS, use `inurl:wp-login` for WordPress or `inurl:user/login` for Drupal. The `inurl:` operator is ideal for finding login pages because they often have 'login' in the URL.
The GHDB is a repository of Google dorks maintained by Offensive Security. It categorizes dorks by type (e.g., files containing passwords, sensitive directories, error messages). You can access it at exploit-db.com/google-hacking-database. The exam may reference it.
Yes, as long as you have authorization. Google Dorking is passive reconnaissance and does not interact with the target directly. However, using discovered information to access systems without permission is illegal. Always stay within the scope of your engagement.
Use robots.txt to disallow crawling of sensitive directories, implement authentication for admin pages, avoid storing sensitive files in web-accessible directories, use noindex meta tags on sensitive pages, and regularly scan the GHDB for dorks that expose your data. Remember that robots.txt is not a security measure but a guideline.
`cache:` shows Google's cached version of a page. For example, `cache:example.com` displays the last indexed version. This is useful for viewing pages that are currently down or have changed. The exam may ask how to view a cached page.