> vulnerability: CWE-611 | owasp_classification: A05:2021 Security Misconfiguration | impact: file disclosure / SSRF / RCE / DoS | root_cause: the XML specification itself
XML External Entity injection — XXE — is not a bug in any particular software. It is a feature of the XML 1.0 specification itself, exploited against applications that parse XML input without disabling that feature. The specification defines a mechanism called 'external entities' that allows an XML document to reference and include content from external sources — local files, remote URLs, internal network resources. When an XML parser processes a document containing an external entity declaration, it dutifully resolves the reference and includes the content, exactly as the specification instructs.
The problem is that in a web application context, the XML document is supplied by the user — an untrusted source — and the parser executes on the server with the server's permissions and network access. An attacker who can control the XML input can instruct the parser to read any file the application has access to, make HTTP requests to internal systems the server can reach, or consume system resources until the application crashes. The parser is not misbehaving. It is doing exactly what the XML specification tells it to do. The vulnerability exists because the feature should never have been enabled for untrusted input.
XXE first appeared in the OWASP Top 10 in 2017 as a dedicated category (A4:2017 XML External Entities). In the 2021 update, it was consolidated into A05:2021 Security Misconfiguration, which is accurate, because XXE is fundamentally a configuration issue. Most widely used XML parsers ship with external entity processing enabled by default, or have historically done so. Disabling it requires explicit configuration. Most developers never do.
To understand XXE, you need to understand three components of the XML specification: Document Type Definitions (DTDs), entities, and external entity declarations. A DTD defines the structure and permitted elements of an XML document. Within a DTD, you can declare entities — named storage units that act as variables within the document. An internal entity defines its content inline. An external entity defines its content by referencing an external resource via a URI.
The SYSTEM keyword is the critical instruction. It tells the XML parser to resolve the entity's content from an external URI. The URI can use any scheme the parser supports: file:// for local files, http:// or https:// for remote resources, and in some implementations specialised schemes such as expect:// (PHP, only when the expect extension is installed), jar: (Java), or gopher:// for protocol-level interaction. The attacker controls the URI. The parser resolves it with the server's permissions. That single mechanism underlies every XXE variant except entity-expansion denial of service, which needs only internal entities.
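To make the distinction between internal and external entities concrete, here is a minimal sketch using Python's standard-library expat bindings. The sample document, the entity names, and the `file:///etc/hostname` target are illustrative assumptions, not taken from a real engagement. The parser only records the declarations; because no external entity handler is installed, it never fetches the SYSTEM URI.

```python
import xml.parsers.expat

# Hypothetical sample: one internal entity and one external (SYSTEM) entity.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY greeting "hello">
  <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<note>&greeting; &xxe;</note>"""


def list_entity_declarations(xml_text):
    """Return (name, kind, target) for each entity declared in the DTD."""
    found = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        if system_id is not None:
            # SYSTEM keyword present: content would come from a URI.
            found.append((name, "external", system_id))
        else:
            # Replacement text is defined inline in the DTD.
            found.append((name, "internal", value))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    # No ExternalEntityRefHandler is set, so the URI is never resolved.
    parser.Parse(xml_text, True)
    return found


declarations = list_entity_declarations(SAMPLE)
for name, kind, target in declarations:
    print(f"{name}: {kind} -> {target}")
```

A vulnerable parser differs from this sketch in one respect: when it meets `&xxe;` in the document body, it opens the URI and substitutes whatever it reads.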
In many applications, the XML parser processes the input but does not reflect entity content in the response. The attacker sends a malicious document, the parser resolves the external entity, but the file contents are consumed internally without being returned. This is 'blind XXE' — the vulnerability exists, but the standard in-band exploitation technique does not work because there is no visible output channel.
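Blind XXE is not a lesser vulnerability. It simply requires different exfiltration techniques. Two primary methods exist. The first is out-of-band (OOB) exfiltration: parameter entities make the parser send the file contents to attacker-controlled infrastructure, typically over HTTP.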
The second blind XXE technique uses error-based exfiltration: the attacker crafts an entity that triggers a parser error containing the file contents within the error message. If the application displays or logs detailed error messages, the attacker can read the file contents from the error output. This technique works even when outbound HTTP connections from the server are blocked, as long as error messages are visible.
DNS-based exfiltration provides a third channel. Even when HTTP outbound is restricted, DNS queries are rarely blocked. The attacker constructs an entity that makes the parser resolve a hostname containing the exfiltrated data as a subdomain — for example, SENSITIVE-DATA.attacker.com. The attacker's authoritative DNS server logs the query, revealing the data. This is slower (limited by DNS label length) but extremely difficult to block.
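The "slower" caveat can be quantified with some rough arithmetic. The sketch below assumes hex encoding (two characters per byte), the 63-byte DNS label limit, a 253-character limit on the full hostname, and a hypothetical zone name `attacker.example`; real tooling and real resolvers may impose tighter limits.

```python
import math

MAX_LABEL_LEN = 63   # bytes per DNS label (RFC 1035)
MAX_NAME_LEN = 253   # characters in a full hostname, dots included


def payload_chars_per_query(zone):
    """Characters of encoded data that fit in front of `zone` in one name."""
    budget = MAX_NAME_LEN - len(zone)      # room for "label." pieces
    full_labels, remainder = divmod(budget, MAX_LABEL_LEN + 1)
    partial = max(remainder - 1, 0)        # shorter last label, minus its dot
    return full_labels * MAX_LABEL_LEN + partial


def queries_needed(data_bytes, zone):
    """DNS lookups required to move `data_bytes` of hex-encoded data."""
    bytes_per_query = payload_chars_per_query(zone) // 2   # hex: 2 chars/byte
    return math.ceil(data_bytes / bytes_per_query)


chars = payload_chars_per_query("attacker.example")
queries = queries_needed(1024, "attacker.example")
print(chars, queries)  # 233 9
```

Under these assumptions a 1 KiB file needs nine lookups, which is why defenders who log DNS should treat bursts of long, high-entropy subdomains under a single zone as a signal worth alerting on.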
The Billion Laughs attack is elegant in its simplicity. A document smaller than 1 KB causes the parser to allocate approximately 3 GB of memory. No external entity resolution is required — only internal entity expansion. This means that parsers configured to block external entities but that still permit DTD processing remain vulnerable. The only complete defence is to disable DTD processing entirely, or to implement entity expansion limits (which most modern parsers now support, but which must be explicitly configured).
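The arithmetic behind those figures is easy to check. The sketch below assumes the classic layout: a 3-byte base entity and nine further levels that each reference the level beneath ten times. The function names are illustrative. The document is only built as a string and measured; it is never handed to a parser.

```python
def expanded_size(base_text="lol", fanout=10, levels=9):
    """Bytes produced when the top-level entity is fully expanded."""
    return len(base_text.encode("utf-8")) * fanout ** levels


def build_payload(base_text="lol", fanout=10, levels=9):
    """Build the nested-entity document as a string (never parsed here)."""
    lines = [
        '<?xml version="1.0"?>',
        "<!DOCTYPE lolz [",
        f'  <!ENTITY lol0 "{base_text}">',
    ]
    for i in range(1, levels + 1):
        # Each level references the previous one `fanout` times.
        refs = f"&lol{i - 1};" * fanout
        lines.append(f'  <!ENTITY lol{i} "{refs}">')
    lines += ["]>", f"<lolz>&lol{levels};</lolz>"]
    return "\n".join(lines)


payload = build_payload()
print(len(payload), "bytes on the wire")
print(expanded_size(), "bytes after expansion")  # 3000000000
```

Growth is exponential in the number of levels, so an expansion limit has to cap the total expanded size or the nesting depth; a cap on document size alone does not help.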
The obvious XXE attack surface is any endpoint that accepts XML input — SOAP web services, XML-RPC endpoints, XML-based API request bodies, and file upload functionality that processes XML formats. But the less obvious attack surfaces are where we most frequently find exploitable XXE during engagements.
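One of the less obvious surfaces is a ZIP-based upload whose members are XML, as is the case for modern Office documents. The following is a rough triage sketch, not a defence: it assumes the upload is a ZIP archive, looks only at `.xml` and `.rels` members, and does a plain byte search that would miss UTF-16 encoded parts. The member names and the helper `make_sample_upload` are hypothetical.

```python
import io
import zipfile


def xml_parts_with_doctype(upload_bytes):
    """List XML members of a ZIP-based upload that contain a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(upload_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".xml", ".rels")):
                continue
            # Heuristic byte search; reads the whole member into memory.
            if b"<!DOCTYPE" in archive.read(name):
                flagged.append(name)
    return flagged


def make_sample_upload():
    """Build an in-memory archive with one clean and one suspicious part."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<doc>clean</doc>")
        archive.writestr(
            "docProps/app.xml",
            '<!DOCTYPE x [<!ENTITY e SYSTEM "file:///etc/hostname">]><x>&e;</x>',
        )
    return buffer.getvalue()


flagged = xml_parts_with_doctype(make_sample_upload())
print(flagged)
```

Legitimate Office files do not normally need a DOCTYPE in their parts, so a hit here is a reasonable trigger for rejecting the upload or inspecting it further. The parser that eventually reads those parts still needs to be hardened.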
| Language / Parser | External Entities Default | DTD Processing Default | Key Mitigation |
|---|---|---|---|
| Java — DocumentBuilderFactory, SAXParser, DOM4J, JAXB | Enabled by default. Java is historically the most XXE-prone language. All major Java XML parsers process external entities unless explicitly disabled. | Enabled by default. | Set feature 'disallow-doctype-decl' to true. Disable external general and parameter entities. Set XMLConstants.FEATURE_SECURE_PROCESSING. Each parser requires different API calls — consult the OWASP cheat sheet for your specific parser. |
| PHP — libxml (SimpleXML, DOMDocument, XMLReader) | Enabled by default with libxml2 older than 2.9.0. From libxml2 2.9.0 onwards (the minimum that PHP 8.0 requires), external entity loading is disabled by default, but passing the LIBXML_NOENT flag turns entity substitution back on. | Enabled by default. | Call libxml_disable_entity_loader(true) on PHP < 8.0 (the function is deprecated from 8.0). Never pass LIBXML_NOENT when parsing untrusted input. Use the LIBXML_NONET flag to prevent network access. Verify the libxml2 version: anything below 2.9.0 is vulnerable by default. |
| Python — xml.etree, xml.dom, xml.sax, lxml | Varies by parser. xml.etree.ElementTree does not resolve external entities by default. lxml resolves them by default. xml.sax varies by configuration. | Varies. | Use defusedxml library — a drop-in replacement that disables all dangerous XML features by default. For lxml, set resolve_entities=False and no_network=True. The Python documentation explicitly warns about XML vulnerabilities. |
| .NET — XmlDocument, XmlReader, XmlTextReader, XPathDocument | Changed across versions. .NET Framework 4.5.2+ disabled DTD processing by default in most parsers. Earlier versions are vulnerable. XmlTextReader in .NET Framework < 4.5.2 resolves external entities by default. | Depends on version. | Set DtdProcessing = DtdProcessing.Prohibit for all parsers. Set XmlResolver to null. Ensure XmlReaderSettings prohibits DTD. Verify your .NET Framework version — anything below 4.5.2 requires explicit hardening. |
| Ruby — REXML, Nokogiri | REXML expands entities by default. Nokogiri (which uses libxml2) disables network access by default but does expand local entities. | Enabled. | For Nokogiri, use Nokogiri::XML::ParseOptions::NONET. For REXML, the entity expansion limit was introduced in Ruby 2.0.0 but external resolution must be manually controlled. |
The pattern is consistent: most XML parsers across most languages are either vulnerable by default or have been vulnerable by default in recent historical versions. Secure configuration requires explicit action by the developer. In the absence of that action — which is the norm in the codebases we review — the application is vulnerable to XXE.
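Because defaults differ by parser and version, it can help to verify behaviour rather than trust documentation. The sketch below is one possible regression check, written against Python's standard library: it writes a canary string to a temporary file, references that file through an external entity, and reports whether the parser under test returns the canary. The names `leaks_external_entity` and `parse_with_elementtree` are illustrative; any parser can be plugged in through a small adapter function.

```python
import os
import pathlib
import tempfile
import xml.etree.ElementTree as ET

CANARY = "XXE-CANARY-7f3a"  # arbitrary marker string, chosen for this sketch


def leaks_external_entity(parse):
    """Return True if parse(xml_text) hands back a local file's contents."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(CANARY)
        canary_path = handle.name
    try:
        uri = pathlib.Path(canary_path).as_uri()
        payload = f'<!DOCTYPE r [<!ENTITY xxe SYSTEM "{uri}">]><r>&xxe;</r>'
        try:
            text = parse(payload)
        except Exception:
            # Rejecting the document outright counts as not leaking.
            return False
        return CANARY in (text or "")
    finally:
        os.unlink(canary_path)


def parse_with_elementtree(xml_text):
    """Adapter for the parser under test: returns the root element's text."""
    return ET.fromstring(xml_text).text


print("ElementTree leaks:", leaks_external_entity(parse_with_elementtree))
```

Running a check like this in CI against the exact parser configuration the application uses catches regressions when a dependency upgrade or a well-meaning refactor changes parser options.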
Our approach to XXE testing during penetration testing engagements follows a systematic methodology designed to identify both obvious and hidden XML parsing surfaces.
| XXE Variant | Attacker Gains | CVSS Severity | Real-World Consequence |
|---|---|---|---|
| File Disclosure | Contents of any file readable by the application — configuration files, credentials, private keys, source code, user data. | High (7.5–9.1) | Database credentials in config files lead to full database compromise. Private keys enable impersonation. Source code reveals further vulnerabilities. If the application runs as root, /etc/shadow enables offline password cracking. |
| SSRF | HTTP requests to internal services — cloud metadata, admin panels, databases, APIs not exposed to the internet. | High to Critical (7.5–10.0) | AWS metadata access yields IAM credentials → full cloud account compromise. Internal admin panel access enables configuration changes. Internal API access enables data manipulation. |
| Blind XXE (OOB) | Same as file disclosure but via out-of-band channel — slower but equally impactful. | High (7.5–9.1) | Same consequences as in-band file disclosure. Harder to detect because exfiltration occurs via DNS or HTTP to attacker infrastructure, not in the application response. |
| Denial of Service | Application crash, resource exhaustion, service unavailability. | High (7.5) | Production application outage. 1 KB payload causes 3 GB memory allocation. Repeated attacks can prevent recovery. |
| Remote Code Execution | Full server compromise — the attacker executes arbitrary commands with the application's privileges. | Critical (9.8–10.0) | Complete control of the server. Data theft, ransomware deployment, pivot to internal network. Rare but devastating — primarily via PHP expect:// or Java-specific vectors. |
The most important thing to understand about XXE mitigation is that it is a configuration problem with a configuration solution. You do not need to rewrite your application. You do not need to implement complex input validation. You need to configure your XML parser to not process the features that enable XXE. The OWASP XXE Prevention Cheat Sheet provides parser-specific instructions for every major language.
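As an illustration of how small the change can be, here is a minimal sketch for Python's standard library that refuses any document containing a DOCTYPE before it reaches the application. The names `DTDForbidden` and `parse_untrusted` are assumptions made for this example, and the two-pass design (a guard pass with expat, then a normal ElementTree parse) trades some speed for simplicity. In production code the defusedxml library is usually a better fit than hand-rolled checks.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed in untrusted input")


def parse_untrusted(xml_text):
    """Parse XML only if it declares no DTD at all."""
    guard = xml.parsers.expat.ParserCreate()
    # Called by expat as soon as it sees "<!DOCTYPE", before any entity
    # declaration is processed.
    guard.StartDoctypeDeclHandler = _reject_doctype
    guard.Parse(xml_text, True)   # raises DTDForbidden or ExpatError
    return ET.fromstring(xml_text)


root = parse_untrusted("<order><item>widget</item></order>")
print(root.find("item").text)

try:
    parse_untrusted('<!DOCTYPE o [<!ENTITY e "x">]><o>&e;</o>')
except DTDForbidden as exc:
    print("rejected:", exc)
```

Rejecting the DOCTYPE outright blocks both external entity resolution and internal entity expansion, at the cost of refusing the rare legitimate document that relies on a DTD.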
XXE is a vulnerability that should not exist in 2026. The root cause is known (XML specification features enabled by default in parsers processing untrusted input). The fix is known (disable DTD processing or external entity resolution). And yet we continue to find XXE in production applications, particularly in SOAP services, file upload handlers, Office document processors, and SAML integrations where the XML parsing happens in a library rather than in code the development team wrote.
The reason XXE persists is the same reason most configuration-based vulnerabilities persist: the secure configuration is not the default, and developers reasonably assume that default configurations are safe. They are not. If your application parses XML from any untrusted source — user input, uploaded files, external feeds, SAML assertions, SOAP requests — your XML parser needs explicit configuration to disable the features that enable XXE. One configuration change. Applied to every parser. That is the entire mitigation.
The impact of failing to make that change ranges from embarrassing (reading /etc/passwd during a penetration test) to catastrophic (cloud account compromise via SSRF to the metadata service, or complete server takeover via RCE). The effort to exploit it ranges from trivial (a few lines of XML in Burp Suite) to moderate (blind OOB exfiltration via parameter entities). The effort to prevent it is minimal — a few lines of parser configuration. The cost-benefit calculation is not close.
Our penetration testers systematically identify and exploit XXE vulnerabilities across all XML parsing surfaces — including the hidden ones in file upload handlers, SAML integrations, and APIs that accept XML via Content-Type switching. We test in-band, blind, and OOB variants, and provide specific parser configuration remediation for your technology stack.