> diff vuln_list.csv narrative_report.pdf | wc -l && echo 'everything'_
Two penetration testing firms assess the same environment — a mid-sized manufacturing company with 800 employees, an Active Directory domain, a mixture of on-premises and cloud workloads, and a custom ERP system. Both firms are competent. Both spend the same number of days. Both find the same vulnerabilities. The reports they produce are fundamentally different.
Provider A delivers a 90-page PDF. The executive summary lists statistics: 38 findings — 2 critical, 7 high, 14 medium, 9 low, 6 informational. The body of the report is a sequential list of findings, each with a title, a CVSS score, an affected host, a description, and a remediation recommendation. The findings are sorted by severity, highest first. There is no relationship between one finding and the next. Each exists in isolation.
Provider B delivers a 55-page PDF. The executive summary tells a story: "Within two hours of connecting to the network, the tester captured domain credentials from broadcast traffic, escalated privileges through a misconfigured service account, and obtained access to the ERP database containing production schedules, supplier contracts, and customer pricing — information that would give a competitor a decisive market advantage. The entire path could be broken by a single Group Policy change that takes fifteen minutes to implement."
Both reports contain the same 38 findings. Provider A's report is accurate. Provider B's report is useful. The difference isn't what they found — it's what they communicated. Provider A produced a vulnerability list. Provider B produced a narrative.
A vulnerability list documents what exists. A narrative report documents what happens. "LLMNR is enabled" is a finding. "LLMNR being enabled allowed the tester to capture credentials without touching any system, crack them in four seconds, and use them to access the ERP database within two hours" is understanding. The finding is the same. The impact — and therefore the urgency — is only visible in the narrative.
The vulnerability list is the path of least resistance in pen test reporting. It's structurally simple: one finding per section, sorted by severity, each self-contained. It's easy to produce — many tools generate it automatically. And it's what most organisations have come to expect, because it's what most providers deliver.
The list is technically accurate. Every finding is real. Every CVSS score is correct. But the reader — whether it's the CISO, the IT manager, or the board — sees 38 independent problems and starts at the top: F-001, SMB signing, CVSS 9.1. They fix that first, because the score says it's the most severe.
Meanwhile, the actual attack path to the ERP database — the chain that represents the organisation's most critical business risk — is scattered across findings F-003, F-004, F-002, F-005, and F-006. No finding in the chain has the highest CVSS score individually. The relationship between them is invisible. The reader doesn't know the chain exists, and their remediation effort is directed at a finding that, while important, wasn't part of the path the tester actually used to reach the crown jewels.
A narrative report contains the same findings — but wraps them in context. It tells the reader what happened, in what order, why each step enabled the next, and what the cumulative impact was. It transforms isolated data points into a coherent understanding of how the organisation can actually be compromised.
Same findings. Same environment. Same tester skill level. But the reader of the narrative understands something the reader of the list does not: that these five findings are not five separate problems — they're five links in a single chain that ends at the ERP database. Breaking any one link prevents the entire compromise. And the cheapest link to break — disabling LLMNR — takes fifteen minutes.
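That fifteen-minute fix is the "Turn off multicast name resolution" setting under Computer Configuration > Administrative Templates > Network > DNS Client. As a minimal sketch of what that policy actually does (assuming a Windows host and administrative rights; in practice you would deploy it domain-wide via GPO rather than per machine), the underlying registry value can be set directly:

```python
# Minimal sketch: the registry value enforced by the "Turn off multicast
# name resolution" Group Policy. Windows-only (uses winreg); run as
# Administrator. In production this is deployed domain-wide via GPO,
# not scripted per host.
import winreg

POLICY_KEY = r"SOFTWARE\Policies\Microsoft\Windows NT\DNSClient"

def disable_llmnr() -> None:
    """Set EnableMulticast = 0, which disables LLMNR on this host."""
    key = winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE, POLICY_KEY, 0, winreg.KEY_SET_VALUE
    )
    with key:
        winreg.SetValueEx(key, "EnableMulticast", 0, winreg.REG_DWORD, 0)

if __name__ == "__main__":
    disable_llmnr()
    print("EnableMulticast set to 0 (LLMNR disabled).")
```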
A vulnerability's true severity depends entirely on its context — what it enables, what surrounds it, and what an attacker can reach through it. The CVSS score measures the technical characteristics of the vulnerability in isolation. Context measures what it means in this specific environment, with these specific configurations, for this specific organisation.
| Finding | CVSS Score (No Context) | Actual Severity (With Context) |
|---|---|---|
| LLMNR enabled | Medium to High (5.3–7.4 depending on scoring interpretation). Broadcast protocol misconfiguration. No direct data access. | Critical in this environment. LLMNR enabled the first step of the chain that reached the ERP database. Without it, the tester had no credentials. With it, the tester had credentials in 90 seconds. The CVSS score reflects the finding in isolation. The context reflects its role as the entry point to the organisation's most sensitive data. |
| SMB signing not required | Critical (9.1). Enables relay attacks against systems that don't enforce signing. | High, but not in the critical chain. SMB signing was not required on 12 hosts — but the tester didn't need a relay attack because the cracked credentials provided direct access. In a different environment, this could be the critical chain entry point. In this environment, it's an important standalone finding but not part of the path that actually reached the crown jewels. |
| TLS 1.0 enabled on ERP portal | Medium (5.2). Weak cryptographic protocol. Theoretical interception risk. | Low in practice. TLS 1.0 on the ERP portal is a cryptographic weakness, but exploiting it requires a man-in-the-middle position and significant computational effort. The tester accessed the ERP database directly using stolen credentials — the TLS version was irrelevant to the actual attack path. Worth fixing, but not urgent. |
| svc_erp is local admin on ERPSRV01 | High (7.0). Excessive privilege for a service account. | Critical in this environment. This single permission turned a compromised service credential into full administrative access on the server that hosts the organisation's most sensitive data. Remove this permission, and the tester has a password for a service account that can't do anything damaging. The CVSS score is 7.0. The business impact is existential. |
In a vulnerability list, the IT team fixes F-001 (CVSS 9.1) first. In a narrative report, they disable LLMNR and remove svc_erp's local admin rights first — because the narrative reveals that these two changes, taking a combined 30 minutes, break the entire path to the ERP database. The CVSS 9.1 finding is important. It's just not the one that reached the crown jewels.
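This kind of chain-aware prioritisation can be reasoned about mechanically. The sketch below is an illustration, not real tooling: the mapping of finding IDs to titles, the fix-effort estimates, and the abridged two-link chain are assumptions drawn loosely from the example engagement, and the "break point" is simply the minimum-effort link in the chain.

```python
# Illustrative chain-aware prioritisation. Finding IDs, titles, and
# effort estimates are hypothetical, loosely based on the example
# engagement described above.
from dataclasses import dataclass

@dataclass
class Finding:
    id: str
    title: str
    cvss: float
    fix_minutes: int  # estimated remediation effort

findings = {
    "F-001": Finding("F-001", "SMB signing not required", 9.1, 240),
    "F-003": Finding("F-003", "LLMNR enabled", 5.3, 15),
    "F-006": Finding("F-006", "svc_erp local admin on ERPSRV01", 7.0, 15),
}

# The (abridged) chain the tester walked to the ERP database.
erp_chain = ["F-003", "F-006"]

def cheapest_break_point(chain: list[str]) -> Finding:
    """Breaking any one link stops the chain; pick the cheapest link."""
    return min((findings[f] for f in chain), key=lambda f: f.fix_minutes)

# A CVSS-sorted queue starts with F-001. A chain-aware queue breaks
# the critical chain first, at its cheapest link.
top_cvss = max(findings.values(), key=lambda f: f.cvss)
break_point = cheapest_break_point(erp_chain)
print(f"CVSS-first remediation: {top_cvss.id} ({top_cvss.title})")
print(f"Chain break point: {break_point.id}, ~{break_point.fix_minutes} min")
```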
If narrative reports are clearly more valuable, why do most providers still deliver vulnerability lists? The reasons are structural, not malicious — but understanding them helps organisations demand better.
| Reason | Why It Happens | The Consequence |
|---|---|---|
| Tool-driven output | Automated scanners (Nessus, Qualys, Burp Suite) produce finding-per-vulnerability output by default. Many providers use scanner output as the report skeleton — adding a branded template, an executive summary, and some manual findings around the edges. | The report structure mirrors the tool's output format, not the organisation's decision-making needs. Findings are presented as the scanner found them: individually, without context, sorted by severity score. |
| Lists are faster to write | A list requires documenting each finding independently — a formulaic process that can be partially automated. A narrative requires synthesising findings into chains, assessing contextual severity, identifying break points, and writing prose that communicates to multiple audiences. This takes significantly more time and skill. | Providers under commercial pressure to maximise testing time and minimise reporting time produce lists because lists are quicker. The 80/20 testing-to-reporting split that maximises billable testing hours produces reports that minimise deliverable value. |
| Narrative requires different skills | Writing a good narrative requires analytical thinking (which findings chain?), business acumen (what's the real-world impact?), and communication skill (can the board understand this?). Not all testers have all three. The industry hires for technical skill and assumes reporting skill will follow. It often doesn't. | Reports that read like technical journals — accurate but impenetrable. Executive summaries that mention Kerberoasting and NTDS.dit without explaining what these mean for the business. |
| Clients don't know to ask | Most organisations have never received a narrative report, so they don't know what they're missing. The vulnerability list is their baseline expectation. They evaluate providers on testing methodology, tester qualifications, and price — rarely on report quality. | Demand doesn't drive supply. Providers who invest in reporting quality have no competitive advantage if clients don't evaluate the deliverable. The market rewards cheaper testing, not better reporting. |
| CVSS creates a false sense of prioritisation | CVSS provides a numerical score that appears to rank findings objectively. Providers and clients alike use it as a substitute for contextual analysis — if the score is high, the finding is important; if it's low, it can wait. | Findings are prioritised by their isolated technical severity rather than their role in the actual attack path. A CVSS 5.3 finding that's the entry point to the critical chain is deprioritised. A CVSS 9.1 finding that's standalone gets fixed first. The scoring system actively misdirects remediation effort. |
CVSS scores are useful — they provide a standardised, reproducible measure of a vulnerability's technical characteristics. But they measure the vulnerability in isolation, and vulnerabilities don't exist in isolation. They exist in an environment, alongside other vulnerabilities, behind (or not behind) compensating controls, and with (or without) a path to sensitive data.
Contextual severity supplements the CVSS score with the factors that determine real-world risk (one way these factors might be modelled is sketched after the table):
| Context Factor | The Question It Answers | How It Changes Severity |
|---|---|---|
| Chainability | Does this finding connect to other findings in an attack chain? Is it a link in a path to critical compromise? | A medium-severity finding that's the entry point to a chain ending at Domain Admin is functionally critical. A critical-severity finding that's standalone and leads nowhere may be less urgent. |
| Data proximity | How close is this finding to sensitive data? Does exploiting it give direct or indirect access to the organisation's crown jewels? | A misconfiguration on the ERP server that holds customer pricing is more severe than an identical misconfiguration on a print server. The vulnerability is the same. The data behind it isn't. |
| Compensating controls | Are there other controls that mitigate the risk? Does the SOC detect exploitation? Does network segmentation limit lateral movement? Does MFA prevent credential reuse? | A Kerberoastable service account in an environment with no detection is more severe than the same account in an environment where the SOC detects Kerberos anomalies within five minutes. |
| Exploitation complexity in situ | How difficult is exploitation in this specific environment — not in theory? Does it require local access, or can it be exploited remotely? Does it require authentication the attacker may not have? | CVSS scores theoretical exploitation complexity. Contextual severity scores actual exploitation difficulty as demonstrated during the engagement. |
| Business impact | What happens to the business if this is exploited? Customer data breach? Operational shutdown? Regulatory fine? Competitive disadvantage? Reputational damage? | A finding that leads to the exposure of 100,000 customer records has a different business impact from a finding that leads to the exposure of a development server's hostname — even if their CVSS scores are similar. |
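There is no standard formula for combining these factors, and CVSS itself defines no chain or data-proximity metric. As one illustrative way to record them, the sketch below attaches the five factors to a base score and collapses them into a coarse contextual rating. The field names, rating scale, and collapse rules are assumptions made for this example, not a methodology:

```python
# Illustrative only: one way to record context factors alongside a CVSS
# base score. The fields and rating rules are assumptions for this
# example; CVSS defines no contextual chain factor.
from dataclasses import dataclass

@dataclass
class ContextualSeverity:
    cvss_base: float
    in_critical_chain: bool      # chainability
    touches_crown_jewels: bool   # data proximity
    detected_by_soc: bool        # compensating controls
    demonstrated_in_test: bool   # exploitation complexity in situ
    business_impact: str         # e.g. "pricing data exposure"

    def rating(self) -> str:
        """Collapse the factors into a coarse contextual rating."""
        if self.in_critical_chain and self.touches_crown_jewels:
            return "Critical (contextual)"
        if self.demonstrated_in_test and not self.detected_by_soc:
            return "High (contextual)"
        return f"CVSS-led ({self.cvss_base})"

llmnr = ContextualSeverity(
    cvss_base=5.3,
    in_critical_chain=True,
    touches_crown_jewels=True,   # entry point to the ERP chain
    detected_by_soc=False,
    demonstrated_in_test=True,
    business_impact="competitor access to customer pricing",
)
print(llmnr.rating())  # -> Critical (contextual), despite a 5.3 base
```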
A narrative report naturally communicates contextual severity because the story shows the chain, the data, the compensating controls (or their absence), and the business impact. A vulnerability list can only communicate CVSS severity because each finding is presented in isolation, without the context that would change the score.
A vulnerability list documents what exists. A narrative report documents what happens. The list tells you that LLMNR is enabled, that a service account is Kerberoastable, and that svc_erp has local admin rights on the ERP server. The narrative tells you that these three findings — individually rated medium to high — combine into a chain that reaches the ERP database containing your customer pricing, supplier contracts, and production schedules in under two hours, with zero alerts generated, and that disabling LLMNR via a fifteen-minute Group Policy change breaks the entire chain.
The list and the narrative contain the same findings. They communicate entirely different things. The list communicates data. The narrative communicates understanding. And understanding is what drives decisions — which findings to fix first, how much to invest, what to present to the board, and whether the organisation's security posture is genuinely improving or merely accumulating remediation tickets.
Context is everything. A finding without context is a data point. A finding with context — its place in the chain, its proximity to sensitive data, its business impact, the cheapest way to break it — is intelligence. The difference between a vulnerability list and a narrative report is the difference between data and intelligence. One fills a spreadsheet. The other changes behaviour.
Our reports include chronological attack narratives, chain analysis with break points, contextual severity ratings, and remediation guidance specific enough to implement without further research — because a finding without context is just another line in a spreadsheet.