Penetration Testing

Vulnerability List vs Narrative Report: Why Context Is Everything

> diff vuln_list.csv narrative_report.pdf | wc -l && echo 'everything'

Peter Bassill · 26 August 2025 · 15 min read
Tags: reporting, narrative, risk, context, attack chains, remediation, prioritisation

One tells you what's broken. The other tells you what it means.

Two penetration testing firms assess the same environment — a mid-sized manufacturing company with 800 employees, an Active Directory domain, a mixture of on-premises and cloud workloads, and a custom ERP system. Both firms are competent. Both spend the same number of days. Both find the same vulnerabilities. The reports they produce are fundamentally different.

Provider A delivers a 90-page PDF. The executive summary lists statistics: 38 findings — 2 critical, 7 high, 14 medium, 9 low, 6 informational. The body of the report is a sequential list of findings, each with a title, a CVSS score, an affected host, a description, and a remediation recommendation. The findings are sorted by severity, highest first. There is no relationship between one finding and the next. Each exists in isolation.

Provider B delivers a 55-page PDF. The executive summary tells a story: "Within two hours of connecting to the network, the tester captured domain credentials from broadcast traffic, escalated privileges through a misconfigured service account, and obtained access to the ERP database containing production schedules, supplier contracts, and customer pricing — information that would give a competitor a decisive market advantage. The entire path could be broken by a single Group Policy change that takes fifteen minutes to implement."

Both reports contain the same 38 findings. Provider A's report is accurate. Provider B's report is useful. The difference isn't what they found — it's what they communicated. Provider A produced a vulnerability list. Provider B produced a narrative.

The Core Distinction

A vulnerability list documents what exists. A narrative report documents what happens. "LLMNR is enabled" is a finding. "LLMNR being enabled allowed the tester to capture credentials without touching any system, crack them in four seconds, and use them to access the ERP database within two hours" is understanding. The finding is the same. The impact — and therefore the urgency — is only visible in the narrative.


The format the industry defaults to.

The vulnerability list is the path of least resistance in pen test reporting. It's structurally simple: one finding per section, sorted by severity, each self-contained. It's easy to produce — many tools generate it automatically. And it's what most organisations have come to expect, because it's what most providers deliver.

The Vulnerability List — What Provider A Delivered
F-001 [CRITICAL] SMB signing not required on 12 hosts # CVSS 9.1
F-002 [CRITICAL] svc_erp Kerberoastable, password cracked # CVSS 8.8
F-003 [HIGH] LLMNR/NBT-NS broadcast protocols enabled # CVSS 7.4
F-004 [HIGH] NTLMv2 hashes captured, 6/14 cracked # CVSS 7.1
F-005 [HIGH] svc_erp member of local admin on ERPSRV01 # CVSS 7.0
F-006 [HIGH] ERP database accessible with svc_erp creds # CVSS 7.0
F-007 [HIGH] No network segmentation between office and prod # CVSS 6.8
F-008 [MEDIUM] Missing HTTP security headers on ERP portal # CVSS 5.3
F-009 [MEDIUM] TLS 1.0 enabled on ERPSRV01:443 # CVSS 5.2
... # 29 more findings

# What the reader sees:
# 38 independent findings
# 2 critical, 7 high — all seemingly separate problems
# No indication that F-003 → F-004 → F-002 → F-005 → F-006 = ERP breach
# Reader prioritises F-001 (highest CVSS) — but F-001 wasn't in the chain

The list is technically accurate. Every finding is real. Every CVSS score is correct. But the reader — whether it's the CISO, the IT manager, or the board — sees 38 independent problems and starts at the top: F-001, SMB signing, CVSS 9.1. They fix that first, because the score says it's the most severe.

Meanwhile, the actual attack path to the ERP database — the chain that represents the organisation's most critical business risk — is scattered across findings F-003, F-004, F-002, F-005, and F-006. No finding in the chain has the highest CVSS score individually. The relationship between them is invisible. The reader doesn't know the chain exists, and their remediation effort is directed at a finding that, while important, wasn't part of the path the tester actually used to reach the crown jewels.
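
How the flat sort hides the chain is easy to demonstrate in a few lines. Below is a minimal, illustrative sketch (not any provider's actual tooling) that models each finding together with the findings it enables; the IDs and scores mirror the example report above, but the model itself is an assumption for illustration. A severity sort puts the standalone F-001 on top, while a simple walk over the "enables" links surfaces the path the list never shows.

# Minimal sketch: findings modelled as a directed graph of "enables" links.
# IDs and scores mirror the example report; the model itself is illustrative.

FINDINGS = {
    "F-001": {"cvss": 9.1, "enables": []},          # SMB signing: standalone
    "F-003": {"cvss": 7.4, "enables": ["F-004"]},   # LLMNR -> hash capture
    "F-004": {"cvss": 7.1, "enables": ["F-002"]},   # cracked creds -> Kerberoast
    "F-002": {"cvss": 8.8, "enables": ["F-005"]},   # svc_erp password -> admin
    "F-005": {"cvss": 7.0, "enables": ["F-006"]},   # local admin -> database
    "F-006": {"cvss": 7.0, "enables": []},          # ERP data access

}

def chains_from(finding_id, path=()):
    """Yield every attack chain reachable from a finding."""
    path = path + (finding_id,)
    onward = FINDINGS[finding_id]["enables"]
    if not onward:
        yield path
    for nxt in onward:
        yield from chains_from(nxt, path)

# The flat report: sorted by CVSS, F-001 lands on top, the chain stays hidden.
print(sorted(FINDINGS, key=lambda f: -FINDINGS[f]["cvss"]))

# The narrative: walking the links recovers F-003 -> F-004 -> F-002 -> F-005 -> F-006.
for chain in chains_from("F-003"):
    print(" -> ".join(chain))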


The same findings, with the story that connects them.

A narrative report contains the same findings — but wraps them in context. It tells the reader what happened, in what order, why each step enabled the next, and what the cumulative impact was. It transforms isolated data points into a coherent understanding of how the organisation can actually be compromised.

The Narrative — What Provider B Delivered
# ATTACK PATH 1: Network Access → ERP Database Compromise

Step 1: Connected to meeting room port. No NAC. Full network access.
Context: Any visitor or device gains unrestricted internal network access.

Step 2: LLMNR poisoning captured NTLMv2 hash for j.smith (F-003, F-004).
Context: Credential capture required no interaction with any system — broadcast
poisoning is passive. Any device on the network captures credentials.

Step 3: j.smith password cracked in 4 seconds: 'Summer2025!' (F-004).
Context: Password satisfies the complexity policy but follows a predictable
pattern. 6 of 14 captured hashes cracked within 60 seconds.

Step 4: Kerberoasted svc_erp — password cracked: 'ERPadmin1!' (F-002).
Context: svc_erp is a domain service account with an SPN. Any authenticated
user can request its Kerberos ticket. Password was set in 2019.

Step 5: svc_erp is local admin on ERPSRV01 (F-005).
Context: Service account has admin rights on the ERP server — not because
the application requires it, but because it was configured that way
during initial setup and never reviewed.

Step 6: ERP database accessed with svc_erp credentials (F-006).
Context: Database contains production schedules, supplier contracts, customer
pricing, and employee records. No database access logging enabled.

# Total time: 1 hour 47 minutes. Alerts generated: 0.

# Chain break points (any one prevents ERP compromise):
CHEAPEST: Disable LLMNR via GPO (15 min, breaks Step 2)
FASTEST: Change svc_erp password to 25+ char random (5 min, breaks Step 4)
STRONGEST: Migrate svc_erp to gMSA + remove local admin (breaks Steps 4+5)
STRATEGIC: Implement network segmentation (breaks Step 1 for non-IT users)

Same findings. Same environment. Same tester skill level. But the reader of the narrative understands something the reader of the list does not: that these five findings are not five separate problems — they're five links in a single chain that ends at the ERP database. Breaking any one link prevents the entire compromise. And the cheapest link to break — disabling LLMNR — takes fifteen minutes.
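
For readers who want to see what that fifteen-minute fix actually changes: the Group Policy setting ("Turn off multicast name resolution", under Computer Configuration > Administrative Templates > Network > DNS Client) applies a single registry value. The sketch below sets that value directly with Python's winreg module, purely as an illustration; in practice you would deploy it domain-wide through the GPO rather than touching hosts one by one.

# Illustrative only: the registry value applied by the "Turn off multicast
# name resolution" GPO. Run elevated on a Windows host; in production,
# deploy via Group Policy so the setting is enforced domain-wide.
import winreg

DNS_CLIENT_KEY = r"SOFTWARE\Policies\Microsoft\Windows NT\DNSClient"

with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, DNS_CLIENT_KEY) as key:
    # EnableMulticast = 0 disables LLMNR, breaking Step 2 of the chain.
    winreg.SetValueEx(key, "EnableMulticast", 0, winreg.REG_DWORD, 0)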


The same finding, different severity.

A vulnerability's true severity depends entirely on its context — what it enables, what surrounds it, and what an attacker can reach through it. The CVSS score measures the technical characteristics of the vulnerability in isolation. Context measures what it means in this specific environment, with these specific configurations, for this specific organisation.

LLMNR enabled
CVSS score (no context): Medium (5.3–7.4 depending on scoring interpretation). Broadcast protocol misconfiguration. No direct data access.
Actual severity (with context): Critical in this environment. LLMNR enabled the first step of the chain that reached the ERP database. Without it, the tester had no credentials. With it, the tester had credentials in 90 seconds. The CVSS score reflects the finding in isolation. The context reflects its role as the entry point to the organisation's most sensitive data.

SMB signing not required
CVSS score (no context): Critical (9.1). Enables relay attacks against systems that don't enforce signing.
Actual severity (with context): High, but not in the critical chain. SMB signing was not required on 12 hosts — but the tester didn't need a relay attack because the cracked credentials provided direct access. In a different environment, this could be the critical chain entry point. In this environment, it's an important standalone finding but not part of the path that actually reached the crown jewels.

TLS 1.0 enabled on ERP portal
CVSS score (no context): Medium (5.2). Weak cryptographic protocol. Theoretical interception risk.
Actual severity (with context): Low in practice. TLS 1.0 on the ERP portal is a cryptographic weakness, but exploiting it requires a man-in-the-middle position and significant computational effort. The tester accessed the ERP database directly using stolen credentials — the TLS version was irrelevant to the actual attack path. Worth fixing, but not urgent.

svc_erp is local admin on ERPSRV01
CVSS score (no context): High (7.0). Excessive privilege for a service account.
Actual severity (with context): Critical in this environment. This single permission turned a compromised service credential into full administrative access on the server that hosts the organisation's most sensitive data. Remove this permission, and the tester has a password for a service account that can't do anything damaging. The CVSS score is 7.0. The business impact is existential.

In a vulnerability list, the IT team fixes F-001 (CVSS 9.1) first. In a narrative report, they disable LLMNR and remove svc_erp's local admin rights first — because the narrative reveals that these two changes, taking a combined 30 minutes, break the entire path to the ERP database. The CVSS 9.1 finding is important. It's just not the one that reached the crown jewels.


Five things you can't see without a narrative.

Attack Chains
A chain of three medium-severity findings that combines into a critical compromise path is invisible in a list. Each finding is presented independently, scored independently, and prioritised independently. The reader never learns that findings 3, 4, and 5 are three links in a single chain — or that breaking any one link prevents the compromise. The list hides the relationship. The narrative reveals it.
The Cheapest Fix
A narrative identifies chain break points — the single remediation that disrupts the entire attack path for the least effort. "Disable LLMNR: 15 minutes, breaks the chain at Step 2" is intelligence that a list cannot provide, because a list doesn't show the chain. Without the narrative, the IT team fixes every finding individually, starting with the highest CVSS score. With the narrative, they fix the one that matters most first; a short sketch after this list makes the selection concrete.
Business Impact
"Kerberoastable service account (CVSS 8.8)" is a technical finding. "A Kerberoastable service account gave the tester administrative access to the ERP database containing production schedules, supplier contracts, and customer pricing — information that would give a competitor a decisive market advantage" is a business risk. The technical severity is the same. The business impact is only visible when the finding is placed in the context of what it led to.
Time and Effort
A narrative includes the timeline — the tester connected at 14:00 and reached the ERP database at 15:47. That temporal context communicates urgency in a way no CVSS score can: "an attacker can reach your most sensitive data in under two hours from a meeting room network port" is a sentence that changes board behaviour. A CVSS score of 8.8 doesn't.
Detection Failures
"Alerts generated: 0" only has meaning in the context of a narrative that shows what the attacker did. The list tells you the finding exists. The narrative tells you the finding was exploited for two hours without detection — which reveals that the SOC, EDR, and SIEM all failed to detect the specific attack path that reached the crown jewels. That's a different finding entirely, and it's invisible without the story.

The structural reasons the industry defaults to the wrong format.

If narrative reports are clearly more valuable, why do most providers still deliver vulnerability lists? The reasons are structural, not malicious — but understanding them helps organisations demand better.

Tool-driven output
Why it happens: Automated scanners (Nessus, Qualys, Burp Suite) produce finding-per-vulnerability output by default. Many providers use scanner output as the report skeleton — adding a branded template, an executive summary, and some manual findings around the edges.
The consequence: The report structure mirrors the tool's output format, not the organisation's decision-making needs. Findings are presented as the scanner found them: individually, without context, sorted by severity score.

Lists are faster to write
Why it happens: A list requires documenting each finding independently — a formulaic process that can be partially automated. A narrative requires synthesising findings into chains, assessing contextual severity, identifying break points, and writing prose that communicates to multiple audiences. This takes significantly more time and skill.
The consequence: Providers under commercial pressure to maximise testing time and minimise reporting time produce lists because lists are quicker. The 80/20 testing-to-reporting split that maximises billable testing hours produces reports that minimise deliverable value.

Narrative requires different skills
Why it happens: Writing a good narrative requires analytical thinking (which findings chain?), business acumen (what's the real-world impact?), and communication skill (can the board understand this?). Not all testers have all three. The industry hires for technical skill and assumes reporting skill will follow. It often doesn't.
The consequence: Reports that read like technical journals — accurate but impenetrable. Executive summaries that mention Kerberoasting and NTDS.dit without explaining what these mean for the business.

Clients don't know to ask
Why it happens: Most organisations have never received a narrative report, so they don't know what they're missing. The vulnerability list is their baseline expectation. They evaluate providers on testing methodology, tester qualifications, and price — rarely on report quality.
The consequence: Demand doesn't drive supply. Providers who invest in reporting quality have no competitive advantage if clients don't evaluate the deliverable. The market rewards cheaper testing, not better reporting.

CVSS creates a false sense of prioritisation
Why it happens: CVSS provides a numerical score that appears to rank findings objectively. Providers and clients alike use it as a substitute for contextual analysis — if the score is high, the finding is important; if it's low, it can wait.
The consequence: Findings are prioritised by their isolated technical severity rather than their role in the actual attack path. A CVSS 5.3 finding that's the entry point to the critical chain is deprioritised. A CVSS 9.1 finding that's standalone gets fixed first. The scoring system actively misdirects remediation effort.

A better model for rating what matters.

CVSS scores are useful — they provide a standardised, reproducible measure of a vulnerability's technical characteristics. But they measure the vulnerability in isolation, and vulnerabilities don't exist in isolation. They exist in an environment, alongside other vulnerabilities, behind (or not behind) compensating controls, and with (or without) a path to sensitive data.

Contextual severity supplements the CVSS score with the factors that determine real-world risk:

Chainability
The question it answers: Does this finding connect to other findings in an attack chain? Is it a link in a path to critical compromise?
How it changes severity: A medium-severity finding that's the entry point to a chain ending at Domain Admin is functionally critical. A critical-severity finding that's standalone and leads nowhere may be less urgent.

Data proximity
The question it answers: How close is this finding to sensitive data? Does exploiting it give direct or indirect access to the organisation's crown jewels?
How it changes severity: A misconfiguration on the ERP server that holds customer pricing is more severe than an identical misconfiguration on a print server. The vulnerability is the same. The data behind it isn't.

Compensating controls
The question it answers: Are there other controls that mitigate the risk? Does the SOC detect exploitation? Does network segmentation limit lateral movement? Does MFA prevent credential reuse?
How it changes severity: A Kerberoastable service account in an environment with no detection is more severe than the same account in an environment where the SOC detects Kerberos anomalies within five minutes.

Exploitation complexity in situ
The question it answers: How difficult is exploitation in this specific environment — not in theory? Does it require local access, or can it be exploited remotely? Does it require authentication the attacker may not have?
How it changes severity: CVSS scores theoretical exploitation complexity. Contextual severity scores actual exploitation difficulty as demonstrated during the engagement.

Business impact
The question it answers: What happens to the business if this is exploited? Customer data breach? Operational shutdown? Regulatory fine? Competitive disadvantage? Reputational damage?
How it changes severity: A finding that leads to the exposure of 100,000 customer records has a different business impact from a finding that leads to the exposure of a development server's hostname — even if their CVSS scores are similar.

A narrative report naturally communicates contextual severity because the story shows the chain, the data, the compensating controls (or their absence), and the business impact. A vulnerability list can only communicate CVSS severity because each finding is presented in isolation, without the context that would change the score.
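
One way to make that supplement concrete is to start from the CVSS base score and adjust it by the factors in the table above. The sketch below is illustrative only: the factor names come from the table, but the weights are invented placeholders that an assessor would calibrate per engagement, not a published methodology.

# Minimal sketch, with invented weights: adjust a CVSS base score by the
# contextual factors described above.
from dataclasses import dataclass

@dataclass
class Context:
    in_critical_chain: bool       # chainability
    touches_crown_jewels: bool    # data proximity
    compensating_controls: bool   # detection, segmentation, MFA
    demonstrated_exploit: bool    # proven in situ during the engagement

def contextual_severity(cvss: float, ctx: Context) -> float:
    score = cvss
    if ctx.in_critical_chain:
        score += 2.0    # link in a path to critical compromise
    if ctx.touches_crown_jewels:
        score += 1.5
    if ctx.compensating_controls:
        score -= 2.0    # working controls blunt the real-world risk
    if ctx.demonstrated_exploit:
        score += 1.0
    return max(0.0, min(10.0, score))

# LLMNR (CVSS 7.4): in-chain, reached the ERP data, no controls, exploited.
print(contextual_severity(7.4, Context(True, True, False, True)))     # 10.0
# SMB signing (CVSS 9.1): standalone here, not exploited during the test.
print(contextual_severity(9.1, Context(False, False, False, False)))  # 9.1

The inversion is the point: the finding scored critical in isolation stays where it is, while the chain's entry point overtakes it.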


Getting narrative, not just numbers.

Demand an Attack Narrative
When commissioning a pen test, specify that the report must include a chronological attack narrative showing how findings chain together. If the provider's sample report is a flat vulnerability list, ask whether they can produce a narrative. If they can't — or if they charge extra for it — that tells you something about how they value their deliverable.
Ask for Contextual Severity, Not Just CVSS
Request that each finding's severity accounts for its role in the attack chain, its proximity to sensitive data, the presence or absence of compensating controls, and the specific business impact in your environment. A finding rated 'Critical in this environment because it's the entry point to the ERP database chain' drives different action from a finding rated 'CVSS 7.4'.
Ask for Chain Break Points
For every attack chain, the report should identify every point where the chain can be broken — and which break point requires the least effort, the least cost, and the least operational disruption. This is the intelligence that transforms a 38-finding remediation backlog into a focused action plan: fix these three things and the critical chains are broken.
Compare Sample Reports Before Procurement
Request redacted sample reports from every provider you're evaluating. Compare them side by side. Does one tell a story while the other lists findings? Does one show chains while the other shows a spreadsheet? Does one identify the cheapest fix while the other just lists remediations? The sample report is the single best predictor of what you'll receive.
Use the Debrief to Hear the Story
Even if the written report is structured as a list, the verbal debrief is your opportunity to hear the narrative directly from the tester. Ask: 'Walk me through the attack path. What was the first thing you did, and how did each step lead to the next?' The tester knows the story — the question is whether the report tells it.

The bottom line.

A vulnerability list documents what exists. A narrative report documents what happens. The list tells you that LLMNR is enabled, that a service account is Kerberoastable, and that svc_erp has local admin rights on the ERP server. The narrative tells you that these three findings — none of them the top-scoring item in the report — combine into a chain that reaches the ERP database containing your customer pricing, supplier contracts, and production schedules in under two hours, with zero alerts generated, and that disabling LLMNR via a fifteen-minute Group Policy change breaks the entire chain.

The list and the narrative contain the same findings. They communicate entirely different things. The list communicates data. The narrative communicates understanding. And understanding is what drives decisions — which findings to fix first, how much to invest, what to present to the board, and whether the organisation's security posture is genuinely improving or merely accumulating remediation tickets.

Context is everything. A finding without context is a data point. A finding with context — its place in the chain, its proximity to sensitive data, its business impact, the cheapest way to break it — is intelligence. The difference between a vulnerability list and a narrative report is the difference between data and intelligence. One fills a spreadsheet. The other changes behaviour.


Penetration test reports that tell the story, not just the statistics.

Our reports include chronological attack narratives, chain analysis with break points, contextual severity ratings, and remediation guidance specific enough to implement without further research — because a finding without context is just another line in a spreadsheet.