
What a Good Penetration Test Report Looks Like

> cat report.pdf | wc -l && echo 'nobody is reading all of this'_

Peter Bassill · 5 August 2025 · 16 min read

A pen test isn't finished when the testing stops. It's finished when the right people understand what to do.

A penetration testing firm conducts a thorough internal infrastructure assessment over ten days. The testers are skilled. The methodology is sound. They achieve Domain Admin in under four hours, chain three low-severity misconfigurations into a critical attack path, identify 47 individual findings, and document every step with timestamps and evidence.

The report lands. It's 186 pages. The executive summary is three pages of technical prose that mentions Kerberoasting, NTDS.dit extraction, and LLMNR poisoning without explaining what any of them mean in business terms. The findings section lists 47 items with CVSS scores, CVE references, and remediation steps written for security engineers — but the IT team that needs to implement the fixes doesn't have a security engineer. The risk ratings don't distinguish between the three findings that chain into Domain Admin and the twelve informational issues about HTTP headers. Everything is presented with equal visual weight.

The report sits in the CISO's inbox for two weeks. The CISO reads the executive summary, doesn't understand the business impact, and forwards it to the IT manager. The IT manager opens it, sees 186 pages, skims the finding titles, fixes the two things they already know how to fix, and puts the rest in the backlog. Six months later, the next pen test achieves Domain Admin through the same attack path.

The testing was excellent. The report failed. And because the report failed, the testing was wasted.

The Uncomfortable Truth

The quality of a penetration test is not determined by the skill of the testers. It's determined by whether the report causes the right things to get fixed. A mediocre test with a clear, actionable report that drives remediation delivers more value than a brilliant test with a report that nobody reads, nobody understands, and nobody acts on.


One report, three readers — and they need different things.

Every pen test report has at least three audiences, each with different knowledge, different responsibilities, and different questions. A report that serves one audience well typically fails the other two — unless it's deliberately structured to address all three.

The Board / Senior Leadership
Their question: "Are we at risk? How much? What's the business impact? What do we need to invest to fix it?"
What they need: A one-to-two-page executive summary in plain English. Business impact stated in terms they understand: customer data exposure, regulatory risk, financial liability, operational disruption. A clear statement of the most critical risk and the investment required to address it.
What they don't need: Technical jargon. CVSS scores. CVE numbers. Attack chain diagrams. The phrase "Kerberoasting." Anything that requires security knowledge to interpret.

The CISO / Security Manager
Their question: "What's the overall risk posture? Which findings are most critical? How do these chain together? What should we prioritise? How does this compare to last year?"
What they need: A risk-rated summary of all findings with clear prioritisation. Attack narratives showing how findings chain together. A comparison against previous engagements showing improvement or regression. A remediation roadmap with realistic timelines.
What they don't need: A flat list of 47 findings with equal visual weight. Remediation steps so vague they require research to implement. Findings that don't explain their relationship to each other.

The IT / Engineering Team
Their question: "What exactly needs to be fixed? On which systems? How do we reproduce this? What's the specific remediation? How do we verify the fix worked?"
What they need: Detailed technical findings with precise reproduction steps, affected systems, evidence (screenshots, request/response pairs, command output), specific remediation instructions, and verification steps. Group Policy paths. Configuration file locations. Code examples for fixes.
What they don't need: Business context they already understand. Executive language that obscures the technical detail they need. Remediation steps that say "implement best practices" without specifying what the best practice is.

A good report serves all three audiences — typically through distinct sections that each reader can navigate to directly. The executive summary for the board. The strategic overview for the CISO. The technical findings for the engineering team. Each section should be readable independently, because in practice, each audience will read only their section.


The sections that every good report contains.

Executive summary
Purpose: A non-technical summary of the engagement outcome, the most critical risks identified, their potential business impact, and the key investment or action required. Written for someone who will spend two minutes reading it.
Length guidance: 1–2 pages. No more. If the executive summary exceeds two pages, it's not a summary — it's a short report that nobody will finish.

Scope and methodology
Purpose: What was tested, what wasn't, the testing dates, the methodology used, any limitations or constraints encountered, and the tester's starting position (black box, grey box, white box).
Length guidance: 1–2 pages. Establishes the boundaries of the assessment so readers know what the findings cover — and what they don't.

Attack narrative
Purpose: A chronological story of the tester's path through the environment — from initial access through to demonstrated impact. Written as a narrative, not a list. Shows how individual findings chain together into the actual compromise path.
Length guidance: 2–5 pages depending on complexity. This is the section the CISO reads most closely. It transforms individual findings into a coherent understanding of how the organisation can actually be compromised.

Risk summary
Purpose: A visual or tabular overview of all findings grouped by severity, with a clear indication of which findings chain together and which are standalone. A risk heatmap or severity distribution chart.
Length guidance: 1 page. The at-a-glance view that lets the CISO and IT manager understand the volume and distribution of findings before diving into detail.

Technical findings
Purpose: The detailed, per-finding section. Each finding documented with a consistent structure: title, severity, affected systems, description, evidence, business impact, remediation, and verification steps.
Length guidance: Variable — typically 2–4 pages per critical/high finding, 1–2 per medium, and half a page per low/informational. This is the bulk of the report and the section the IT team works from.

Remediation roadmap
Purpose: A prioritised list of remediation actions grouped into timeframes: immediate (this week), short-term (this month), medium-term (this quarter). Accounts for dependencies between remediations and realistic implementation effort.
Length guidance: 1–2 pages. Transforms the findings into an actionable plan. Without this, the IT team has a list of problems but no guidance on where to start.

Appendices
Purpose: Raw technical evidence, full scan outputs, extended screenshots, tool output, and any supplementary data that supports the findings but would clutter the main report.
Length guidance: As long as needed. Appendices exist so the main report stays readable while the evidence remains available for anyone who wants to verify or reproduce.

What separates a finding that drives action from one that gathers dust.

The individual finding is the atomic unit of a pen test report. If the finding is well-written, it communicates the risk, the evidence, the impact, and the fix clearly enough that someone can act on it without needing to phone the tester. If it's poorly written, it joins the backlog and stays there.

A Bad Finding — Vague, Unactionable, No Evidence
Title: Weak Password Policy
Severity: Medium
Description: The password policy does not enforce sufficient complexity.
Impact: Attackers may be able to guess passwords.
Remediation: Implement a stronger password policy.

# Problems:
# - Which password policy? AD? A web app? Which system?
# - What does 'sufficient complexity' mean? What is it currently?
# - 'May be able to' — did they actually crack passwords or not?
# - 'Implement a stronger password policy' — what specifically?
# - No evidence. No affected systems. No reproduction steps.
# - An IT team cannot act on this without further research.
A Good Finding — Specific, Evidenced, Actionable
Title: Active Directory Password Policy Permits Common Passwords
Severity: High
Affected: Default Domain Policy — ACME.LOCAL
MITRE: T1110.001 — Brute Force: Password Guessing

Description:
The AD password policy requires a minimum of 8 characters
with the built-in complexity requirement enabled. It does not
block common patterns or known-breached passwords. 14 of 47
captured NTLMv2 hashes (30%) cracked in under 60 seconds
using rockyou.txt + best64.rule.

Evidence:
cracked: Summer2025! (4 sec), Welcome123 (2 sec), Acme2024# (12 sec)
policy: minPwdLength=8, complexity=enabled, history=12
missing: no banned word list, no breach database check

Impact:
Passwords following predictable patterns (Season+Year+Symbol,
Company+Year) satisfy the complexity policy but crack in seconds.
30% of captured credentials were compromised, including
svc_backup (Backup Operators group → DA escalation path).

Remediation:
1. Deploy Microsoft Entra Password Protection (formerly Azure AD
Password Protection) with a custom banned word list including:
company name, seasons, months, 'welcome', 'password'.
2. Increase minimum length to 14 characters.
3. Enforce gMSA for all service accounts (svc_backup priority).
4. Implement Have I Been Pwned integration via AD password filter.

Verification:
Attempt to set password 'Summer2025!' — should be rejected.
Confirm svc_backup is migrated to gMSA (Get-ADServiceAccount).

The first finding tells the reader there's a problem. The second finding tells them exactly what the problem is, proves it exists, explains why it matters in business terms, tells them precisely how to fix it, and tells them how to confirm the fix worked. The first finding gets backlogged. The second finding gets fixed.
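
To make the contrast concrete, here is a minimal sketch of what the verification steps in the good finding could look like when scripted. It assumes a Windows host with the RSAT ActiveDirectory PowerShell module installed and reuses the hypothetical names from the example (ACME.LOCAL, svc_backup); treat it as an illustration rather than a drop-in check.

# Verification sketch for the example finding above (assumes the RSAT ActiveDirectory module)
Import-Module ActiveDirectory

# Remediation step 2: the domain policy should now enforce a 14-character minimum
Get-ADDefaultDomainPasswordPolicy |
    Select-Object MinPasswordLength, ComplexityEnabled, PasswordHistoryCount

# Remediation step 3: svc_backup should now be a group managed service account
Get-ADServiceAccount -Identity svc_backup -Properties PrincipalsAllowedToRetrieveManagedPassword

# Returns True only on a host permitted to retrieve the gMSA password
Test-ADServiceAccount -Identity svc_backup

The specific commands matter less than the principle: the finding hands the engineer something they can run and a result they can compare, rather than a recommendation they must interpret.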


The two pages that determine whether anything gets funded.

The executive summary is the most important section of the report — because it's the only section that the decision-makers will read. If the executive summary fails to communicate the business risk in terms the board understands, the report's technical excellence is irrelevant. Nothing gets funded. Nothing gets prioritised. Nothing changes.

"The assessment identified 47 findings: 3 critical, 8 high, 15 medium, 12 low, and 9 informational."
Why it fails: Numbers without context. The board doesn't know whether 47 findings is good or bad. They don't know what "critical" means in business terms. They have no basis for action.

"The tester achieved Domain Admin access through Kerberoasting of the svc_backup service account, followed by NTDS.dit extraction via secretsdump."
Why it fails: Technical narrative in a non-technical section. The board doesn't know what Kerberoasting is. They don't know what NTDS.dit means. The sentence communicates expertise but not risk.

"The organisation's security posture is broadly in line with industry expectations for a company of this size and sector."
Why it fails: Reassuring without being informative. "In line with industry expectations" means nothing if the industry expectation is that 85% of internal tests achieve Domain Admin. The board leaves the meeting feeling comfortable. The risk remains.

"Numerous critical and high-severity vulnerabilities were identified across the environment, representing significant risk to the organisation. Immediate remediation is recommended."
Why it fails: Alarmist without being specific. Which vulnerabilities? What risk, specifically? How much will remediation cost? The board leaves the meeting feeling anxious but without the information needed to authorise action.

A good executive summary answers five questions in plain English, in two pages or less: What did we test? (scope and approach). What's the headline? (one sentence that captures the most critical outcome). What's the business impact? (what could an attacker achieve, expressed as customer data exposure, financial loss, regulatory consequence, or operational disruption). What needs to happen? (the top three actions, not all 47). What will it cost? (approximate effort, not just "immediate remediation recommended").


The problems we see in reports from other providers.

When clients bring us reports from previous engagements — either for a second opinion or because they're switching providers — the same problems appear repeatedly. These aren't edge cases. They're endemic patterns in the industry.

Scanner Output Repackaged as a Report
The report contains 200+ findings, most of which are verbatim output from Nessus, Qualys, or OpenVAS — pasted into a branded PDF template with no human analysis. Findings include informational issues like "ICMP timestamp response detected" alongside genuine critical vulnerabilities, with no distinction in presentation. No attack narrative. No chain analysis. No evidence of manual testing. The client paid for a penetration test and received an automated scan with a cover page.
Every Finding Treated Equally
Forty-seven findings listed sequentially with no visual or structural hierarchy. The three findings that chain into Domain Admin are presented with the same layout, the same font size, and the same page weight as the twelve informational issues about HTTP security headers. The reader has no way to distinguish "this combination of findings gives an attacker access to every customer record" from "this web server doesn't set X-Content-Type-Options."
No Attack Narrative
The report lists findings individually but never explains how they connect. The reader sees "LLMNR enabled" and "Kerberoastable service account" and "Backup Operator with DC access" as three separate medium/high findings — never understanding that together they form the complete attack path to Domain Admin. Without a narrative, the relationship between findings is invisible and the true severity is hidden.
Vague Remediation
"Implement a more robust password policy." "Harden the Active Directory configuration." "Apply the principle of least privilege." These aren't remediations — they're intentions. A remediation should be specific enough that an engineer can implement it without further research: the Group Policy path, the configuration parameter, the specific permission to remove, the command to run.
Written for the Tester, Not the Reader
The report reads as a journal of the tester's activities rather than a communication designed for its audience. "I then ran Responder and captured three NTLMv2 hashes" is interesting to another pen tester. The CISO needs: "Network broadcast protocols allowed credential capture without interacting with any system — within two minutes of connecting to the network." Same finding, different audience.
No Remediation Roadmap
Forty-seven findings, each with its own remediation recommendation, but no guidance on which to fix first, which depend on each other, or how to plan the work across a realistic timeframe. The IT team is left to triage the findings themselves — which they'll do based on what's easiest to fix, not what matters most.
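
To illustrate the vague-remediation point above: "disable LLMNR" only becomes actionable when it is expressed as the actual setting and command. A minimal sketch, assuming Windows endpoints managed through Group Policy; the GPO setting and registry value shown are the standard ones for turning off multicast name resolution, but validate them in a test OU before a broad rollout.

# Group Policy (per computer):
#   Computer Configuration > Administrative Templates > Network > DNS Client
#   > "Turn off multicast name resolution" = Enabled
#
# Equivalent registry change for testing on a single host:
New-Item -Path 'HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient' -Force
New-ItemProperty -Path 'HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient' `
    -Name 'EnableMulticast' -Value 0 -PropertyType DWord -Force

# Verification: EnableMulticast should read 0 on a sample of hosts
Get-ItemProperty -Path 'HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient' -Name 'EnableMulticast'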

The section that transforms a list into understanding.

The attack narrative is the section most often missing from pen test reports — and the section that provides the most value when it's present. It tells the story of the engagement: what the tester did, in what order, how each action enabled the next, and what they achieved at each stage. It transforms isolated findings into a coherent understanding of how the organisation can actually be compromised.

Attack Narrative — Excerpt
# Phase 1: Network Access (14:00 - 14:05)
Connected to meeting room network port (RJ45, no NAC).
DHCP assigned 10.0.1.47. Full subnet access confirmed.
Impact: Any device connected to this port has unrestricted network access.

# Phase 2: Credential Capture (14:05 - 14:12)
LLMNR poisoning captured NTLMv2 hash for ACME\j.smith (Finding F-001).
Hash cracked in 4 seconds: Summer2025! (Finding F-002).
Impact: Valid domain credential obtained without interacting with any system.

# Phase 3: Escalation (14:12 - 14:58)
BloodHound enumeration identified svc_backup as Kerberoastable (F-003).
TGS ticket cracked in 11 seconds: Backup2019! (F-002 — same root cause).
svc_backup is member of Backup Operators (F-004).
Impact: Backup Operator can read any file on the DC, including NTDS.dit.

# Phase 4: Domain Compromise (14:58 - 15:20)
NTDS.dit extracted via secretsdump — all domain password hashes (F-005).
DA hash used for pass-the-hash to DC01 (F-006).
Impact: Complete domain compromise. Every user, every machine, every secret.

# Chain Summary
F-001 → F-002 → F-003 → F-004 → F-005 → F-006 = Domain Admin
Break at any point: disable LLMNR (F-001), enforce strong passwords
(F-002), migrate svc_backup to gMSA (F-003), remove Backup Ops (F-004).
Cheapest fix: disable LLMNR via GPO (15 minutes, breaks entire chain).

The narrative shows the reader something the findings list cannot: the chain. Six findings that individually rate medium to high combine into a complete domain compromise in 80 minutes. The narrative also identifies the cheapest break point — the single control that disrupts the entire chain for the least effort. That intelligence is worth more than any individual finding.
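
One way a report can make that intelligence explicit is to attach an effort estimate to each link in the chain and state which single fix breaks it for the least work. Below is a minimal sketch of that logic using the finding IDs from the excerpt above; the effort figures are illustrative assumptions, not measurements from a real engagement.

# Chain-break analysis sketch (illustrative effort figures only)
$chain = @(
    [pscustomobject]@{ Finding = 'F-001'; Fix = 'Disable LLMNR via GPO';                    EffortHours = 0.25 }
    [pscustomobject]@{ Finding = 'F-002'; Fix = 'Banned-password list and 14-char minimum'; EffortHours = 8 }
    [pscustomobject]@{ Finding = 'F-003'; Fix = 'Migrate svc_backup to a gMSA';             EffortHours = 4 }
    [pscustomobject]@{ Finding = 'F-004'; Fix = 'Remove svc_backup from Backup Operators';  EffortHours = 1 }
)

# Any single fix breaks the chain; the cheapest one is the immediate action
$chain | Sort-Object EffortHours | Select-Object -First 1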


How to tell whether the report you're getting is good enough.

Before commissioning a pen test, ask the provider for a redacted sample report. If they can't or won't provide one, that's a signal. A provider confident in their deliverable quality is happy to demonstrate it. When evaluating the sample — or your current provider's reports — ask these questions:

Can a non-technical board member read the executive summary and understand the business risk?
Good answer: The summary is in plain English, avoids jargon, and states impact in business terms: data exposure, financial liability, regulatory risk.
Warning sign: The summary contains CVSS scores, CVE numbers, or technical terms without explanation. The reader needs security knowledge to interpret it.

Does the report include an attack narrative showing how findings chain together?
Good answer: A chronological story of the tester's path, showing which findings enabled which escalation steps and where the chain could be broken.
Warning sign: Findings listed individually with no explanation of how they relate. No chain analysis. No identification of the cheapest fix that breaks the critical path.

Are the remediation steps specific enough to implement without further research?
Good answer: Group Policy paths, configuration commands, specific parameters, code examples, and verification steps for every finding.
Warning sign: "Implement best practices." "Harden the configuration." "Apply the principle of least privilege." Vague guidance that requires the reader to research the actual fix.

Does the report distinguish between findings that chain into critical compromise and standalone informational issues?
Good answer: Visual hierarchy, chain analysis, and severity ratings that account for chainability — not just individual CVSS scores.
Warning sign: All findings presented with equal weight. A flat list where LLMNR poisoning (enabler of the DA chain) sits beside a missing HTTP header (informational, standalone).

Does the report include a prioritised remediation roadmap?
Good answer: Actions grouped by timeframe (immediate, short-term, medium-term) with dependencies identified and effort estimates provided.
Warning sign: No roadmap. Findings each have individual remediation steps but no guidance on sequencing, prioritisation, or where to start.

Is there evidence that a human tester performed manual analysis — not just an automated scan?
Good answer: Attack narratives, chain analysis, business-context impact statements, findings that require human judgement (access control, business logic, social engineering).
Warning sign: 200+ findings that look like scanner output. No attack narrative. No chain analysis. Findings that a tool could have produced without a human tester.

Getting more value from the report you already receive.

Request a Sample Report Before Engagement
Ask every prospective provider for a redacted sample. Compare them side by side. Look for the attack narrative, the executive summary quality, the remediation specificity, and the overall structure. The sample report is the single best predictor of the deliverable quality you'll actually receive.
Insist on a Verbal Debrief
The written report is one communication channel. A 60-minute debrief where the tester walks through the attack narrative, demonstrates the critical findings live, and answers questions from both technical and non-technical stakeholders is worth as much as the report itself. If your provider doesn't offer this, ask for it.
Share the Right Sections with the Right People
Don't forward the entire 186-page PDF to the board. Extract the executive summary and send it as a standalone document. Send the technical findings to the IT team. Send the remediation roadmap to the project manager. Each audience gets the section written for them — and only that section.
Track Remediation Against the Report
Use the report's findings as a remediation tracker. Each finding should have a status: remediated, in progress, accepted risk, or deferred. When the next pen test is commissioned, provide the tracker to the tester — they'll validate the fixes and focus their effort on areas that remain open. (A minimal tracker sketch follows at the end of this list.)
Compare Reports Across Engagements
The value of pen testing compounds when you can track progress. Ask your provider to include a comparison section showing which findings from the previous engagement have been remediated, which persist, and which are new. This transforms the report from a snapshot into a progress tracker.
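
A tracker does not need special tooling; a shared CSV keyed on the report's finding IDs is enough. Below is a minimal sketch assuming a hypothetical findings.csv exported from the report, with FindingId, Title, Severity, and Status columns.

# findings.csv columns (hypothetical): FindingId, Title, Severity, Status
# Status is one of: Remediated, InProgress, AcceptedRisk, Deferred
$findings = Import-Csv -Path .\findings.csv

# Open items, worst severity first: the list to hand to the next tester
$rank = @{ Critical = 0; High = 1; Medium = 2; Low = 3; Informational = 4 }
$findings |
    Where-Object { $_.Status -ne 'Remediated' } |
    Sort-Object { $rank[$_.Severity] } |
    Format-Table FindingId, Title, Severity, Status

# Quick progress summary for the CISO
$findings | Group-Object Status | Select-Object Name, Count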

The bottom line.

A penetration test is an investment in understanding risk. The report is how that understanding is communicated. If the report fails — if the executive summary doesn't convey business impact, if the findings don't chain into a coherent narrative, if the remediations are too vague to implement, if the 47 findings are presented as 47 equally-weighted items — the investment is wasted. The testing was done. The understanding wasn't delivered.

A good report serves three audiences with three different needs: the board needs business impact in plain English, the CISO needs strategic context and prioritisation, and the IT team needs specific, actionable remediation steps with evidence and verification instructions. It includes an attack narrative that shows how findings chain together, a remediation roadmap that tells the team where to start, and an executive summary that can be read in two minutes and understood by someone who has never heard of Kerberoasting.

The quality of a penetration test is not measured by the skill of the tester. It's measured by whether the report causes the right things to get fixed. Ask for a sample report before you commission the test. If the sample doesn't meet the standard described in this article, the engagement won't either — no matter how talented the tester behind it.


Penetration test reporting designed for every audience.

Our reports include plain-English executive summaries, chronological attack narratives with chain analysis, specific remediation steps with verification instructions, and prioritised remediation roadmaps — because a finding that doesn't get fixed is a finding that didn't matter.