> sort -t, -k2 -rn findings.csv | head -1 && echo 'this might not be your biggest problem'_
We deliver a pen test report. It's well-structured, clearly written, and thoroughly evidenced. The executive summary communicates business risk. The attack narrative shows how the tester reached the crown jewels. The remediation guidance is specific enough to implement without further research.
Three months later, we return for the follow-up assessment. The organisation has remediated twelve findings — but the four findings that formed the attack chain to the ERP database are all still open. The CVSS 9.1 standalone finding has been fixed. The CVSS 5.3 finding that was the entry point to the critical chain hasn't been touched. The IT manager sorted by severity score and worked top-down. The attack narrative was read once and never referenced again.
The report didn't fail. The interpretation did. And this pattern — a competent report misread by a well-intentioned team — is more common than the industry acknowledges. The following mistakes account for the vast majority of cases where pen test investment doesn't translate into proportionate security improvement.
Sorting by CVSS and working top-down is the single most common interpretation mistake, and the one with the greatest consequence. The IT manager receives the report, opens the findings section, sorts by score, and starts remediating from the top. The highest-scoring findings get fixed first. The lower-scoring findings wait.
The problem: CVSS measures a vulnerability's technical characteristics in isolation. It doesn't measure the vulnerability's role in the attack chain, its proximity to sensitive data, or whether compensating controls mitigate it. A CVSS 9.1 standalone finding that leads nowhere may be less urgent than a CVSS 5.3 finding that's the entry point to a chain ending at the organisation's most sensitive data.
| The Approach | What Gets Fixed First | What Gets Left |
|---|---|---|
| Sort by CVSS, work top-down | The highest-scoring standalone findings. SMB signing (9.1). Apache Struts on a dev server (10.0). Findings that are technically severe but may not have been part of the demonstrated attack path. | The chain findings. LLMNR (5.3) that was the entry point. The Kerberoastable service account (8.8) that was the escalation step. The local admin privilege (7.0) that unlocked the ERP server. The chain that actually reached the crown jewels. |
| Sort by attack chain, fix break points | The findings that formed demonstrated attack paths. The chain entry point (regardless of CVSS). The cheapest break point. The finding that, once fixed, prevents the entire compromise. | Standalone findings are addressed second — still important, but not the path to the organisation's most sensitive data. The remediation effort goes where the demonstrated risk is greatest. |
The fix: Read the attack narrative before the findings list. Identify the chain findings. Fix the chain break points first — regardless of their individual CVSS scores. Then address standalone findings by contextual severity.
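As a minimal sketch of that ordering, assuming the report marks which findings appeared in the demonstrated attack path (the finding references, scores, and chain positions below are hypothetical, not our report format), a triage script can put chain break points ahead of standalone findings before CVSS is considered at all:

```python
# Sketch: prioritise demonstrated attack-chain findings ahead of standalone
# findings, then fall back to CVSS. All finding data here is illustrative.
from dataclasses import dataclass

@dataclass
class Finding:
    ref: str
    title: str
    cvss: float
    in_attack_chain: bool   # did this finding appear in the demonstrated attack path?
    chain_position: int     # 1 = entry point; lower numbers come earlier in the chain

findings = [
    Finding("F-01", "SMB signing not enforced", 9.1, False, 0),
    Finding("F-02", "LLMNR/NBT-NS enabled", 5.3, True, 1),
    Finding("F-03", "Kerberoastable service account", 8.8, True, 2),
    Finding("F-04", "Shared local administrator password", 7.0, True, 3),
]

# Chain findings first (earliest link first), then standalone findings by CVSS.
ordered = sorted(
    findings,
    key=lambda f: (0, f.chain_position) if f.in_attack_chain else (1, -f.cvss),
)

for f in ordered:
    print(f"{f.ref}  CVSS {f.cvss:>4}  {'chain' if f.in_attack_chain else 'standalone'}  {f.title}")
```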
The IT manager sets a 30-day remediation target for all critical and high findings. One of those findings is "disable LLMNR" — a 15-minute Group Policy change. Another is "implement network segmentation between the office and production VLANs" — a six-month project requiring procurement, network redesign, and change management.
Both miss the 30-day deadline: the 15-minute fix because it was queued behind longer tasks, the six-month project because it was never achievable in 30 days in the first place. The team reports "two critical findings overdue", which presents two fundamentally different problems as the same failure.
The fix: Categorise findings by effort before setting deadlines. Quick wins (hours) get implemented this week. Standard remediations (days to weeks) get scheduled into change windows. Project-level changes (months) get scoped, funded, and tracked separately — with interim compensating controls applied immediately. A realistic plan with staggered deadlines produces more remediation than an unrealistic plan with a single deadline.
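A sketch of what that staggering looks like in practice. The effort tiers and windows are illustrative assumptions; adjust them to your own change process:

```python
# Sketch: stagger remediation deadlines by effort tier instead of one blanket date.
# Tiers and windows are illustrative, not a prescribed standard. Project-tier items
# should also carry interim compensating controls, tracked separately.
from datetime import date, timedelta

EFFORT_WINDOWS = {
    "quick_win": timedelta(days=7),     # e.g. a GPO change measured in minutes or hours
    "standard":  timedelta(days=30),    # scheduled into a normal change window
    "project":   timedelta(days=180),   # scoped, funded, and tracked as a project
}

def deadline(effort: str, start: date = date.today()) -> date:
    return start + EFFORT_WINDOWS[effort]

remediations = [
    ("Disable LLMNR via Group Policy", "quick_win"),
    ("Rotate Kerberoastable service account credentials", "standard"),
    ("Segment office and production VLANs", "project"),
]

for title, effort in remediations:
    print(f"{deadline(effort).isoformat()}  [{effort}]  {title}")
```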
The report comes back with zero critical findings and four highs. The CISO presents to the board: "No critical findings — our security posture is strong." The board is reassured. Budget discussions for the proposed MFA project are deferred to next quarter.
The problem: "no critical findings" means the tester didn't find a single vulnerability that, in isolation, scores in the critical range (9.0 or above) on the CVSS scale. It doesn't mean the organisation is secure. Three medium findings that chain together can produce a critical compromise. A high finding in the right context can lead to full domain compromise. And the absence of critical findings may reflect the scope of the test, not the absence of critical risk: if the tester had an additional two days, or tested from a different starting position, the results might look very different.
A penetration test is a point-in-time assessment with a defined scope. "No critical findings" means the tester didn't find critical vulnerabilities within the scope, timeframe, and methodology of this specific engagement. It doesn't mean critical vulnerabilities don't exist. It means the tester didn't find them — which may be because they aren't there, or because the scope, time, or approach didn't reveal them.
The fix: Report findings in the context of what was tested and what wasn't. Present the result as "within the scope of this engagement, the following findings were identified" rather than as a verdict on the organisation's overall security posture. And always read the attack narrative: the absence of a single critical finding doesn't mean the tester couldn't achieve their objective — they may have achieved it through a chain of medium findings.
The IT manager filters the report to show only critical and high findings. The nine low findings and four informational findings are never read. They're low priority. They can wait. They'll be picked up eventually.
Some of them genuinely can wait. But informational and low findings frequently include intelligence that changes the risk picture: verbose error messages disclosing internal paths and technology versions, default credentials on non-production systems that share a network with production, software version information that maps to known CVEs not yet exploited, and detection gaps where the SOC failed to identify the tester's activity. Individually, these are low severity. Combined with other findings — or with a motivated attacker's creativity — they become the reconnaissance that enables a more significant compromise.
The fix: Read every finding — including informational. Don't remediate them first, but understand what they reveal. Pay particular attention to informational findings about detection gaps, information disclosure, and default configurations — these are the findings that become chain links in a future engagement or a real attack.
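One way to make that practical is a simple triage filter that surfaces the low and informational findings most likely to become chain links. The category labels below are assumptions; map them onto whatever taxonomy your reports use:

```python
# Sketch: surface low/informational findings that tend to become chain links.
# Finding data and category labels are hypothetical.
CHAIN_ENABLER_CATEGORIES = {"information_disclosure", "default_configuration", "detection_gap"}

findings = [
    {"ref": "F-21", "severity": "info", "category": "detection_gap",
     "title": "Credential-capture activity not detected by the SOC"},
    {"ref": "F-22", "severity": "low", "category": "information_disclosure",
     "title": "Verbose error message reveals internal hostnames"},
    {"ref": "F-23", "severity": "low", "category": "hardening",
     "title": "Missing HTTP security headers"},
]

worth_reading_now = [f for f in findings if f["category"] in CHAIN_ENABLER_CATEGORIES]
for f in worth_reading_now:
    print(f'{f["ref"]} ({f["severity"]}): {f["title"]}')
```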
The report identifies that j.smith's password was cracked in 4 seconds: Summer2025!. The IT manager sends j.smith an email telling them to choose a better password. The finding is marked "remediated." The password policy that accepted Summer2025! in the first place remains unchanged. Next quarter, k.jones sets their password to Autumn2025! and it's cracked in the same time.
Pen test findings almost always reflect systemic issues — misconfigured policies, missing controls, architectural weaknesses — not individual failures. A user who sets a weak password is following the path of least resistance that the password policy permits. An engineer who left a default credential unchanged did so because the onboarding process didn't include a credential rotation step. Blaming the individual fixes one instance. Fixing the system prevents all future instances.
The fix: For every finding, ask: "What system, policy, or process allowed this to happen?" Fix the system, not the symptom. Change the password policy to block common patterns, deploy Azure AD Password Protection with a custom banned word list, and enforce a minimum length that makes 4-second cracking computationally infeasible. The individual's password changes as a consequence. Every future password is also protected.
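To illustrate the systemic control rather than the symptom fix, here is a sketch of the kind of check a banned-password policy enforces. It is not how Azure AD Password Protection is implemented or configured; the banned stems, the normalisation, and the minimum length are illustrative assumptions:

```python
# Sketch of a banned-pattern password check: minimum length plus a stem match
# after stripping common substitutions. Illustrative only.
import re

MIN_LENGTH = 14
BANNED_STEMS = {"summer", "autumn", "winter", "spring", "password", "welcome", "companyname"}

def acceptable(candidate: str) -> bool:
    if len(candidate) < MIN_LENGTH:
        return False
    # Undo common substitutions and strip non-letters so "Summer2025!" and
    # "Summ3r2025!" both reduce to the banned stem "summer".
    normalised = candidate.lower().translate(str.maketrans("013457$@", "oleastsa"))
    normalised = re.sub(r"[^a-z]", "", normalised)
    return not any(stem in normalised for stem in BANNED_STEMS)

print(acceptable("Summer2025!"))                   # False: too short and matches a banned stem
print(acceptable("correct-horse-battery-staple"))  # True: long enough, no banned stem
```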
The report is delivered, the findings are remediated (or some of them are), the report is filed, and the organisation moves on. Next year, a new pen test is commissioned. The new tester has no access to the previous report. They start from scratch, rediscover some of the same findings, and the cycle repeats.
A pen test report is a snapshot — a point-in-time assessment of the organisation's security posture on the specific dates the testing was performed. Its value multiplies when it's treated as part of a longitudinal record: compared against previous reports, used to track remediation progress, and provided to future testers so they can build on previous findings rather than re-covering old ground.
| The One-Time Approach | The Longitudinal Approach |
|---|---|
| Each report is treated independently. No comparison with previous engagements. Recurring findings are rediscovered rather than tracked. | Each report includes a comparison with the previous engagement: findings remediated, findings persisting, and new findings. A remediation tracker is maintained across engagements. |
| The new tester has no context. They spend time re-scanning known issues instead of pushing deeper into the environment. | The new tester receives the previous report and the remediation tracker. They validate fixes, skip known territory, and focus on new attack surface and deeper compromise paths. |
| The board receives a new set of statistics each year. No trend. No trajectory. No way to measure whether security investment is producing improvement. | The board receives a trend: detection rate improved from 12% to 67%. Mean time to remediation decreased from 94 days to 23 days. Recurring findings reduced from 14 to 3. Security investment is producing measurable results. |
The fix: Maintain a remediation tracker across engagements. Provide the previous report and tracker to each new tester. Ask the provider to include a comparison section in every report. Present longitudinal trends to the board — not just single-engagement statistics.
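At its simplest, the comparison that tracker enables is a diff between two engagements. The finding references below are hypothetical; in practice you need a stable identifier (or a title-and-asset pairing) agreed with the provider so findings can be matched across reports:

```python
# Sketch: the engagement-over-engagement comparison a longitudinal tracker makes possible.
previous = {"F-01", "F-02", "F-03", "F-04", "F-05"}   # findings open after the last engagement
current  = {"F-02", "F-04", "F-06"}                   # findings reported this engagement

remediated = previous - current   # fixed and not rediscovered
persisting = previous & current   # recurring: reported in both engagements
new        = current - previous   # newly introduced or newly discovered

print(f"remediated: {sorted(remediated)}")
print(f"persisting: {sorted(persisting)}")
print(f"new:        {sorted(new)}")
```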
"Last year we had 47 findings. This year we have 52. Security is getting worse." This is a sentence we've heard in board meetings — and it reflects a fundamental misinterpretation of what finding counts mean.
Finding counts are influenced by scope (a broader scope finds more), testing depth (more time produces more findings), methodology changes (a different approach reveals different issues), and infrastructure changes (new systems introduce new vulnerabilities). An increase in finding count may mean the organisation's security deteriorated — or it may mean the tester went deeper, the scope was expanded, or the organisation deployed new systems that introduced new attack surface.
Conversely, a decrease in finding count doesn't necessarily mean improvement. It may mean the tester ran out of time, the scope was narrowed, or the easy findings were fixed while the hard ones persist behind them.
The fix: Measure progress by metrics that reflect actual security improvement: the number of recurring findings from the previous engagement (should decrease), the time to achieve the tester's objective (should increase), the percentage of attacker actions detected by the SOC (should increase), and the mean time to remediate findings (should decrease). These metrics are harder to game and more meaningful than raw finding counts.
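A sketch of how those trend metrics read when computed together, using the illustrative figures from the comparison table above (the time-to-objective values and raw action counts are additional assumptions):

```python
# Sketch: compute the trend metrics worth presenting to the board.
# All inputs are illustrative.
from statistics import mean

def summarise(label, recurring, time_to_objective_hrs, actions, detected, remediation_days):
    detection_rate = 100 * detected / actions
    print(f"{label}: recurring={recurring}, time-to-objective={time_to_objective_hrs}h, "
          f"detection={detection_rate:.0f}%, mean-time-to-remediate={mean(remediation_days):.0f}d")

summarise("2024", recurring=14, time_to_objective_hrs=6,  actions=50, detected=6,  remediation_days=[120, 90, 72])
summarise("2025", recurring=3,  time_to_objective_hrs=31, actions=48, detected=32, remediation_days=[30, 21, 18])
```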
The engineering team works through the remediation roadmap for eight weeks. They implement the GPO changes, update the password policy, migrate the service accounts, and deploy the detection rules. Each finding is marked "remediated" in the tracker. The CISO reports to the board: "All critical and high findings remediated."
But nobody retested. The GPO was applied to the wrong OU and half the domain is still broadcasting LLMNR. The service account was migrated but the old credentials are still cached on three servers. The detection rule fires on the test payload but misses the variant the next attacker will use. "Remediated" in the tracker. Vulnerable in reality.
The fix: Every remediation should include a verification step — and the verification should use the same technique the tester used to discover the finding. The report should provide these steps (a good report will). If the report provides a verification command, run it. If it doesn't, ask the provider. And for critical findings, consider commissioning a targeted retest: a short, focused engagement where the tester validates that the specific fixes have been implemented correctly. A one-day retest that confirms remediation is worth more than eight weeks of unverified work.
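As one concrete example of that discipline: if LLMNR was disabled through the standard "Turn off multicast name resolution" policy, a minimal per-host check reads back the registry value the GPO is supposed to set. Re-running the tester's own poisoning tooling across the estate remains the stronger test; this sketch simply catches the "applied to the wrong OU" failure cheaply:

```python
# Sketch: confirm the LLMNR GPO actually landed on a given host by reading the
# policy value it sets (EnableMulticast = 0 under the DNSClient policy key).
# winreg is Windows-only; run this on the host being checked.
import winreg

KEY_PATH = r"SOFTWARE\Policies\Microsoft\Windows NT\DNSClient"

def llmnr_disabled() -> bool:
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _ = winreg.QueryValueEx(key, "EnableMulticast")
            return value == 0
    except FileNotFoundError:
        # Policy key absent: the GPO never reached this host and LLMNR is still on by default.
        return False

print("LLMNR disabled by policy:", llmnr_disabled())
```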
The quality of a penetration test engagement is determined not just by the quality of the testing and the quality of the report — but by the quality of the interpretation. A good report badly interpreted produces the same outcome as a bad report: the wrong findings get fixed, the real risk persists, and the next engagement discovers the same attack path.
The most common mistakes are predictable and preventable: sorting by CVSS instead of chain position, setting uniform deadlines for findings that require vastly different effort, equating low finding counts with security, blaming individuals instead of fixing systems, treating the report as a one-time event instead of a longitudinal record, and remediating without retesting. Each mistake is a point where the investment in testing and reporting fails to translate into security improvement.
The fix for all of them is the same discipline: read the attack narrative before the findings list, prioritise by demonstrated risk rather than isolated severity scores, categorise by effort before setting deadlines, track meaningful metrics across engagements, verify remediations with the tester's own techniques, and treat every report as a chapter in a continuing story — not a standalone verdict.
Our reports include attack narratives with chain break points, contextual severity alongside CVSS, effort estimates for every finding, verification steps for every remediation, and longitudinal comparison with previous engagements — because a report that's misinterpreted is a report that's wasted.