> echo $? && echo 'pen testing does not return 0 or 1'_
The board meeting follows a familiar pattern. The CISO presents the pen test results. The first question from the non-executive director: "So, did we pass?" The CISO explains that the tester achieved Domain Admin. The follow-up question: "So we failed?" The CISO tries to explain that it's not that simple — that the tester's time to objective increased by 300%, that the SOC detected four of seven actions, that recurring findings dropped from fourteen to three. But the framing is already set. The board heard "the tester got in" and filed it as a failure.
The pass/fail framing is destructive because it reduces a rich data set — attack paths, detection performance, remediation effectiveness, architectural maturity — to a single binary that tells the board almost nothing. An organisation where the tester achieves DA in two hours with zero detection has not "failed" in the same way as an organisation where the tester achieves DA in five days after bypassing three detection layers. Both "failed" the DA test. One is catastrophically insecure. The other has a maturing security programme with specific, identifiable gaps.
Equally, an organisation where the tester does not achieve DA has not necessarily "passed." The tester may have run out of time. The scope may have excluded the systems where the vulnerability exists. The rules of engagement may have prevented social engineering — the most likely real-world entry point. A "pass" based on the tester not reaching the objective provides false assurance that the binary framing can't distinguish from genuine security.
The value of penetration testing is best measured through metrics that track change over time, not through the outcome of a single engagement. These metrics, presented as trends across successive engagements, tell the story of a security programme that is (or isn't) improving; a minimal code sketch of how they can be recorded and calculated follows the table.
| Metric | What It Measures | How to Track It | What Good Looks Like |
|---|---|---|---|
| Time to objective | How long the tester takes to achieve their primary objective — typically Domain Admin, access to sensitive data, or compromise of a critical system. | Record the time from initial access to objective achievement for each engagement. Plot as a trend across years. | Increasing. Year 1: 2 hours. Year 2: 2 days. Year 3: 4 days. Year 4: not achieved within the 10-day window. The environment is getting measurably harder to compromise. |
| Detection rate | The percentage of tester actions detected by the SOC, EDR, SIEM, or other monitoring systems. | The tester documents each significant action (initial access, credential harvesting, lateral movement, escalation, data access). The SOC reviews its logs post-engagement. The detection rate is the ratio of detected to total actions. | Increasing. Year 1: 0 of 7 (0%). Year 2: 3 of 8 (38%). Year 3: 6 of 9 (67%). Year 4: 8 of 9 (89%). The detection capability is maturing. |
| Mean time to detect | For the actions that were detected, how quickly the SOC identified them. | Record the timestamp of each tester action and the timestamp of each corresponding SOC alert or investigation. The difference is the detection latency. | Decreasing. Year 2: 14 hours average. Year 3: 3 hours. Year 4: 47 minutes. The SOC is detecting faster — reducing the attacker's operational window. |
| Recurring finding rate | The percentage of findings from the previous engagement that reappear in the current one. | Compare each engagement's findings to the previous report. Count the findings that recur. Calculate as a percentage of the previous engagement's total findings. | Decreasing. Year 1→2: 41% recur. Year 2→3: 18%. Year 3→4: 7%. Findings are being remediated permanently rather than temporarily. |
| Remediation velocity | How quickly the organisation remediates critical and high findings after report delivery. | Record the date the report is delivered and the date each critical/high finding is confirmed as remediated (by self-verification or retest). Calculate the mean. | Decreasing. Year 1: 94 days mean. Year 2: 42 days. Year 3: 18 days. Year 4: 11 days. The organisation is responding faster to identified risk. |
| Chain viability | Whether the attack chains from the previous engagement are still viable after remediation. | During each retest and subsequent engagement, the tester specifically validates whether previously identified chains are still exploitable. | Chains broken. If the three chains from the previous engagement are all confirmed as broken by the retest, the specific paths to critical assets no longer exist. New chains may emerge — but the previous ones have been permanently addressed. |
| Remediation success rate | The percentage of findings marked as "remediated" in the tracker that are confirmed as actually fixed during retesting. | Compare the remediation tracker status to the retest results. Calculate the ratio of confirmed fixes to claimed fixes. | Increasing. Year 1: 72%. Year 2: 85%. Year 3: 94%. The gap between "claimed fixed" and "confirmed fixed" is narrowing — implementation quality is improving. |
| Architectural vs configuration ratio | The proportion of findings that are architectural weaknesses versus configuration issues. | Classify each finding as architectural (requires design change) or configuration (requires setting change). Track the ratio across engagements. | Shifting toward configuration. Year 1: 40% architectural. Year 3: 15% architectural. The fundamental design is improving — remaining findings are configuration drift rather than systemic weakness. |
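The "How to Track It" column amounts to simple arithmetic over a handful of timestamped records, so the raw data is cheap to capture during the engagement itself. As a minimal sketch in Python (the class and field names are illustrative, not a standard schema), the per-engagement metrics reduce to something like this:

```python
from dataclasses import dataclass, field
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class TesterAction:
    """One significant tester action, e.g. credential harvesting or lateral movement."""
    description: str
    performed_at: datetime
    detected_at: Optional[datetime] = None  # timestamp of the corresponding SOC alert, if any


@dataclass
class Engagement:
    """Raw records for a single engagement; field names are illustrative."""
    initial_access_at: datetime
    objective_achieved_at: Optional[datetime]  # None if the objective was not reached
    actions: list[TesterAction] = field(default_factory=list)

    def time_to_objective_hours(self) -> Optional[float]:
        """Time from initial access to objective achievement (None if not achieved)."""
        if self.objective_achieved_at is None:
            return None
        return (self.objective_achieved_at - self.initial_access_at).total_seconds() / 3600

    def detection_rate(self) -> float:
        """Detected actions as a fraction of all significant actions."""
        if not self.actions:
            return 0.0
        detected = sum(1 for a in self.actions if a.detected_at is not None)
        return detected / len(self.actions)

    def mean_time_to_detect_hours(self) -> Optional[float]:
        """Mean latency between action and SOC detection, over detected actions only."""
        latencies = [
            (a.detected_at - a.performed_at).total_seconds() / 3600
            for a in self.actions
            if a.detected_at is not None
        ]
        return mean(latencies) if latencies else None
```

Recurring finding rate, remediation success rate, and chain viability need data from two successive engagements (or a remediation tracker plus a retest); a second sketch below shows that comparison.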
Any individual metric from a single engagement is a data point. It becomes meaningful when it's part of a trend. A detection rate of 44% means nothing in isolation — is that good? Bad? Improving? Deteriorating? But a detection rate that was 0% two years ago, 44% last year, and 67% this year tells a clear story: the detection capability is maturing, the investment in SIEM and SOC is producing returns, and the trajectory is positive.
Building the longitudinal view requires consistency: consistent metrics across engagements, consistent methodology (or documented methodology changes), and consistent reporting formats that allow comparison. This doesn't mean using the same provider forever — but it does mean ensuring that whoever conducts the test records the metrics the programme needs to track its trajectory.
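Consistency is easiest to enforce when every engagement's findings are captured in the same minimal record format. Continuing the illustrative sketch above, the cross-engagement metrics need nothing more than a stable identifier per finding; the `fingerprint` convention here is an assumption for the example, not an industry standard:

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """A single finding; `fingerprint` identifies the same issue across reports (illustrative)."""
    fingerprint: str                     # e.g. "smbv1-enabled:fileserver01"
    claimed_remediated: bool = False     # status in the remediation tracker
    confirmed_remediated: bool = False   # set during the retest


def recurring_finding_rate(previous: list[Finding], current: list[Finding]) -> float:
    """Share of the previous engagement's findings that reappear in the current one."""
    if not previous:
        return 0.0
    current_ids = {f.fingerprint for f in current}
    recurring = sum(1 for f in previous if f.fingerprint in current_ids)
    return recurring / len(previous)


def remediation_success_rate(tracker: list[Finding]) -> float:
    """Confirmed fixes as a share of claimed fixes, measured at the retest."""
    claimed = [f for f in tracker if f.claimed_remediated]
    if not claimed:
        return 0.0
    confirmed = sum(1 for f in claimed if f.confirmed_remediated)
    return confirmed / len(claimed)
```

Whether the fingerprint is a CVE, an asset-plus-issue pair, or the provider's own finding ID matters less than using the same convention every year; that consistency is what makes year-on-year comparison possible, whoever conducts the test.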
A dashboard built from these metrics doesn't ask "did we pass?" It answers a more useful question: "Is the security programme working?" When every metric is moving in the right direction, the trajectory is positive across all dimensions: the board can see that the investment is producing measurable returns, and the CISO has the evidence to justify continued funding.
Not every form of value from a pen test is quantifiable. Some of the most important returns are qualitative: the SOC analysts who get to triage a live adversary rather than a tabletop scenario, the engineers who watch their own assumptions being exploited, the attack narrative that becomes internal training material. These returns are worth acknowledging because they contribute to the organisation's security posture in ways that metrics don't capture.
The metrics above measure the organisation's security improvement. But the organisation should also assess whether the testing provider is delivering value — and whether the engagement quality justifies the investment.
| Quality Indicator | High-Value Provider | Low-Value Provider |
|---|---|---|
| Findings depth | Findings include specific evidence, reproduction steps, business impact analysis, and remediation guidance tailored to the organisation's environment. | Generic findings copied from a template with no environmental context. Remediation guidance is "apply the latest patches" regardless of the specific vulnerability. |
| Attack narrative | A clear narrative explaining the attack chain — how each finding connects, which combinations produced escalation, and where the chain could be broken most cheaply. | A list of findings sorted by CVSS score with no narrative connecting them. The reader can't see the chain — only isolated issues. |
| Honest limitations | The report clearly states what was and wasn't tested, time constraints, scope limitations, inconclusive results, and residual risk from untested areas. | No limitations mentioned. The reader assumes comprehensive coverage that didn't exist. |
| Effective controls | The report acknowledges what worked — the EDR that caught the payload, the MFA that prevented credential abuse, the SOC that detected lateral movement. | Only failures reported. No acknowledgement of working controls. The reader gets a skewed view that ignores the organisation's effective defences. |
| Year-on-year comparison | The report includes a comparison section showing how metrics have changed since the previous engagement — recurring findings, time to objective, detection rate. | Each report stands alone with no connection to previous engagements. No longitudinal tracking. No way to demonstrate improvement. |
A pen test doesn't pass or fail. It produces evidence — evidence that, when tracked across engagements, demonstrates whether the security programme is improving. Time to objective increasing. Detection rate rising. Recurring findings declining. Remediation velocity accelerating. Chains broken. Architectural findings decreasing. Each metric tells part of the story. Together, they tell the whole story: the organisation is getting measurably harder to compromise.
The pass/fail question is comforting because it offers certainty. The metrics framework is more demanding because it requires continuous measurement, longitudinal tracking, and honest assessment of progress. But the metrics framework is also more useful — because it answers the question the board actually needs answered: not "are we secure?" (a question no pen test can answer definitively) but "is the money we're spending on security producing measurable risk reduction?" The metrics say yes. Or they say where to invest next.
The most valuable pen test is not the one that produces a clean report. It's the one that changes something — a configuration, a detection rule, an architecture, a design standard, a budget decision. Value is measured in what improves afterwards, not in what the report contains.
Our engagements are designed to produce the metrics your board needs: time to objective, detection rates, recurring finding trends, and remediation validation — tracked across every engagement to demonstrate that the security investment is producing measurable, compounding returns.