> diff -u year1.json year5.json | grep '^+' | wc -l && echo 'compounding improvement'_
Most organisations commission their first penetration test because they have to — a compliance requirement, an insurance condition, a client contract, or a board directive after a competitor's breach. The test happens. The report arrives. Some findings are fixed. The report is filed. Twelve months later, the cycle repeats.
This is penetration testing as an event. It produces a snapshot — a point-in-time assessment that captures the organisation's security posture during the testing window. The snapshot has value: it identifies vulnerabilities, demonstrates compliance, and provides the evidence that auditors and insurers require. But each snapshot is independent. The tester starts from scratch. Findings from the previous year may recur. Progress is difficult to measure. The board sees a new set of numbers each year and has no way to determine whether the security programme is improving.
A penetration testing programme is different. It's a multi-year strategy where each engagement builds on the previous one. The scope evolves. The metrics compound. The provider understands the environment's history. Remediation is tracked across engagements, not just within them. The board sees a trajectory — not a snapshot — and can evaluate whether the security investment is producing returns.
The difference isn't cost. A programme costs roughly the same as annual tests. The difference is intent: the organisation treats penetration testing as an input to a continuous improvement cycle, not as a compliance obligation that gets discharged once a year.
| Year | Focus | What Changes | What the Board Sees |
|---|---|---|---|
| Year 1: Baseline | Internal infrastructure pen test. Establish the current state. Identify the most critical attack paths. Build the remediation tracker. | First-time engagement. Full discovery. The tester maps the environment, identifies the chains, and produces the initial remediation roadmap. The organisation learns what its actual risk looks like — often for the first time. | "Here's where we are. The tester reached Domain Admin in 2 hours. Here are the three quick wins that break the chain, and here's the 6-month programme to address the systemic issues." |
| Year 2: Validate and Expand | Retest previous findings. Expand scope to include web applications or cloud. Measure improvement against baseline. | The tester receives last year's report and tracker. They validate remediations, identify recurring findings, and test new scope areas. The first longitudinal comparison is produced — the beginning of the trend. | "Recurring findings dropped from 34 to 6. Time to DA increased from 2 hours to 2 days. Detection rate improved from 0% to 44%. Here's the updated roadmap for Year 3." |
| Year 3: Deepen and Diversify | Add social engineering or physical access testing. Introduce red team exercises for the first time. Test detection and response, not just prevention. | The programme moves beyond vulnerability finding into detection testing. The SOC is assessed alongside the infrastructure. Red team exercises simulate realistic adversary behaviour over extended periods — testing the organisation's ability to detect and respond, not just resist. | "The SOC detected 7 of 8 tester actions. Mean detection time was 47 minutes. DA was not achieved within the 10-day window. Two areas need investment: cloud security and phishing resilience." |
| Year 4: Optimise | Purple team sessions where testers and defenders collaborate. Targeted testing of new infrastructure (cloud migration, M&A integration). Scenario-based testing (ransomware simulation, insider threat). | The programme is mature enough to move from adversarial testing to collaborative improvement. Purple team sessions let the SOC watch the tester's techniques in real time and tune detection rules live. Scenario-based testing validates the organisation's resilience against specific threats that matter to the business. | "Purple team session resulted in 14 new detection rules. Ransomware simulation showed containment within 23 minutes. Cloud assessment identified 3 critical misconfigurations in the new AWS environment." |
| Year 5: Continuous Assurance | Rotating scope across the full estate. Continuous vulnerability management integrated with periodic deep testing. Bug bounty programme for external perspective. Testing embedded in the SDLC. | The programme is self-sustaining. Testing is continuous, not annual. Every major change triggers a targeted assessment. The bug bounty provides ongoing external validation. Pen testing is embedded in the development lifecycle — new applications are tested before release. | "No tester has achieved DA in 18 months. Detection rate is 91%. Mean remediation time is 11 days. The programme is producing compounding returns on a stable budget." |
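The longitudinal comparison that Year 2 onwards depends on does not require specialist tooling. Below is a minimal sketch in Python, assuming each engagement's findings are exported as a JSON list with `title` and `severity` fields (an assumed format, not a standard), that surfaces recurring, resolved, and new findings between two engagements.

```python
"""Minimal sketch: compare two engagements' findings exports to quantify
year-on-year change. Assumes each file is a JSON list of findings with
'title' and 'severity' fields -- a hypothetical format, not a standard."""
import json
from collections import Counter


def load_findings(path):
    with open(path) as f:
        return json.load(f)


def compare(previous_path, current_path):
    previous = load_findings(previous_path)
    current = load_findings(current_path)

    prev_titles = {f["title"] for f in previous}
    curr_titles = {f["title"] for f in current}

    recurring = prev_titles & curr_titles   # reported again this year
    resolved = prev_titles - curr_titles    # no longer present
    new = curr_titles - prev_titles         # new findings or newly tested scope

    print(f"Recurring findings: {len(recurring)}")
    print(f"Resolved since last engagement: {len(resolved)}")
    print(f"New findings: {len(new)}")
    print(f"Current findings by severity: {dict(Counter(f['severity'] for f in current))}")


if __name__ == "__main__":
    compare("year1.json", "year2.json")
```

Matching findings by title is naive, since different providers word the same issue differently; a shared remediation tracker with stable finding identifiers, maintained across engagements, is the more robust basis for these numbers.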
This isn't a rigid framework — every organisation's journey is different. But the pattern is consistent: baseline, validate, expand, deepen, optimise. Each year builds on the previous one. The scope expands. The techniques become more sophisticated. The organisation's defensive capability matures in response.
Testing the same scope with the same methodology every year produces diminishing returns. The first test finds the easy wins. The second validates the fixes. By the third, the tester is re-covering familiar ground. A mature programme deliberately rotates and expands scope to maintain the value of each engagement.
| Scope Area | When to Introduce | What It Tests |
|---|---|---|
| Internal infrastructure | Year 1 onwards — the foundation of every programme. | Active Directory, network services, workstation configuration, credential hygiene, lateral movement paths, privilege escalation, and detection capability. |
| External infrastructure | Year 1 or 2 — often required by compliance frameworks. | Internet-facing services, VPN endpoints, web servers, email gateways, cloud-hosted resources. The attack surface visible to any adversary. |
| Web applications | Year 2 — once infrastructure fundamentals are addressed. | OWASP Top 10, authentication and session management, authorisation, input validation, API security. Application-layer vulnerabilities that infrastructure testing doesn't cover. |
| Cloud environments | Year 2–3, or whenever cloud adoption reaches meaningful scale. | IAM configuration, storage permissions, network controls, serverless function security, container security, multi-tenancy isolation. |
| Social engineering | Year 3 — once technical controls are maturing. | Phishing campaigns, vishing (telephone), physical access, USB drops, pretexting. Tests the human layer that technical controls can't fully address. |
| Red team exercises | Year 3–4 — once the SOC and detection capability are established. | Realistic adversary simulation over days or weeks. Tests the full kill chain: initial access, persistence, lateral movement, data exfiltration, and — critically — whether the SOC detects and responds. |
| Purple team sessions | Year 4 — once the programme is mature enough for collaborative improvement. | Testers and defenders work together. The tester runs techniques while the SOC watches, detects, and tunes. Produces immediate detection improvements rather than waiting for the report. |
| Scenario-based testing | Year 4–5 — tailored to the organisation's specific threat model. | Ransomware simulation, insider threat, supply chain compromise, targeted APT simulation. Tests resilience against the threats that the board identifies as most significant to the business. |
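A rotation like this is easier to hold to when it is written down as data rather than renegotiated each year. The sketch below encodes the example arc from the table; the year-to-scope mapping is illustrative, not prescriptive.

```python
"""Illustrative only: the multi-year scope rotation written down as data,
so the plan outlives staff changes and can be reviewed alongside the
remediation tracker. The mapping mirrors the example arc in the table
above; it is not a prescription."""

SCOPE_PLAN = {
    1: ["internal infrastructure", "external infrastructure"],
    2: ["retest of previous findings", "web applications"],
    3: ["cloud environments", "social engineering", "first red team exercise"],
    4: ["purple team sessions", "scenario-based testing", "targeted cloud / M&A testing"],
    5: ["rotating deep dives", "bug bounty", "SDLC-embedded testing"],
}


def scope_for(programme_year: int) -> list[str]:
    """Return the planned scope, falling back to the mature-state rotation."""
    return SCOPE_PLAN.get(programme_year, SCOPE_PLAN[max(SCOPE_PLAN)])


for year in range(1, 7):
    print(f"Year {year}: {', '.join(scope_for(year))}")
```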
Organisations debate whether to use the same pen test provider year after year or rotate providers for fresh perspectives. Both approaches have genuine advantages — and a mature programme can capture the benefits of each.
| Approach | Advantages | Disadvantages |
|---|---|---|
| Same provider, multi-year | Deep environmental knowledge. Consistent methodology and reporting format. Longitudinal comparison built in. The tester knows the history — they can push deeper each year instead of re-learning the environment. Retesting is more efficient. | Potential for familiarity bias — the tester may follow the same paths and miss areas they've unconsciously deprioritised. The organisation may become comfortable with a specific approach. |
| Rotating providers | Fresh perspective. Different methodologies, tools, and techniques. Findings that the previous provider missed may be discovered. Prevents complacency. | The new provider starts from scratch — the first engagement produces less depth. Reporting formats differ, making longitudinal comparison harder. Environmental context must be rebuilt. |
| Hybrid approach | Primary provider for 2–3 years, then rotate. Or: primary provider for infrastructure, different provider for web applications or red teaming. Each provider brings depth in their speciality. Fresh perspective arrives periodically without losing continuity. | More complex to manage. Requires clear communication between providers when sharing previous reports and remediation trackers. |
The hybrid approach is typically the most effective for a mature programme. A primary provider delivers continuity, longitudinal metrics, and deepening environmental knowledge over a two- to three-year cycle. A secondary provider — or periodic rotation — introduces fresh perspective, different methodologies, and the constructive challenge that comes from a new set of eyes. The key is ensuring that every provider receives the full history: previous reports, remediation trackers, and retest results.
Penetration testing, red teaming, and purple teaming are not interchangeable — they serve different purposes and are appropriate at different stages of the programme's maturity. A penetration test aims for breadth: find and prioritise as many exploitable weaknesses as possible within a defined scope. A red team exercise aims for realism: a handful of objectives pursued stealthily over days or weeks, testing whether the SOC detects and responds. A purple team session aims for improvement: testers and defenders work side by side, tuning detection as each technique is run.
A mature programme uses all three, at different frequencies: annual pen testing for ongoing vulnerability identification, periodic red teaming (every 12–18 months) for full-spectrum resilience testing, and quarterly purple team sessions for continuous detection improvement. Each type feeds into the others: pen test findings inform red team scenarios, red team results identify detection gaps, and purple team sessions close those gaps.
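That cadence can also live as data. The sketch below uses illustrative intervals (12 months for pen testing, 18 for red teaming, 3 for purple team sessions) and a placeholder last-run date to derive when each engagement type is next due.

```python
"""A sketch of the cadence above expressed as data: annual pen testing,
red teaming every 12-18 months, quarterly purple team sessions. The
intervals and the last-run date are illustrative placeholders."""
from datetime import date

CADENCE_MONTHS = {
    "penetration test": 12,
    "red team exercise": 18,   # anywhere in the 12-18 month window
    "purple team session": 3,
}


def next_due(last_run: date, interval_months: int) -> date:
    """Add a whole number of months, clamping the day to avoid month-end issues."""
    month_index = last_run.month - 1 + interval_months
    return date(last_run.year + month_index // 12, month_index % 12 + 1, min(last_run.day, 28))


last_run = date(2024, 1, 15)  # placeholder: the last time each engagement type ran
for engagement, interval in CADENCE_MONTHS.items():
    print(f"{engagement}: next due {next_due(last_run, interval)}")
```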
The metrics that matter evolve as the programme matures. Early-stage metrics focus on finding and fixing. Mid-stage metrics focus on detection and response. Late-stage metrics focus on resilience and efficiency.
| Maturity Stage | Key Metrics | What They Demonstrate |
|---|---|---|
| Early (Year 1–2) | Finding count by severity. Recurring findings from previous engagement. Mean time to remediate critical and high findings. Remediation success rate (self-verified and retest-confirmed). | The organisation is finding and fixing vulnerabilities. Remediation is tracked and verified. The baseline is established. |
| Mid (Year 3–4) | Time to tester objective (increasing). SOC detection rate (percentage of tester actions detected). Mean time to detect (decreasing). Chain break effectiveness (are previous chains still viable?). | The organisation is detecting and responding to attacks. The environment is getting harder to compromise. Detection gaps are being closed systematically. |
| Mature (Year 5+) | Red team objective achievement rate (should decrease). Time to containment during red team exercises. Purple team detection coverage (percentage of ATT&CK techniques detected). Cost per finding remediated (efficiency metric). Resilience score (composite of detection, prevention, and response). | The organisation is resilient. The security programme is efficient. Investment is producing compounding returns. The board can quantify the return on security investment. |
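Most of these metrics fall out of a simple engagement log rather than a dedicated platform. The sketch below computes detection rate, mean time to detect, mean time to remediate, and ATT&CK technique coverage from hypothetical records; the field names and sample values are placeholders for whatever the tracker and the SOC actually capture.

```python
"""Minimal sketch of mid- and late-stage metrics computed from a hypothetical
engagement log. Field names and sample values are placeholders for whatever
the tracker and the SOC actually record."""
from statistics import mean

tester_actions = [
    {"technique": "T1557.001", "detected": True,  "minutes_to_detect": 31},
    {"technique": "T1021.002", "detected": True,  "minutes_to_detect": 55},
    {"technique": "T1003.001", "detected": False, "minutes_to_detect": None},
]

findings = [
    {"severity": "critical", "days_to_remediate": 9},
    {"severity": "high",     "days_to_remediate": 14},
    {"severity": "high",     "days_to_remediate": None},  # still open
]

detected = [a for a in tester_actions if a["detected"]]
detection_rate = len(detected) / len(tester_actions)
mean_time_to_detect = mean(a["minutes_to_detect"] for a in detected)

closed = [f["days_to_remediate"] for f in findings if f["days_to_remediate"] is not None]
technique_coverage = len({a["technique"] for a in detected}) / len({a["technique"] for a in tester_actions})

print(f"Detection rate: {detection_rate:.0%}")
print(f"Mean time to detect: {mean_time_to_detect:.0f} minutes")
print(f"Mean time to remediate (closed findings only): {mean(closed):.0f} days")
print(f"ATT&CK technique coverage: {technique_coverage:.0%}")
```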
Not every organisation that commissions annual pen tests has a programme. Some have been testing for years without progressing. The signs are recognisable — and addressing them is the first step toward programme maturity.
| Immature Pattern | What It Looks Like | The Programme Response |
|---|---|---|
| Same findings every year | LLMNR poisoning, unsigned SMB, weak passwords, and missing patches appear in every report for three consecutive years. The tracker shows them remediated. The next test finds them again. | Investigate why remediations aren't persisting. Implement change controls that prevent recurrence. Commission retests between annual engagements to catch regression early. |
| No tester continuity | Different provider every year. Each starts from scratch. No longitudinal comparison. No trend data. The board sees new numbers each year with no context. | Adopt the hybrid model: primary provider for 2–3 years for continuity, periodic rotation for fresh perspective. Ensure every provider receives previous reports and trackers. |
| Scope never changes | Internal infrastructure pen test every year, same scope, same methodology. Web applications, cloud, social engineering, and detection capability are never tested. | Develop a multi-year scope plan that rotates across the full estate. Year 1: internal. Year 2: web apps + retest. Year 3: cloud + social engineering. Year 4: red team. |
| No remediation tracking | The report arrives, some findings are fixed, the report is filed. There's no tracker. Nobody knows which findings from two years ago were resolved. | Implement a remediation tracker from the first engagement. Maintain it across all engagements. Provide it to every new tester. Track metrics quarterly. |
| Testing is compliance-driven only | The pen test is commissioned because PCI DSS requires it or the insurer demands it. The test scope is the minimum to satisfy the requirement. The report is submitted and filed. | Reframe pen testing as risk management, not compliance. The compliance requirement is the floor. The programme should exceed it — testing beyond the mandated scope to address the organisation's actual risk profile. |
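The remediation tracker the table keeps returning to does not need to be sophisticated to be effective. Below is a minimal sketch with illustrative columns and sample rows: one row per finding per engagement, kept across years, with anything marked remediated and then reported again flagged as a recurrence.

```python
"""A minimal remediation tracker sketch: one row per finding per engagement,
kept across years so recurrences are visible. Columns and sample rows are
illustrative; adapt them to the provider's report format."""
import csv

# engagement_id, finding, severity, status  (status: open / remediated / accepted)
ROWS = """\
2023-internal,LLMNR poisoning,high,remediated
2023-internal,Unsigned SMB,high,remediated
2024-internal,LLMNR poisoning,high,open
2024-internal,Missing patches (MS17-010),critical,open
"""


def recurring_findings(rows):
    """Findings marked remediated in an earlier engagement but reported again later."""
    previously_remediated = set()
    recurrences = []
    for row in rows:  # rows are assumed to be in chronological order
        if row["finding"] in previously_remediated and row["status"] == "open":
            recurrences.append(row)
        if row["status"] == "remediated":
            previously_remediated.add(row["finding"])
    return recurrences


reader = csv.DictReader(
    ROWS.splitlines(),
    fieldnames=["engagement_id", "finding", "severity", "status"],
)
for row in recurring_findings(list(reader)):
    print(f"Recurred in {row['engagement_id']}: {row['finding']} ({row['severity']})")
```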
A penetration test is a snapshot. A penetration testing programme is a time-lapse — each frame building on the last, each year's engagement advancing the organisation's security posture further than the previous one. The snapshot tells you where you are. The time-lapse tells you where you've been, where you're going, and whether the investment is working.
The maturity journey follows a predictable arc: baseline the current state, validate remediation and expand scope, introduce detection testing and adversary simulation, move to collaborative improvement and scenario-based exercises, and embed testing into continuous operations. Each stage produces compounding returns: recurring findings decrease, detection rates improve, time to compromise increases, and the organisation's resilience against real-world threats becomes measurable and demonstrable.
The difference between an organisation that tests annually and an organisation that runs a programme isn't the budget — it's the intent. The programme treats every engagement as a chapter in a continuing story. The reports are read together, not in isolation. The metrics are tracked as trends, not single points. And the result, after three to five years, is an organisation that can demonstrate — with evidence, to the board, the auditor, the insurer, and the regulator — that its security programme is working.
We work with organisations to build testing programmes that evolve over years — from initial baseline through red teaming and purple team sessions — delivering the longitudinal metrics and compounding improvement that demonstrate security investment is producing returns.