
Remediation Validation and Retesting: Closing the Loop on Penetration Testing

> diff findings_v1.json findings_v2.json | grep 'remediated' | wc -l && echo 'verified, not assumed'_

Peter Bassill · 11 November 2025 · 15 min read
remediation · retesting · validation · continuous improvement · metrics · security programme

"Remediated" in the tracker. Vulnerable in reality.

The pen test report identified 34 findings. The IT team spent eight weeks working through the remediation roadmap. The GPO changes were applied. The service account passwords were rotated. The detection rules were deployed. The remediation tracker shows 28 of 34 findings marked as remediated. The CISO reports to the board: "82% of findings remediated. Programme on track."

Six months later, the next pen test begins. Within the first day, the tester captures credentials via LLMNR poisoning — the same technique from the previous engagement. Finding F-003: LLMNR enabled. Status in the tracker: remediated. Status in reality: the GPO was applied to the wrong Organisational Unit. Half the domain is still broadcasting LLMNR. The fix was implemented. It just doesn't work.

This isn't unusual. In our experience, approximately 15–25% of remediations that are marked as complete contain implementation errors that leave the vulnerability partially or fully unaddressed. The finding is closed in the tracker. The risk remains in the environment. The gap between "we fixed it" and "it's actually fixed" is the verification gap — and it's the most common reason organisations see recurring findings across successive pen test engagements.
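To see the gap in data rather than anecdote, compare the remediation tracker against the retest results. The sketch below is illustrative Python against a hypothetical export format; the file names, fields, and status values are assumptions, not a prescribed schema.

```python
# Illustrative comparison of tracker status against retest results.
# Assumes two hypothetical JSON exports, each a list of
# {"id": ..., "title": ..., "status": ...} records.
import json


def load(path: str) -> dict:
    """Index findings by ID."""
    with open(path) as fh:
        return {f["id"]: f for f in json.load(fh)}


tracker = load("tracker.json")   # status: "open" or "remediated"
retest = load("retest.json")     # status: "fixed", "partially_fixed" or "not_fixed"

# Findings the tracker says are closed but the retest could still exploit.
gap = [
    f for fid, f in retest.items()
    if tracker.get(fid, {}).get("status") == "remediated" and f["status"] != "fixed"
]

claimed = sum(1 for f in tracker.values() if f["status"] == "remediated")
print(f"Claimed remediated: {claimed}")
print(f"Failed verification: {len(gap)} ({len(gap) / max(claimed, 1):.0%})")
for f in gap:
    print(f"  {f['id']}: {f['title']} -> {f['status']}")
```

Against the opening scenario's 28 claimed remediations, a 15–25% failure rate means roughly four to seven findings are still exploitable despite being closed in the tracker.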


The implementation errors that leave findings open.

| Error Pattern | Example | How Often We See It |
| --- | --- | --- |
| Wrong scope | The GPO to disable LLMNR was linked to the "Servers" OU instead of the domain root. All workstations — the actual target — are still broadcasting. | Common. GPO scoping errors account for a significant proportion of failed infrastructure remediations. |
| Partial implementation | SMB signing was enforced on servers but not on workstations. The tester relays from a workstation — the fix only covered half the attack surface. | Common. Particularly with changes that require both client and server configuration. |
| Exception undermines the rule | The firewall rule blocking lateral movement has an exception for the IT team's subnet. The tester compromises an IT workstation first and bypasses the rule entirely. | Frequent. Exceptions created for operational convenience often recreate the vulnerability for the exact users who have the most access. |
| Old credentials cached | The service account password was rotated, but the old password is cached in LSASS on three servers where the account was previously used interactively. The tester extracts the cached credential. | Occasional. Credential caching persists until the cached entries are cleared or the system is restarted. |
| Fix applied, then reverted | Network segmentation was implemented in March. A connectivity issue in April led the network team to add a temporary "allow all" rule between VLANs. The temporary rule is still in place eight months later. | Frequent. Temporary changes made under operational pressure are rarely removed. The remediation tracker still shows "remediated." |
| Detection rule too narrow | The SOC deployed a Kerberoasting detection rule for RC4 ticket requests. The tester requests AES tickets — still Kerberoastable, different encryption type. The rule doesn't fire. | Common. Detection rules written for the exact technique used in the previous engagement often miss variants (see the sketch below). |

Every one of these errors is invisible in the remediation tracker. The tracker shows the finding as closed. The engineer believes the fix is in place. The CISO reports progress to the board. And the vulnerability persists — until someone tests whether the fix actually works.
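The "detection rule too narrow" pattern in the table above shows why validation has to cover variants rather than the exact payload from last year's report. The sketch below is a hypothetical illustration in Python over already-parsed Windows Security event 4769 records (Kerberos service ticket requested); the encryption type codes are the documented ones, and everything else (ingestion, field handling, tuning) is assumed.

```python
# Illustrative check of detection rule breadth against a Kerberoasting variant.
# Operates on already-parsed Windows Security event 4769 records; ingestion,
# enrichment and alert volume tuning are out of scope for this sketch.

RC4 = {"0x17"}            # RC4-HMAC: what the narrow rule keys on
AES = {"0x11", "0x12"}    # AES128 / AES256: slower to crack, still Kerberoastable


def narrow_rule(event: dict) -> bool:
    """Original SOC rule: fire only on RC4 service ticket requests."""
    return (
        event["EventID"] == 4769
        and event["TicketEncryptionType"] in RC4
        and not event["ServiceName"].endswith("$")  # ignore computer accounts
    )


def broader_rule(event: dict) -> bool:
    """Variant-aware rule: any crackable encryption type for a user SPN."""
    return (
        event["EventID"] == 4769
        and event["TicketEncryptionType"] in RC4 | AES
        and not event["ServiceName"].endswith("$")
    )


# The retester requests an AES ticket for the same SPN targeted last year.
retest_event = {
    "EventID": 4769,
    "ServiceName": "svc-sql",
    "TicketEncryptionType": "0x12",
}

print("narrow rule fires: ", narrow_rule(retest_event))   # False: no alert
print("broader rule fires:", broader_rule(retest_event))  # True
```

In practice a broader rule like this needs volume tuning, because AES tickets are the default in a modern domain; the point is that the detection logic should target the technique class, not the single variant the last tester happened to use.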


Two approaches — different levels of assurance.

There are two ways to validate remediation: the engineering team verifies their own fixes using the steps provided in the report, or an independent tester validates the fixes by attempting to re-exploit the vulnerabilities. Both have value. They serve different purposes.

| Approach | How It Works | Assurance Level | Best For |
| --- | --- | --- | --- |
| Self-verification | The engineer implements the fix and then runs the verification step from the pen test report. "After disabling LLMNR, run Responder on the VLAN for 15 minutes — no hashes should be captured." The engineer performs the test and records the result. | Moderate. Confirms the specific fix was applied correctly. Does not test for variants, workarounds, or unintended consequences. Subject to the engineer's interpretation of the verification step. | Quick wins and standard remediations where the verification step is straightforward. Appropriate for low and medium findings. First-line validation for all findings. |
| Independent retesting | An independent tester — ideally from the original pen test provider — attempts to re-exploit the remediated findings using the same techniques and any variants. They validate that the fix works, that it hasn't introduced new issues, and that the attack chain is genuinely broken. | High. Independent validation that the fix works against the actual technique and its variants. Tests the fix in the context of the full attack chain, not just the individual finding. Identifies implementation errors that self-verification misses. | Critical and high findings. Chain findings where the break point must be validated. Findings where the remediation is complex or the risk of implementation error is high. Before reporting remediation progress to the board or regulator. |

The recommended approach is both: self-verification for every finding as the first-line check, followed by independent retesting for critical and high findings and all chain break points. Self-verification catches the obvious implementation errors immediately. Independent retesting catches the subtle ones — the wrong OU, the cached credential, the detection rule variant — that only surface when an adversary's perspective is applied.


What a targeted retest looks like in practice.

A retest is not a full pen test. It's a focused engagement — typically one to three days — where the tester validates that specific remediations have been implemented correctly and that the attack chains from the previous engagement are broken.

| Element | Full Pen Test | Targeted Retest |
| --- | --- | --- |
| Objective | Discover new vulnerabilities. Map attack chains. Test the full scope of the environment. | Validate that specific remediations from the previous engagement are effective. Confirm attack chains are broken. |
| Scope | Broad — the full environment or a defined segment. | Narrow — limited to the findings from the previous report, particularly critical and high findings and chain break points. |
| Duration | 5–15 days depending on environment size. | 1–3 days. The tester already knows the environment and the findings — they're validating specific fixes, not discovering new issues. |
| Deliverable | Full report with executive summary, attack narrative, findings, and remediation roadmap. | Retest report: a table showing each remediated finding, its validation status (fixed, partially fixed, not fixed), evidence, and notes on any implementation issues discovered. |
| Cost | Full engagement fee. | Typically 15–25% of the original engagement cost. Significantly less because the scope is narrower, the tester is familiar with the environment, and discovery isn't required. |
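Scoping the retest is mechanical once the previous report exists in structured form. A minimal sketch, assuming a hypothetical findings export in which each entry records its severity and whether it was a chain break point; the finding IDs other than F-003 are invented for illustration.

```python
# Illustrative derivation of a targeted retest scope from the previous
# engagement's findings. The export format is a hypothetical example.

findings = [
    {"id": "F-003", "title": "LLMNR/NBT-NS broadcast protocols enabled",
     "severity": "critical", "chain_break_point": True},
    {"id": "F-007", "title": "SMB signing not enforced",
     "severity": "high", "chain_break_point": True},
    {"id": "F-019", "title": "Verbose error pages on internal web app",
     "severity": "low", "chain_break_point": False},
    # ... remaining findings from the previous report
]


def in_retest_scope(finding: dict) -> bool:
    """Critical and high findings, plus anything that breaks an attack chain."""
    return finding["severity"] in {"critical", "high"} or finding["chain_break_point"]


scope = [f for f in findings if in_retest_scope(f)]
print(f"{len(scope)} of {len(findings)} findings in retest scope:")
for f in scope:
    print(f"  {f['id']} [{f['severity']}] {f['title']}")
```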
Retest Report — Example Finding Validation
FINDING: F-003 — LLMNR/NBT-NS Broadcast Protocols Enabled
Original severity: Critical (Real World Risk Score)
Remediation status in tracker: Remediated

Retest result: PARTIALLY FIXED

Findings:
LLMNR: Disabled on Servers OU (confirmed — no responses from servers).
LLMNR: STILL ENABLED on Workstations OU. Responder captured 6 hashes
in 12 minutes from workstation VLAN.
NBT-NS: Still enabled domain-wide. DHCP option not configured.

Root cause:
GPO 'Disable LLMNR' linked to 'OU=Servers,DC=acme,DC=local'
instead of domain root. Workstations OU not covered.
NBT-NS remediation was not attempted (DHCP option 001 not set).

Impact:
Attack chain from original engagement is STILL VIABLE.
Entry point via workstation LLMNR poisoning remains open.

Recommendation:
Move GPO link from Servers OU to domain root.
Configure DHCP option 001 to disable NBT-NS domain-wide.
Estimated effort: 15 minutes. No change window required.

This is the kind of finding that's invisible without a retest. The tracker shows "remediated." The GPO exists. The engineer implemented it in good faith. But the scoping error means the fix protects half the environment. The retest identifies the gap, provides the specific correction, and the 15-minute fix completes the remediation. Without the retest, the finding recurs in the next annual engagement — six months and a significant amount of risk later.


The metrics that prove security is actually improving.

The board doesn't want to hear about GPO scoping errors. They want to know whether the security programme is working — whether the money spent on testing, remediation, and tooling is producing measurable improvement. The retest provides the data to answer that question.

| Metric | What It Measures | What Good Looks Like |
| --- | --- | --- |
| Remediation success rate | The percentage of remediated findings confirmed as fully fixed during retesting. | Increasing over time. Year 1: 72% of remediations verified as effective. Year 2: 89%. The gap between "marked remediated" and "confirmed fixed" is narrowing — the engineering team's implementation quality is improving. |
| Recurring finding rate | The percentage of findings from the previous engagement that appear again in the current engagement. | Decreasing over time. Year 1: 14 of 34 findings recur (41%). Year 2: 3 of 28 recur (11%). Findings are being fixed permanently, not temporarily. |
| Time to objective | How long the tester takes to achieve their primary objective (e.g. Domain Admin, access to sensitive data). | Increasing over time. Year 1: DA in 2 hours 15 minutes. Year 2: DA in 2 days 4 hours. Year 3: DA not achieved within the 10-day window. The environment is getting harder to compromise. |
| Detection rate | The percentage of tester actions detected by the SOC, EDR, and monitoring systems. | Increasing over time. Year 1: 0 of 7 actions detected (0%). Year 2: 4 of 9 detected (44%). Year 3: 7 of 8 detected (88%). The detection capability is maturing. |
| Mean time to remediate | The average number of days between report delivery and confirmed remediation of critical and high findings. | Decreasing over time. Year 1: 94 days. Year 2: 37 days. Year 3: 14 days. The organisation is responding faster to identified risk. |
| Chain viability | Whether the attack chains from the previous engagement are still viable after remediation. | Chains broken. If the retest confirms that the three chain break points identified in the previous report are all effective, the specific path to the crown jewels no longer exists. This is the most meaningful single metric. |

These metrics, presented as trends across two or three years of engagements, tell a story the board can understand: security is improving, the investment is producing results, and the risk trajectory is downward. No single engagement can tell this story. The longitudinal view — built from consistent testing, remediation, retesting, and tracking — demonstrates the return on the organisation's security investment in terms that justify continued funding.
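The trends themselves are simple arithmetic over the engagement records. A minimal sketch, using illustrative figures that mirror the table above rather than real client data; a real dashboard would pull these from the remediation tracker and the retest reports.

```python
# Illustrative computation of board-level trend metrics from per-engagement
# summaries. The figures below are examples, not real client data.
from dataclasses import dataclass


@dataclass
class Engagement:
    label: str
    remediations_claimed: int       # findings marked remediated in the tracker
    remediations_confirmed: int     # of those, confirmed fixed on retest
    findings_previous: int          # findings in the previous engagement
    findings_recurring: int         # of those, how many recurred this year
    actions_total: int              # tester actions during the engagement
    actions_detected: int           # of those, how many the SOC detected

    @property
    def remediation_success_rate(self) -> float:
        return self.remediations_confirmed / self.remediations_claimed

    @property
    def recurring_finding_rate(self) -> float:
        return self.findings_recurring / self.findings_previous

    @property
    def detection_rate(self) -> float:
        return self.actions_detected / self.actions_total


history = [
    Engagement("Year 1", 25, 18, 34, 14, 7, 0),
    Engagement("Year 2", 27, 24, 28, 3, 9, 4),
]

for e in history:
    print(
        f"{e.label}: remediation success {e.remediation_success_rate:.0%}, "
        f"recurring findings {e.recurring_finding_rate:.0%}, "
        f"detection {e.detection_rate:.0%}"
    )
```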


From annual event to ongoing programme.

The testing-remediation-validation cycle transforms penetration testing from a compliance checkbox into a continuous improvement programme. Each iteration builds on the previous one, and the cumulative effect compounds over time.

| Phase | When | What Happens | Deliverable |
| --- | --- | --- | --- |
| 1. Test | Annual (or more frequent) | Full penetration test against the agreed scope. Findings identified, attack chains mapped, remediation roadmap produced. | Full pen test report with executive summary, attack narrative, findings, and roadmap. |
| 2. Remediate | Weeks 1–8 after report delivery | Engineering team works through the roadmap. Quick wins implemented immediately. Standard remediations scheduled. Projects scoped and funded. Each finding self-verified using the report's verification steps. | Updated remediation tracker with status, date, evidence, and sign-off for each finding. |
| 3. Validate | Weeks 8–12 after report delivery | Independent retest of critical and high findings and all chain break points. Implementation errors identified and corrected. Chains confirmed as broken or still viable. | Retest report with validation status for each finding. Remediation tracker updated with confirmed status. |
| 4. Report | Quarterly | Remediation progress reported to the board. Metrics presented as trends. Improvement demonstrated. Remaining risk communicated honestly. | Board report showing remediation success rate, recurring findings, detection rate, and time to objective — as trends. |
| 5. Repeat | Annual cycle restarts | Next engagement builds on the previous: previous report and tracker provided to the new tester. Remediated findings validated. New scope areas explored. The cycle compounds. | New report with comparison section showing improvement since the previous engagement. |

Building the loop into your security programme.

Budget for Retesting When You Commission the Pen Test
Include a retest in the original scope and budget. A 10-day pen test followed by a 2-day retest eight weeks later costs approximately 20% more — and delivers significantly more assurance than a 10-day test with no validation. The retest is where remediation becomes confirmed improvement.
Self-Verify Every Finding Before the Retest
Use the verification steps in the report to check every remediation before the independent retest. This catches the obvious errors — the GPO that wasn't linked, the rule that wasn't enabled — so the retest can focus on the subtle ones. Self-verification is free. Implementation errors discovered during a paid retest are expensive.
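For a finding like F-003, self-verification can be as small as a registry spot-check run on a sample of machines from every OU, not just the box the engineer happens to be logged into. A minimal single-host sketch in Python: the registry path is the one commonly documented for the "Turn off multicast name resolution" policy, and how you run the check across hosts and OUs is an assumption about your own tooling.

```python
# Illustrative local check that the LLMNR GPO actually landed on this host.
# Reads the registry value written by the "Turn off multicast name resolution"
# policy. Windows-only; distribution across hosts and OUs is left to your tooling.
import winreg

POLICY_KEY = r"SOFTWARE\Policies\Microsoft\Windows NT\DNSClient"


def llmnr_disabled() -> bool:
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POLICY_KEY) as key:
            value, _ = winreg.QueryValueEx(key, "EnableMulticast")
            return value == 0
    except FileNotFoundError:
        # Key or value absent: the policy never applied here and LLMNR
        # falls back to its default behaviour (enabled).
        return False


if __name__ == "__main__":
    print("LLMNR disabled by policy:", llmnr_disabled())
```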
Track Metrics Across Engagements
Maintain a dashboard that tracks remediation success rate, recurring findings, time to objective, detection rate, and mean time to remediate — across every engagement. Present this dashboard to the board quarterly. Two years of improving metrics is the strongest argument for continued security investment.
Provide Every Report to the Next Tester
When commissioning the next pen test, provide the previous report, the remediation tracker, and the retest results. The new tester validates fixes, skips known ground, and focuses on new attack surface. Each engagement builds on the last — compounding value instead of repeating work.
Use the Same Provider for the Retest
The provider who found the vulnerabilities is best placed to validate the fixes — they understand the environment, the attack chains, and the specific techniques used. The retest is most efficient and most thorough when performed by the same team that performed the original engagement.

The bottom line.

A finding marked "remediated" in a tracker is a claim. A finding confirmed as fixed by an independent retest is a fact. The gap between the two — the 15–25% of remediations that contain implementation errors — is the verification gap that retesting closes. Without validation, the organisation reports progress to the board based on tracker status. With validation, the organisation reports progress based on confirmed results.

Retesting transforms the pen test from a one-time assessment into a continuous improvement cycle. Each iteration — test, remediate, validate, report, repeat — builds on the previous one. The metrics compound: recurring findings decrease, time to objective increases, detection rates improve, and remediation speed accelerates. After two or three cycles, the organisation has a longitudinal record that demonstrates measurable security improvement — the strongest evidence of programme effectiveness for the board, the auditor, the insurer, and the regulator.

The pen test finds the problems. Remediation addresses them. Retesting proves they're fixed. Without all three, the loop is open — and an open loop is a loop that doesn't improve anything.


Pen test, remediate, retest, repeat — the cycle that produces measurable improvement.

Our retesting engagements validate every critical and high remediation, confirm attack chains are broken, and provide the longitudinal metrics that demonstrate your security programme is working — because a finding that's verified as fixed is worth more than ten that are assumed to be.