Penetration Testing

Why Penetration Testing Should Challenge Assumptions Rather Than Confirm Comfort

> test -f /assumptions/unchallenged.txt && echo 'this is where the risk lives'

Peter Bassill | 27 January 2026 | 14 min read
Tags: assumptions, challenge, comfort, scope, testing quality, honest assessment, security culture

When the pen test is designed to produce the answer you want.

Nobody commissions a pen test hoping for a 60-page report documenting a path from a phishing email to the CEO's mailbox. The CISO doesn't want to explain to the board that the tester achieved Domain Admin in two hours. The IT director doesn't want their team's work scrutinised and found wanting. The CFO doesn't want to hear that the remediation will cost £200,000.

The incentive structure pushes toward comfort. A clean report is easier to present. A reassuring outcome avoids difficult conversations. A low finding count appears to confirm that the money spent on security tools and the security team is working. And the easiest way to produce a comfortable result is to constrain the test — knowingly or unknowingly — until the scope, methodology, and provider are configured to find as little as possible.

This doesn't require deliberate dishonesty. It happens incrementally: the cloud environment is excluded because "it's out of scope for this engagement." Social engineering is removed because "we tested that two years ago." The testing window is shortened from ten days to three because of budget pressure. The provider is selected on price rather than depth. Each decision is individually reasonable. Collectively, they produce a test that's optimised for reassurance rather than truth.


The beliefs organisations hold that pen testing should challenge.

Every organisation operates on security assumptions — beliefs about the effectiveness of its controls that are rarely tested against adversarial techniques. These assumptions feel true because nothing has contradicted them. But "nothing has gone wrong" is not evidence that the controls work — it may simply mean that nobody has tried.

"We're fully patched"
Why it feels true: The vulnerability scanner shows green. The patch management dashboard reports 98% compliance. The WSUS server is configured and running.
What the pen test might reveal: The scanner covers Windows patches but misses the Java application running on a non-standard port. The 2% non-compliance includes the domain controller that hasn't restarted since the patch was applied. The third-party application that runs the finance system hasn't been updated in 18 months because the vendor doesn't support the latest version.

"Our EDR will catch it"
Why it feels true: The EDR vendor demonstrated impressive detection rates during the proof of concept. The product is deployed on every endpoint. The management console shows all agents reporting.
What the pen test might reveal: The tester crafts a custom encoded payload that bypasses the EDR's static analysis. The EDR catches the default Meterpreter binary but misses the custom shellcode loader. The product detects known techniques, but the tester uses a variant that the signatures don't cover. The assumption that "EDR is deployed" equals "attacks are detected" is tested and found incomplete.

"Our network is segmented"
Why it feels true: The network diagram shows VLANs. The firewall rules were designed during the infrastructure project two years ago. The architecture document describes a segmented network.
What the pen test might reveal: The tester discovers that the temporary "allow all" rule created during a migration was never removed. The WiFi network bypasses the segmentation entirely. The jump server in the management VLAN is accessible from the workstation VLAN. The segmentation exists on paper but not in practice.

"Our users are well-trained"
Why it feels true: The organisation runs quarterly phishing simulations. The click rate has dropped from 24% to 6%. The security awareness training scores are above 90%.
What the pen test might reveal: The pen tester sends a tailored phishing email that doesn't match the template the simulation platform uses. The click rate is 31% — because the real phishing email doesn't look like the training examples. Users are trained to spot the simulation, not the real attack.

"Our cloud is secure"
Why it feels true: The cloud team followed the vendor's security best practices. The configuration was reviewed at deployment. The cloud security posture management tool shows no critical findings.
What the pen test might reveal: The tester discovers an S3 bucket with public read access that contains database backups. The IAM role assigned to the Lambda function has full admin permissions "for development" that were never reduced for production. The CSPM tool checks configuration against CIS benchmarks but doesn't test whether the configuration can be exploited. (A minimal check for the public-bucket case is sketched after this table.)

"MFA protects us"
Why it feels true: MFA is deployed on the VPN, the email system, and the admin portals. The rollout is complete. All users are enrolled.
What the pen test might reveal: The tester bypasses MFA through a legacy protocol that doesn't support modern authentication — NTLM or IMAP. Or through an MFA fatigue attack where the user eventually approves the push notification. Or by compromising a session token that persists beyond the MFA challenge. MFA is a strong control — but the assumption that it's a complete defence is tested and found to have specific, exploitable exceptions.

"We'd know if we were compromised"
Why it feels true: The SOC is operational 24/7. The SIEM collects logs from all critical systems. The EDR is deployed on every endpoint. Alerts are triaged within the SLA.
What the pen test might reveal: The tester operates inside the network for five days, harvests credentials, moves laterally to six systems, escalates to Domain Admin, and accesses the financial database — without triggering a single alert. The SOC is operational, but the detection rules don't cover the specific techniques the tester used. The logs are collected, but nobody is looking for the indicators that matter.

The decisions that constrain testing into producing safe results.

Comfortable pen test results are rarely the product of deliberate deception. They're the product of decisions that individually seem sensible but collectively remove the conditions under which real weaknesses would be discovered.

Narrowing the scope
The reasonable justification: "We'll focus on the internal network this year. The cloud and applications can wait."
What it actually does: Excludes the attack surfaces most likely to contain undiscovered vulnerabilities — the ones that haven't been tested before.

Shortening the window
The reasonable justification: "We only have budget for three days. That should be enough for a basic assessment."
What it actually does: Limits the tester to the first-order findings — the easy wins that a quick scan would also reveal. The chains, the detection gaps, and the architectural weaknesses require time to discover.

Excluding social engineering
The reasonable justification: "We don't want to test our staff — that's a different exercise."
What it actually does: Removes the most likely real-world entry point. The majority of breaches begin with phishing. Excluding it means the test starts from a position that real attackers must earn.

Selecting on price
The reasonable justification: "All pen tests are basically the same. We'll go with the cheapest quote."
What it actually does: The cheapest provider typically delivers the shallowest assessment. Automated scanning presented as a pen test. Generic findings. No attack narrative. No chain analysis. The results are comfortable because the testing was superficial.

Testing from an unrealistic position
The reasonable justification: "Start the test from outside the network with no credentials. We want to see if anyone can break in."
What it actually does: An external-only test against a well-configured perimeter may find little — not because the organisation is secure, but because the test didn't simulate the most realistic scenarios. A phishing email, a compromised VPN credential, or a rogue insider all bypass the perimeter.

How to design engagements that test what you're afraid of.

An engagement designed to challenge assumptions starts with a different question. Instead of asking "what should we test?" the question is "what do we believe about our security that we haven't verified?" The answer identifies the assumptions — and the assumptions define the scope.

"Our EDR will catch a sophisticated attacker."
Engagement design: Include endpoint evasion testing in the scope. Ask the provider to attempt custom payload delivery and test whether the EDR detects it. If the default Meterpreter payload gets caught, the test should progress to encoded, encrypted, and custom payloads until the detection boundary is found.

"Our network segmentation is effective."
Engagement design: Start the tester on the workstation VLAN and set the objective as reaching a system in the server or management VLAN. If the segmentation works, the tester won't cross the boundary. If it doesn't, the finding reveals where the implementation failed. (A minimal reachability check of this kind is sketched after this table.)

"Our SOC would detect an attacker within hours."
Engagement design: Commission a red team engagement where the SOC is not informed. Measure detection: how many tester actions were detected? How quickly? At what point in the kill chain? The answer validates or invalidates the assumption with evidence.

"Our cloud configuration is secure."
Engagement design: Commission a cloud-specific assessment: IAM policy review, storage permissions, network controls, serverless function security. Test the cloud environment with the same rigour applied to the internal network.

"A phishing attack wouldn't work here."
Engagement design: Include social engineering in the scope. Send tailored phishing emails — not the same templates the awareness platform uses. Measure the actual click rate against a realistic attack, not a simulation.

Why the most useful pen test is the one nobody wanted to read.

A pen test report that produces discomfort — that challenges assumptions, reveals weaknesses the organisation believed it had addressed, and demonstrates that the security investment isn't producing the expected return — is doing its job. The discomfort is the signal that something needs to change.

A comfortable report doesn't mean the organisation is secure. It may mean the test wasn't thorough enough to find the problems, the scope was too narrow to reach them, or the provider wasn't skilled enough to exploit them. The most dangerous pen test result isn't a 60-page report full of critical findings. It's a clean report from a constrained test that gives the organisation false confidence — confidence that prevents the investment, the architectural changes, and the cultural shift that the real risk profile demands.

The organisations that improve fastest are the ones that commission tests designed to challenge their assumptions, read the results honestly, and act on what they find — even when the findings are uncomfortable. The organisations that stay vulnerable are the ones that optimise for comfort.


Choosing challenge over comfort.

Start with Your Assumptions
Before scoping the next pen test, list the security assumptions the organisation operates on: the EDR works, the network is segmented, MFA covers everything, the SOC detects attacks within hours. Then design the engagement to test those assumptions specifically. If the test doesn't challenge at least one assumption, the scope isn't ambitious enough.
Resist the Urge to Constrain
Every scope exclusion, every shortened window, every removed testing technique reduces the test's ability to find real problems. Before excluding something, ask: "Am I excluding this because it's genuinely inappropriate, or because I'm afraid of what the test might find?" If the answer is the latter, that's exactly what should be tested.
Select Providers on Depth, Not Price
The cheapest pen test produces the most comfortable results — because superficial testing finds superficial issues. Select providers based on methodology, tester experience, reporting quality, and the depth of their previous work. Ask for sample reports. Look for attack narratives, chain analysis, and honest limitation statements.
Welcome Uncomfortable Results
A report full of critical findings isn't a failure — it's an opportunity. It means the test was thorough enough to find real problems and the provider was honest enough to report them. The findings are fixable. The risk they represent — if left undiscovered — is not.
Test What Changed and What You Fear
Every engagement should include two categories: systems that changed since the last test (new cloud environment, acquired infrastructure, new applications) and assumptions that haven't been verified (EDR effectiveness, SOC detection, segmentation implementation). This ensures the test is always challenging something — never just confirming what's already known.

The bottom line.

The purpose of a penetration test is not to confirm that the organisation is secure. It's to find out whether it isn't — and specifically where, and specifically how, and specifically what to do about it. A test designed to confirm comfort will produce comfort. A test designed to challenge assumptions will produce truth. Only one of these improves the organisation's security.

Every organisation carries security assumptions that feel true but haven't been tested: the EDR catches sophisticated attacks, the network segmentation contains lateral movement, the SOC detects intrusions within hours, MFA covers every access path. These assumptions persist because nothing has contradicted them — not because they've been verified. The pen test is the verification mechanism. If it's not challenging those assumptions, it's not doing its job.

The most valuable pen test is the one that makes the organisation uncomfortable — because discomfort is the catalyst for change. The comfortable report gets filed. The uncomfortable report changes the architecture, the detection capability, the investment priorities, and the security culture. Comfort confirms. Challenge improves.


Penetration testing designed to test your assumptions, not validate your comfort.

We design engagements that target the assumptions your organisation operates on — the beliefs about EDR effectiveness, segmentation, detection, and access control that haven't been tested by an adversary. Because the pen test that challenges produces more security improvement than the one that reassures.