> test attack | tee /dev/soc && diff expected.alerts actual.alerts_
A mid-sized insurance company invests heavily in defensive security. It deploys CrowdStrike Falcon across every endpoint. It stands up a 24/7 managed SOC with a Tier 1/2/3 analyst structure. It implements Microsoft Sentinel as its SIEM, ingesting logs from Active Directory, M365, firewalls, VPN, and DNS. It writes 140 custom detection rules. It passes its Cyber Essentials Plus assessment. The CISO tells the board: "We can see everything."
Six months later, a penetration tester achieved Domain Admin in three hours and forty minutes. They captured credentials via LLMNR poisoning, Kerberoasted a service account, escalated through a misconfigured backup operator role, extracted the NTDS.dit from the domain controller, and accessed the finance share containing policyholder data for 340,000 customers. Along the way they moved laterally across four servers and escalated privileges three times.
The SOC detected none of it. Not the LLMNR poisoning. Not the Kerberoasting. Not the lateral movement. Not the NTDS.dit extraction. Not the file share access. CrowdStrike was running on every endpoint — and didn't alert because the tester used legitimate Windows tools and protocols that EDR is not designed to flag by default. The SIEM had 140 rules — none of which matched the specific attack patterns the tester employed.
The SOC wasn't incompetent. It was untested. The detection rules were written against a threat model that had never been validated by a real adversary. The SIEM was collecting the right logs but asking the wrong questions. The EDR was functioning perfectly — within the boundaries of what EDR is designed to detect. Nobody had ever tested whether the defensive stack, as a whole, could detect and respond to the attacks that actually work against this specific environment.
Penetration testing asks: "Can an attacker compromise this environment?" Defensive security asks: "Can we detect and respond when they try?" These are different questions. The answer to one does not imply the answer to the other. An organisation needs both — and each makes the other more effective.
The misconception that penetration testing and defensive security are interchangeable — that you can invest in one instead of the other — stems from a failure to understand what each is measuring. They operate on different planes of the same problem.
| | Penetration Testing | Defensive Security (SOC / EDR / SIEM) |
|---|---|---|
| Primary question | Can an attacker breach this environment, escalate privileges, and access sensitive data? | Can we detect, investigate, and respond to an attack in progress before it causes damage? |
| What it finds | Exploitable vulnerabilities, misconfigurations, weak credentials, missing controls, viable attack paths, chainable findings. | Alert gaps, log blind spots, detection rule failures, response time deficiencies, analyst knowledge gaps, playbook weaknesses. |
| When it operates | Point-in-time. A defined engagement with a start date, end date, and scope. A snapshot of the environment's vulnerability posture at a specific moment. | Continuous. 24/7/365 monitoring. Ongoing detection and response. The persistent defensive capability that operates between pen tests. |
| What it assumes | The attacker is skilled, motivated, and patient. They will find the weakest point and exploit it. The test assumes defences may fail. | Attacks will occur. The defensive stack must detect them quickly enough and respond effectively enough to limit damage. The SOC assumes attacks will get through. |
| What it doesn't measure | Whether the defensive stack detected the attack. Whether the SOC responded appropriately. Whether the organisation's incident response process works. (Unless explicitly scoped as a detection assessment.) | Whether the attack would have succeeded in the first place. Whether the vulnerability exists. Whether the misconfiguration is exploitable. Whether the attack path is viable. |
| Output | A report of confirmed vulnerabilities, demonstrated attack paths, and remediation recommendations ranked by risk. | Ongoing alerts, investigation reports, incident response actions, threat intelligence integration, and continuous posture improvement. |
| Analogy | A burglar testing whether your locks, windows, and alarm system can be bypassed. | The alarm company monitoring your house 24/7 and dispatching a response team when the sensors trigger. |
The burglar test is useless if you never install an alarm. The alarm is useless if you never test whether the burglar can get past it. Security requires both — and the gap between them is where real-world breaches live.
When a penetration test is conducted alongside — not instead of — a functioning SOC, it produces a metric that neither can generate alone: the detection gap. This is the delta between what the attacker did and what the SOC saw. It's the most actionable finding in any engagement that includes detection assessment, and it's the metric that turns a pen test from a vulnerability report into a detection improvement programme.
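As a minimal sketch of how that delta can be computed, assuming the tester hands over a timestamped action log and the SOC exports its alerts for the same window (the field names, the ATT&CK IDs used as a join key, and the 30-minute correlation window are all illustrative assumptions, not any vendor's schema):

```python
from datetime import datetime, timedelta

# Illustrative inputs: the tester's timestamped action log and the SOC's alert export.
# Field names and the 30-minute correlation window are assumptions, not a vendor schema.
attacker_actions = [
    {"time": "2024-03-01T09:12:00", "technique": "T1557.001", "action": "LLMNR poisoning"},
    {"time": "2024-03-01T09:48:00", "technique": "T1558.003", "action": "Kerberoasting"},
    {"time": "2024-03-01T11:05:00", "technique": "T1003.003", "action": "NTDS.dit extraction"},
]
soc_alerts = [
    {"time": "2024-03-01T09:50:00", "technique": "T1558.003", "rule": "Excessive TGS requests"},
]

def detection_gap(actions, alerts, window=timedelta(minutes=30)):
    """Pair each attacker action with the first alert on the same technique inside
    the correlation window; anything unpaired is the detection gap."""
    gaps, detections = [], []
    for act in actions:
        acted = datetime.fromisoformat(act["time"])
        match = next(
            (a for a in alerts
             if a["technique"] == act["technique"]
             and timedelta(0) <= datetime.fromisoformat(a["time"]) - acted <= window),
            None,
        )
        (detections if match else gaps).append({**act, "alert": match})
    return gaps, detections

gaps, detections = detection_gap(attacker_actions, soc_alerts)
print(f"Detection rate: {len(detections) / len(attacker_actions):.0%}")
for g in gaps:
    print(f"MISSED  {g['time']}  {g['technique']}  {g['action']}")
```

In practice the join key would be richer than a bare technique ID (host, account, source IP), but the principle holds: every attacker action either pairs with an alert or lands in the gap list that drives the next round of detection engineering.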
Return to the insurance company engagement: ten attacker actions, zero detections. A 0% detection rate against the exact attack techniques that produce Domain Admin in the majority of internal penetration tests. The SOC had 140 rules, yet none of them covered LLMNR poisoning, Kerberoasting, PsExec lateral movement, NTDS.dit extraction, or anomalous file share access. The rules detected the threats the organisation had imagined. The pen test revealed the threats that actually materialised.
After the engagement, the SOC team used the detection gap analysis to write seven new detection rules — each mapped to a specific attacker action that had succeeded undetected. The next pen test, six months later, achieved the same initial compromise — but the SOC detected the Kerberoasting at minute 40, the lateral movement at minute 55, and triggered the incident response playbook at minute 62. Domain Admin was still achievable, but containment began before the tester reached the finance share. That improvement was only possible because the pen test measured the detection gap and gave the SOC team the specific intelligence they needed to close it.
A SOC that has never faced a realistic adversary is untested by definition. It may have passed tabletop exercises, responded to commodity malware alerts, and tuned rules against known indicators of compromise. But none of that proves it can detect a skilled attacker using legitimate tools, valid credentials, and native protocols to move through the environment.
| What Pen Testing Reveals | Why the SOC Can't Find This Alone |
|---|---|
| Detection rules that don't fire — rules that look correct on paper but fail to trigger against real attack patterns because of parsing errors, threshold misconfiguration, or log source gaps. | A rule that's never been triggered can't be validated by normal operations. Only a real attack — or a simulated one — produces the telemetry that proves whether the rule works. The SOC has no way to test its own rules without adversarial input. |
| Log blind spots — categories of activity that generate no telemetry because the relevant log source isn't ingested, isn't parsed, or isn't enabled. Common gaps: PowerShell script block logging, Kerberos service ticket requests (Event ID 4769), DNS query logging, SMB access auditing. | The SOC can't detect what it can't see. But it often doesn't know what it can't see until an attacker operates in the blind spot. The pen test maps the attacker's path and cross-references it against the SIEM's data sources — revealing every point where telemetry was absent. |
| Alert fatigue and prioritisation failure — the attack generated some alerts, but they were buried in noise, deprioritised by triage logic, or dismissed as false positives by Tier 1 analysts. | Alert fatigue is invisible from the inside. Analysts don't report the alerts they dismissed — they report the alerts they escalated. The pen test reveals which real-attack indicators were generated, triaged, and discarded. |
| Response playbook gaps — the SOC detected something but didn't know what to do with it. The playbook didn't cover this scenario. The escalation path was unclear. The containment action wasn't defined. | Playbooks are written against anticipated scenarios. Pen tests create unanticipated scenarios — or rather, scenarios the SOC should have anticipated but didn't. The gap between detection and effective response is only visible under real pressure. |
| EDR bypass techniques — the attacker used living-off-the-land binaries (LOLBins), legitimate remote administration tools, or in-memory execution that the EDR didn't flag because the activity used signed, trusted binaries. | EDR is designed to detect malware, exploit attempts, and suspicious process behaviour. Attackers who use certutil, mshta, wmic, PsExec, and PowerShell — all signed Microsoft binaries — operate within the EDR's trust model. Only adversarial testing reveals whether custom EDR rules have been created for LOLBin abuse. |
| Mean time to detect (MTTD) and mean time to respond (MTTR) — the actual elapsed time between attacker action and SOC detection, and between detection and effective containment. The real numbers, not the SLA targets. | The SOC tracks MTTD and MTTR for incidents it detects. It cannot track MTTD for incidents it misses entirely. The pen test provides the ground truth: every attacker action is timestamped, and every SOC response (or lack thereof) is measurable against it. |
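The last row of that table is the easiest to operationalise once the pairing from the detection gap analysis exists. A rough sketch, assuming each attacker action has already been matched (or not) to a first detection and a containment action, with all timestamps and field names illustrative:

```python
from datetime import datetime
from statistics import mean

# Illustrative paired records from a single engagement: when the tester acted, when the
# SOC first detected it (None = never), and when containment started (None = never).
paired = [
    {"action": "Kerberoasting",       "acted": "09:48", "detected": "10:28", "contained": "10:50"},
    {"action": "Lateral movement",    "acted": "10:15", "detected": "11:10", "contained": None},
    {"action": "NTDS.dit extraction", "acted": "11:05", "detected": None,    "contained": None},
]

def minutes_between(start, end):
    if start is None or end is None:
        return None
    fmt = "%H:%M"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 60

mttd_samples = [minutes_between(p["acted"], p["detected"]) for p in paired]
mttr_samples = [minutes_between(p["detected"], p["contained"]) for p in paired]

detected = [m for m in mttd_samples if m is not None]
contained = [m for m in mttr_samples if m is not None]
missed = mttd_samples.count(None)

print(f"MTTD across detected actions: {mean(detected):.0f} min")
print(f"MTTR across contained actions: {mean(contained):.0f} min")
print(f"Actions the SOC never saw (invisible to its own MTTD metric): {missed}")
```

The point of the last line is the one the table makes: the SOC's own dashboards can only average over what it saw; the engagement log supplies the denominator it is missing.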
The complement works in both directions. A pen test without SOC context reports every finding at face value — as if the attacker operates undetected. But if the SOC detects and contains the Kerberoasting within 15 minutes, the finding's real-world risk is different from an environment where it goes undetected for days. The defensive capability changes the risk calculus.
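One way to make that calculus explicit is to feed containment time from the engagement back into finding severity. The multipliers and thresholds below are purely illustrative assumptions, not a standard scoring model:

```python
from typing import Optional

def residual_risk(base_severity: float, contained_after_minutes: Optional[float]) -> float:
    """Scale a finding's 0-10 severity by how quickly the SOC contained the technique
    during the engagement. Multipliers are illustrative, not a standard."""
    if contained_after_minutes is None:          # never detected or never contained
        return base_severity
    if contained_after_minutes <= 15:
        return round(base_severity * 0.4, 1)     # rapid containment sharply limits impact
    if contained_after_minutes <= 60:
        return round(base_severity * 0.7, 1)
    return round(base_severity * 0.9, 1)         # slow containment barely changes the picture

print(residual_risk(8.0, 12))    # Kerberoasting contained in 12 minutes -> 3.2
print(residual_risk(8.0, None))  # same finding in an environment that never saw it -> 8.0
```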
Purple teaming is the natural evolution of the pen test / SOC relationship. Instead of the pen tester operating covertly and the SOC discovering the results after the engagement, both teams work collaboratively — the tester executes techniques one at a time, the SOC attempts to detect each one, and gaps are addressed immediately.
| | Traditional Pen Test | Purple Team Exercise |
|---|---|---|
| Attacker visibility | Covert. The tester operates without the SOC's knowledge (or with minimal notification). Realism is maximised. | Collaborative. The tester and SOC analysts sit in the same room (or virtual session). Each technique is announced, executed, and evaluated together. |
| Detection feedback | After the engagement. The SOC learns what it missed when the report is delivered — days or weeks later. | Immediate. After each technique, the SOC checks: did we detect it? If not, why not? The gap is diagnosed, and a rule is drafted or tuned in the same session. |
| Output | A vulnerability report with a detection gap appendix. | A detection improvement log: each technique tested, the detection result, the root cause of any gap, and the rule or configuration change that closes it. |
| Best for | Assessing the overall security posture. Testing whether defences hold under realistic adversarial pressure without the SOC having advance notice. | Rapidly improving detection capability. Training SOC analysts against real attack techniques. Building and tuning detection rules with immediate feedback. |
| MITRE ATT&CK alignment | The report maps findings to ATT&CK techniques. The SOC reviews them post-engagement. | Each technique is executed by ATT&CK ID. Detection coverage is measured technique by technique against the ATT&CK matrix. Gaps are visualised in real time. |
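A minimal sketch of that technique-by-technique bookkeeping, assuming each purple-team session records one outcome per ATT&CK technique executed (the outcome categories and records below are illustrative):

```python
from collections import Counter

# Illustrative purple-team session results, one record per technique executed.
# "detected" = an alert fired; "logged" = telemetry existed but no rule matched;
# "missed" = no usable telemetry at all.
results = [
    {"technique": "T1557.001", "name": "LLMNR/NBT-NS Poisoning",   "outcome": "missed"},
    {"technique": "T1558.003", "name": "Kerberoasting",            "outcome": "logged"},
    {"technique": "T1021.002", "name": "SMB/Windows Admin Shares", "outcome": "detected"},
    {"technique": "T1003.003", "name": "NTDS.dit extraction",      "outcome": "missed"},
]

tally = Counter(r["outcome"] for r in results)
print(f"Techniques tested: {len(results)}")
print(f"Detected: {tally['detected']}  Logged only: {tally['logged']}  Missed: {tally['missed']}")
print(f"Detection coverage this session: {tally['detected'] / len(results):.0%}")

# "Logged only" is the cheapest win: the telemetry already exists,
# so only a rule or analytic needs to be written.
for r in results:
    if r["outcome"] == "logged":
        print(f"Rule candidate: {r['technique']} ({r['name']})")
```

Splitting "logged but no rule fired" from "no telemetry at all" matters: the first is a rule-writing task, the second is a log-source project.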
Purple teaming isn't a replacement for traditional pen testing — it serves a different purpose. A covert pen test answers: "Can the SOC detect a realistic attack?" A purple team exercise answers: "For each specific technique, does the SOC have visibility, and if not, how do we create it?" Mature organisations run both: periodic covert pen tests to validate, and regular purple team sessions to improve.
The relationship between pen testing and defensive security evolves as the organisation's security programme matures. At each stage, the value each delivers — and the value each derives from the other — increases.
| Maturity Stage | Pen Test Focus | Defensive Security Focus | How They Interact |
|---|---|---|---|
| 1. Foundational | Identify the vulnerabilities. What misconfigurations exist? What can be exploited? What are the critical findings? | Deploy the tools. Stand up EDR, SIEM, and basic logging. Build the initial rule set. Establish a monitoring capability. | Minimal interaction. The pen test finds vulnerabilities. The SOC is too new to be tested. The findings are remediated independently. The detection gap isn't yet measured. |
| 2. Developing | Test the attack paths. Can the vulnerabilities be chained? How quickly does the attacker reach critical assets? | Tune the rules. Reduce false positives. Expand log sources. Begin tracking MTTD and MTTR for detected incidents. | The pen test report includes a detection gap appendix. The SOC uses it to write new rules. Detection improves between engagements. The feedback loop begins. |
| 3. Established | Test detection explicitly. Did the SOC detect the attack? How quickly? Was the response effective? Include detection assessment in the pen test scope. | Proactive hunting. Threat intelligence integration. Custom EDR rules for LOLBin abuse. Analyst training programme. | The pen test is designed to test the SOC as much as the infrastructure. Detection rate becomes a primary metric. Purple team exercises supplement covert pen tests. |
| 4. Advanced | Red team operations. Assumed breach. Objective-based testing ("can you reach the SWIFT terminal?"). Multi-week, multi-vector campaigns. | Mature SOC with threat hunting, behavioural analytics, deception technology (honeypots, honey tokens). Detection engineering as a discipline. | Full adversary simulation against a battle-tested SOC. The pen test is the training ground for the SOC. Each engagement makes the detection capability stronger. The detection gap shrinks with each cycle. |
| 5. Optimised | Continuous red teaming. Attack simulation platforms. Automated technique replay against the detection stack. | Detection-as-code. Automated rule testing. Continuous validation of detection coverage against the ATT&CK matrix. | Offence and defence operate as a continuous loop. New attack techniques are tested against the detection stack within days of publication. The organisation's security posture is validated continuously, not annually. |
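"Detection-as-code" in that last row means, at its simplest, that the rule lives in version control next to a test that replays recorded attack telemetry through it. A toy sketch of the idea, using a plain Python function over normalised events rather than any SIEM's query language (the event shape and the threshold are assumptions):

```python
# Toy "detection-as-code" example: the rule is a plain function over normalised events,
# and tests replay recorded telemetry to prove the rule fires (and stays quiet) as expected.
# The event shape and the threshold of 10 tickets are illustrative assumptions.

def kerberoasting_rule(events, threshold=10):
    """Flag accounts requesting many RC4-encrypted service tickets (Windows Event ID 4769
    with ticket encryption type 0x17) within one batch of events."""
    counts = {}
    for e in events:
        if e.get("event_id") == 4769 and e.get("ticket_encryption") == "0x17":
            counts[e["account"]] = counts.get(e["account"], 0) + 1
    return [account for account, n in counts.items() if n >= threshold]

def test_rule_fires_on_recorded_kerberoast():
    # Telemetry recorded during a previous purple-team session (shape is illustrative).
    replayed = [{"event_id": 4769, "ticket_encryption": "0x17", "account": "svc_backup"}] * 12
    assert kerberoasting_rule(replayed) == ["svc_backup"]

def test_rule_stays_quiet_on_normal_traffic():
    normal = [{"event_id": 4769, "ticket_encryption": "0x12", "account": "alice"}] * 50
    assert kerberoasting_rule(normal) == []

if __name__ == "__main__":
    test_rule_fires_on_recorded_kerberoast()
    test_rule_stays_quiet_on_normal_traffic()
    print("detection rule tests passed")
```

Run in CI, the second test is what catches the silent regressions (a parser change, a renamed field) that otherwise only the next pen test would reveal.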
Understanding that pen testing and defensive security are complementary is the first step. Implementing that understanding without falling into common traps is the second.
| Misstep | The Problem | The Fix |
|---|---|---|
| "We have EDR, so the pen test is less important now" | EDR detects a category of threats — malware, exploit attempts, suspicious process behaviour. It does not detect misconfiguration exploitation, credential abuse with legitimate tools, or living-off-the-land attacks by default. The pen test reveals the threats that EDR was never designed to catch. | Commission the pen test explicitly to test whether EDR detects the techniques used. Include EDR bypass assessment in the scope. Use the results to create custom EDR rules. |
| "The pen test didn't trigger any alerts, so the SOC failed" | A skilled pen tester deliberately avoids detection — that's part of the test. If the test was scoped as a covert assessment, a low detection rate measures SOC capability against a motivated adversary. It doesn't mean the SOC "failed" — it means the SOC now has specific intelligence about what to detect. | Frame the detection gap as a learning opportunity, not a performance failure. Use the gap analysis to build new detection rules. Measure improvement across engagements. |
| "We'll do the pen test in stealth mode and not tell the SOC" | If nobody in the SOC knows the pen test is happening, and the tester triggers a real incident response — analysts working through the night, management escalation, potential data breach notification — the organisation has wasted significant resources and damaged trust. | Always brief a designated SOC liaison. They don't share details with analysts (preserving the test's realism) but they can distinguish pen test activity from a genuine breach if escalation is needed. |
| "We do pen tests annually and purple teams quarterly — that's enough" | Annual pen tests assess point-in-time posture. Quarterly purple teams improve detection. But neither tests the organisation's ability to respond to a novel, multi-week intrusion that evolves over time — which is how real advanced threats operate. | At maturity, supplement pen tests and purple teams with a red team engagement: a multi-week, objective-based exercise where the red team operates covertly with an evolving strategy, testing the full kill chain from initial access to objective completion. |
| "The pen test and the SOC are managed by different vendors — they don't talk" | The pen test report is delivered to the security manager. The SOC operates independently. Nobody cross-references the attacker's actions with the SOC's telemetry. The detection gap is never measured. The most valuable finding from the engagement — what the SOC missed — is lost. | Require the pen test provider to produce a detection gap analysis. Share the tester's timestamped action log with the SOC. Hold a joint debrief where the tester walks the SOC through the attack path and the SOC identifies where telemetry existed but rules didn't fire. |
Penetration testing and defensive security are not alternatives. They're not even adjacent disciplines that happen to coexist. They are complementary halves of the same capability: the ability to understand, detect, and respond to the attacks that threaten the organisation.
A pen test without a SOC produces a list of vulnerabilities but no understanding of whether the organisation would detect the exploitation. A SOC without a pen test produces alerts and metrics but no evidence that the detection rules work against the attacks that actually succeed. The detection gap — the delta between what the attacker did and what the SOC saw — is the metric that ties them together, and it's only measurable when both are operating.
The organisations with the strongest security posture aren't the ones that spend the most on EDR or commission the most pen tests. They're the ones that use each to make the other better: the pen test reveals what the SOC misses, the SOC provides the context that changes how pen test findings are prioritised, and together they drive a continuous cycle of testing, detection, improvement, and retesting that makes the organisation measurably harder to compromise with every iteration.
Our penetration tests include detection gap analysis as standard — timestamped attacker actions cross-referenced against your SOC's telemetry, delivering the specific intelligence your defensive team needs to close the gaps that matter.