
How We Model Real-World Threat Scenarios

> cat /engagements/threat-model.yml

Peter Bassill · 25 March 2025 · 14 min read
Tags: threat modelling · MITRE ATT&CK · red team · threat intelligence · scenario-based testing

One found vulnerabilities. The other found the truth.

A mid-size accountancy firm commissions two penetration tests twelve months apart. Both cost roughly the same. Both take the same number of tester-days. Both produce professional reports with executive summaries and colour-coded findings.

The first test runs a systematic assessment against industry checklists. It finds 34 vulnerabilities: some missing patches, a few SSL/TLS issues, an outdated SSH version, and a handful of medium-severity web application findings. The report sorts them by CVSS score. Remediation is straightforward. The IT team patches the criticals within a fortnight and files the report.

The second test starts differently. Before a single tool is launched, the testers research which threat actors are actively targeting UK accountancy firms. They identify the most likely attack scenarios — business email compromise for payment redirection fraud, ransomware for extortion, and state-sponsored espionage targeting client data for high-profile mergers. They build three specific scenarios and pursue each one against the live environment.

The second test finds 19 vulnerabilities — fewer than the first. But it also demonstrates that a realistic attacker could phish a credential from the finance team, use it to access the internal network via the firm's Citrix gateway, escalate to Domain Admin through a Kerberoastable service account, and reach the partner file share containing every client engagement letter, tax return, and board pack. End to end, under four hours. Zero alerts.

The first test found vulnerabilities. The second test found the risk. That difference — between listing what's broken and demonstrating what an attacker could actually achieve — is the difference between checklist testing and threat scenario modelling.

The Numbers Trap

A higher finding count does not mean a better test. Thirty-four decontextualised vulnerabilities tell you less about your risk than three demonstrated attack paths. Attackers don't exploit individual vulnerabilities — they chain weaknesses into complete compromises. If your pen test report doesn't show the chains, it's showing you the pieces but not the picture.


What threat scenario modelling actually means.

Threat scenario modelling is the practice of designing penetration test engagements around structured hypotheses about how specific adversaries would attack a specific organisation. It replaces the question "what vulnerabilities exist?" with a far more useful one: "could this specific type of attacker achieve this specific objective against us?"

A scenario is not a vague concept — it is a precise, testable proposition. It has five components, each derived from evidence rather than assumption:

Adversary
What it defines: Who would attack this organisation? What are their resources, capabilities, and level of patience?
Where the evidence comes from: Sector threat intelligence, NCSC advisories, MITRE ATT&CK threat group profiles, breach reports from comparable organisations.

Motivation
What it defines: Why this organisation specifically? What does the attacker want — money, data, disruption, access to clients?
Where the evidence comes from: Understanding of the client's data holdings, client base, sector position, and what would make them a valuable target.

Entry vector
What it defines: How does the attacker get the first foothold? Phishing? Exploiting an internet-facing service? Compromised supply chain?
Where the evidence comes from: OSINT against the client's external attack surface, common initial access vectors for the identified adversary type, recent exploitation trends.

Attack chain
What it defines: What happens after initial access? Which techniques does the attacker use to escalate, move laterally, and reach the objective?
Where the evidence comes from: MITRE ATT&CK technique mappings for the identified threat group, common post-exploitation patterns in the client's technology stack.

Objective
What it defines: What does the attacker achieve if the chain completes? Data exfiltration? Ransomware? Fraud? Persistent access?
Where the evidence comes from: Business impact analysis of the client's crown jewels — what would hurt most and how.

The test then validates each scenario against the live environment. The report doesn't just list findings — it tells the story of each scenario: which steps succeeded, which were blocked, which were detected, and what the real-world consequence would have been.
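
To make that concrete, here is a minimal sketch of how a scenario and its validation results might be captured as structured data. The field names and example values are illustrative rather than a prescribed schema; they simply mirror the five components above plus the per-step outcomes the report narrates.

from dataclasses import dataclass, field

@dataclass
class ScenarioStep:
    """One link in the attack chain, recorded as it is validated against the live environment."""
    description: str          # e.g. "Kerberoast SVC_SQL service account" (hypothetical example)
    succeeded: bool = False   # did the step work?
    detected: bool = False    # did monitoring raise an alert?
    notes: str = ""           # evidence, timings, blockers

@dataclass
class ThreatScenario:
    """A precise, testable proposition: adversary, motivation, entry vector, chain, objective."""
    adversary: str                                      # who, with what resources and patience
    motivation: str                                     # why this organisation specifically
    entry_vector: str                                   # how the first foothold is gained
    attack_chain: list[ScenarioStep] = field(default_factory=list)
    objective: str = ""                                 # what the attacker achieves if the chain completes
    evidence: list[str] = field(default_factory=list)   # intelligence sources the scenario traces back to

    def outcome(self) -> str:
        """Summarise how far the chain progressed and how much of it was detected."""
        reached = sum(1 for s in self.attack_chain if s.succeeded)
        alerts = sum(1 for s in self.attack_chain if s.detected)
        return f"{reached}/{len(self.attack_chain)} steps succeeded, {alerts} generated alerts"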


Where the scenarios come from.

Credible scenarios aren't invented in a meeting room. They're assembled from overlapping intelligence sources, each providing a different lens on the threat landscape. The process is methodical — closer to investigative research than creative brainstorming.

National Threat Intelligence
The NCSC publishes regular advisories identifying which threat actors are targeting UK sectors. These aren't theoretical — they describe active campaigns, named threat groups, and specific TTPs observed in the wild. When the NCSC issues an advisory about ransomware groups exploiting Fortinet VPNs in professional services, that becomes a scenario for every professional services client we assess.
Sector Breach Data
The Verizon DBIR, IBM X-Force Threat Intelligence Index, and CrowdStrike Global Threat Report provide statistical analysis of real breaches by sector, vector, and outcome. If 62% of breaches in your sector begin with phishing and 40% result in ransomware, those probabilities shape which scenarios we prioritise.
MITRE ATT&CK
ATT&CK catalogues the tactics, techniques, and procedures of real threat groups. Each group has a documented profile of the methods they use, mapped to specific technical implementations. We select the ATT&CK techniques that correspond to the adversaries most relevant to each client — ensuring our testing replicates what those groups actually do.
OSINT Reconnaissance
Before designing scenarios, we passively reconnoitre the client exactly as an attacker would: certificate transparency logs, DNS enumeration, Shodan, LinkedIn, GitHub, job advertisements revealing technology stacks, and breached credential databases. What we find shapes which entry vectors are realistic (a minimal example of this kind of lookup appears below).
Previous Engagement History
For returning clients, historical findings are invaluable. Which vulnerabilities were remediated? Which recurred? Are there systemic patterns — weak AD configuration, absent network segmentation, repeated credential hygiene failures — that suggest underlying issues beyond individual findings?
Regulatory and Legal Landscape
ICO enforcement actions, FCA fines, and published breach investigations reveal not just what went wrong but how regulators assessed the organisation's security posture. These inform which controls regulators consider essential — and therefore which gaps carry the highest regulatory risk.

No single source is sufficient. A scenario built on NCSC intelligence alone might miss a client-specific exposure found through OSINT. A scenario built on OSINT alone might miss the broader threat landscape. The value is in the cross-referencing — triangulating multiple sources to produce scenarios that are both realistic in the abstract and specific to the client.
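
As a flavour of how lightweight the passive side of this can be, the sketch below pulls hostnames for a target domain from certificate transparency logs via crt.sh's public JSON endpoint. It is deliberately minimal: the domain is a placeholder, and a real engagement would combine this with DNS enumeration, Shodan queries, and breach-database checks.

import json
import urllib.request

def crtsh_subdomains(domain: str) -> set[str]:
    """Query crt.sh certificate transparency logs for names issued under a domain."""
    url = f"https://crt.sh/?q=%25.{domain}&output=json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        records = json.load(resp)
    names = set()
    for record in records:
        # name_value may contain several newline-separated hostnames per certificate
        for name in record.get("name_value", "").splitlines():
            names.add(name.strip().lower().lstrip("*."))
    return names

# Placeholder domain -- only run against estates you are authorised to assess:
# for host in sorted(crtsh_subdomains("example.co.uk")):
#     print(host)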


Building a scenario from scratch.

Let's walk through the process end to end for a concrete client: a 200-person logistics company operating across three UK sites with a hybrid Microsoft environment, a customer-facing booking portal, and a fleet management system connected to on-premises operational technology.

Phase 1 — Intelligence Gathering
ncsc_advisory --sector=logistics --year=2025 # Active ransomware campaigns targeting logistics
verizon_dbir --sector=transportation # 73% of breaches involved stolen credentials
mitre_groups --targeting=logistics,manufacturing # FIN7, Conti successors, LockBit affiliates
osint --domain=acmelogistics.co.uk
  result: Fortinet VPN portal on vpn.acmelogistics.co.uk
  result: 3 employee creds in HaveIBeenPwned (2 reused)
  result: Job advert mentions 'Azure AD, Dynamics 365, SCCM'
  result: Booking portal on portal.acmelogistics.co.uk
  result: Fleet mgmt system referenced in supplier PDF on Google
Phase 2 — Scenario Design
# SCENARIO A: Ransomware
adversary = 'LockBit affiliate' # Most active ransomware group targeting UK logistics
motive = 'Encrypt + exfiltrate for double extortion'
entry = 'Exploit Fortinet VPN (CVE-2024-21762) or phish'
chain = 'VPN → AD recon → Kerberoast → DA → GPO ransomware'
objective = 'Domain-wide encryption + data theft for leverage'
impact = 'Fleet grounded. £2.1M daily revenue at risk.'

# SCENARIO B: Business Email Compromise
adversary = 'BEC fraud group' # High volume, targets logistics/finance
motive = 'Redirect customer payments to attacker accounts'
entry = 'Credential stuffing with breached creds (no MFA)'
chain = 'Mailbox access → invoice interception → reply tampering'
objective = 'Redirect £50k+ payment to attacker-controlled bank'

# SCENARIO C: OT/Fleet Disruption
adversary = 'Opportunistic attacker via booking portal'
motive = 'Pivot from web app to internal OT network'
entry = 'Web app vulnerability in booking portal'
chain = 'SQLi/RCE → web server → internal network → fleet mgmt'
objective = 'Access fleet management, demonstrate OT impact'

Three scenarios, each grounded in evidence: the ransomware scenario is based on NCSC advisories about the logistics sector and the client's exposed Fortinet VPN. The BEC scenario is based on breached credentials found in OSINT and the sector's high rate of credential-based attacks. The OT scenario is based on the client's specific architecture — a web portal potentially bridging IT and operational technology networks.

These aren't speculative. Every component maps to something real: a known threat group, a known technique, and a confirmed element of the client's infrastructure.


Translating scenarios into testable techniques.

Once the scenarios are designed, we map each phase to specific MITRE ATT&CK techniques. This serves two purposes: it ensures our testing is technically precise (we simulate the exact methods the identified threat groups use), and it provides a structured framework for reporting which techniques your defences stop, detect, or miss.

Here's how Scenario A — the LockBit ransomware path — maps to ATT&CK for the logistics company:

Initial Access
Technique: T1190 — Exploit Public-Facing Application
What we test: Attempt exploitation of the Fortinet VPN. Test for CVE-2024-21762 and related CVEs. Verify patch status and configuration hardening.
Detection opportunity: Perimeter IDS/IPS alerting on exploit signatures. VPN access logging for anomalous connections.

Discovery
Technique: T1087.002 — Domain Account Enumeration
What we test: Enumerate AD users, groups, trusts, and SPNs using BloodHound. Map the shortest path to Domain Admin.
Detection opportunity: SIEM alerting on LDAP query volume. Honeypot account triggering on enumeration.

Credential Access
Technique: T1558.003 — Kerberoasting
What we test: Request TGS tickets for SPN-enabled service accounts. Attempt offline cracking. Measure time-to-crack.
Detection opportunity: Advanced AD monitoring detecting anomalous TGS requests. SIEM correlation rules for Kerberoast patterns.

Privilege Escalation
Technique: T1078.002 — Valid Accounts: Domain Accounts
What we test: Use cracked service account credentials to escalate privileges. Test whether the account has DA membership or delegation rights.
Detection opportunity: Privileged account monitoring. Alerting on service account interactive logon.

Lateral Movement
Technique: T1021.002 — SMB/Windows Admin Shares
What we test: Move laterally using DA credentials via SMB. Test segmentation between IT and OT networks. Attempt access to fleet management systems.
Detection opportunity: EDR alerting on lateral movement tools. Network monitoring for anomalous SMB authentication patterns.

Impact (simulated)
Technique: T1486 — Data Encrypted for Impact
What we test: Demonstrate the ability to deploy a payload via Group Policy to all domain-joined machines. Verify backup accessibility and integrity. No actual encryption is performed.
Detection opportunity: GPO change monitoring. Backup integrity verification. Anti-ransomware behavioural detection.

Notice the detection opportunity recorded for each phase. Every phase of every scenario is also an opportunity to test whether your monitoring, alerting, and response capabilities would catch a real attack. Scenario-based testing doesn't just find vulnerabilities — it simultaneously evaluates your detection posture at each phase of the kill chain.
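
To illustrate how the outcome of each phase might be rolled up into a simple detection coverage view, here is a small sketch. The technique IDs are taken from the Scenario A mapping above; the succeeded and detected values are invented purely for illustration, not results from a real engagement.

# Illustrative only: technique IDs from the Scenario A mapping above;
# the succeeded/detected outcomes are invented example data.
scenario_a_results = [
    {"phase": "Initial Access",       "technique": "T1190",     "succeeded": True, "detected": False},
    {"phase": "Discovery",            "technique": "T1087.002", "succeeded": True, "detected": False},
    {"phase": "Credential Access",    "technique": "T1558.003", "succeeded": True, "detected": False},
    {"phase": "Privilege Escalation", "technique": "T1078.002", "succeeded": True, "detected": True},
    {"phase": "Lateral Movement",     "technique": "T1021.002", "succeeded": True, "detected": False},
    {"phase": "Impact (simulated)",   "technique": "T1486",     "succeeded": True, "detected": True},
]

def detection_coverage(results):
    """Summarise which kill chain phases generated an alert and which went unnoticed."""
    for r in results:
        status = "DETECTED" if r["detected"] else "missed"
        print(f"{r['phase']:<22} {r['technique']:<10} {status}")
    detected = sum(r["detected"] for r in results)
    print(f"Detection coverage: {detected}/{len(results)} phases")

detection_coverage(scenario_a_results)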


From a spreadsheet to a story.

The most visible difference between checklist-based and scenario-based testing is in the deliverable. A checklist report is a structured catalogue of individual findings. A scenario report is a narrative — and narratives are how humans process risk, make decisions, and allocate resources.

Executive summary
Checklist report: Pie chart: 4 critical, 12 high, 18 medium. "Overall risk posture: requires improvement."
Scenario report: "A LockBit-style ransomware group could achieve Domain Admin in 2.5 hours via the unpatched Fortinet VPN. Domain-wide encryption is achievable within 4 hours. Fleet operations would be grounded. Estimated business impact: £2.1M per day of downtime."

Finding format
Checklist report: Individual findings sorted by CVSS score. Each reported in isolation: title, severity, description, remediation.
Scenario report: Findings organised by scenario and kill chain phase. Each finding contextualised within the attack path: what it enables, what precedes it, what it leads to.

Remediation priority
Checklist report: Fix all criticals first, then highs, then mediums. Priority is determined by generic severity rating.
Scenario report: Fix the finding that breaks the most dangerous chain first. A medium-severity Kerberoastable account might be the highest priority because it's the link between initial access and Domain Admin.

Detection assessment
Checklist report: Not included — the test focuses on whether vulnerabilities exist.
Scenario report: Each scenario phase documents whether an alert was generated, how long until detection, and whether the SOC responded. A detection heat map shows coverage gaps across the kill chain.

Business impact
Checklist report: Generic — "could lead to data breach or system compromise."
Scenario report: Specific — "Scenario A succeeds: fleet grounded, £2.1M/day at risk, mandatory ICO notification, potential FCA regulatory action given client data exposure."

Board readability
Checklist report: Requires translation by the security team. Technical language, abstract severity ratings.
Scenario report: Directly presentable to the board. Written in terms of business outcomes, attacker objectives, and financial impact.

The Remediation Paradox

In a checklist report, the Kerberoastable service account is a standalone high-severity finding — one of 12. In a scenario report, it's the pivot point in a complete ransomware chain. The same finding, the same remediation advice. But the scenario context transforms it from item #7 on a backlog into an urgent, board-level priority. Context doesn't change the vulnerability — it changes whether anyone fixes it.
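
The effect is easy to demonstrate. The sketch below uses invented findings, scores, and chain membership, but it shows how ordering a backlog by "does fixing this sever the demonstrated chain?" promotes the same medium-severity finding that a CVSS-only sort leaves in the middle of the list.

# Illustrative example: invented findings, scores, and chain membership.
findings = [
    {"id": 1, "title": "Outdated SSH version",           "cvss": 7.5, "severs_chain": False},
    {"id": 2, "title": "Missing OS patches (critical)",  "cvss": 9.8, "severs_chain": False},
    {"id": 7, "title": "Kerberoastable service account", "cvss": 5.9, "severs_chain": True},
    {"id": 9, "title": "TLS weak cipher suites",         "cvss": 4.3, "severs_chain": False},
]

# Checklist view: generic severity order -- the Kerberoastable account sits mid-backlog.
by_cvss = sorted(findings, key=lambda f: f["cvss"], reverse=True)

# Scenario view: anything whose remediation severs the demonstrated ransomware chain
# jumps to the top; severity breaks ties within each group.
by_chain = sorted(findings, key=lambda f: (not f["severs_chain"], -f["cvss"]))

print([f["title"] for f in by_cvss])
print([f["title"] for f in by_chain])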


Scenarios at every level.

A common misconception is that scenario-based testing requires a red team budget and a mature security programme. It doesn't. The approach scales to any organisation — the complexity of the scenarios simply matches the client's maturity.

Early
Appropriate scenarios: Simple, single-chain scenarios focused on the most basic and most common attack paths. One or two scenarios per engagement.
Example: "Can an attacker exploit a missing patch on the VPN and reach the file server?" — A single path from internet to sensitive data, testing perimeter security and basic segmentation.

Developing
Appropriate scenarios: Multi-step scenarios incorporating credential-based attacks, AD exploitation, and application-layer abuse. Two to three scenarios per engagement.
Example: "Can a phished credential be used to access the internal network, escalate through AD, and reach the finance system?" — Tests email security, authentication controls, AD hardening, and access control in a single chain.

Established
Appropriate scenarios: Full adversary simulations with detection testing, multi-vector initial access, and covert operations. Three to five scenarios per engagement, with purple team elements.
Example: "Can a LockBit affiliate achieve domain-wide ransomware deployment — and would the SOC detect it before impact?" — Tests the entire kill chain plus detection and response capability at every phase.

Advanced
Appropriate scenarios: CBEST/TIBER-style exercises with bespoke threat intelligence, extended timescales, and assume-breach scenarios targeting specific high-value assets.
Example: "Can a state-sponsored actor maintain persistent, undetected access to the M&A team's communications for 30 days?" — Tests advanced persistence, evasion, and the organisation's ability to detect low-and-slow adversaries.

An early-stage organisation commissioning its first internal pen test benefits from a simple scenario as much as a financial institution benefits from a CBEST exercise. The principle is the same — test against a realistic threat, not against a generic checklist. Only the sophistication scales.


How scenario modelling goes wrong.

Scenario-based testing is powerful when done well and misleading when done badly. Here are the pitfalls we've learned to avoid — and that you should watch for when evaluating a provider's approach.

Scenarios without intelligence
Why it's harmful: If the scenarios are invented rather than evidence-based, they're fiction. Testing against implausible threats wastes effort on unlikely attack paths while ignoring the real ones.
The fix: Every scenario must trace back to a specific intelligence source — a threat report, a breach analysis, an NCSC advisory, or OSINT findings. If you can't cite the evidence, the scenario isn't grounded.

Only testing the scenario
Why it's harmful: Tunnel vision on pre-defined scenarios means opportunistic findings are missed. A vulnerability outside the scenario is still a vulnerability an attacker could exploit.
The fix: Scenarios provide structure, not blinkers. Testers follow the scenario chains but also capture and report opportunistic findings discovered during testing. The best engagements blend structured scenarios with exploratory testing.

Too many scenarios for the budget
Why it's harmful: Five scenarios in a five-day test means one day per scenario — not enough depth to validate any of them properly. Breadth without depth, the same trap as checklist testing.
The fix: Two to three well-developed scenarios are better than five shallow ones. Depth matters more than coverage. If budget is limited, prioritise the highest-risk scenario and test it thoroughly.

Static scenarios year after year
Why it's harmful: The threat landscape changes. Reusing last year's scenarios without updating them against current intelligence means testing against yesterday's threats while today's go unaddressed.
The fix: Rebuild scenarios from current intelligence for every engagement. Some themes may recur — ransomware is perennial — but the specific techniques, entry vectors, and threat groups should reflect the latest intelligence.

Scenarios that ignore people
Why it's harmful: A scenario that starts at "attacker has network access" skips the most common initial access vector: social engineering. If the entry point is unrealistic, the whole chain is unrealistic.
The fix: Include the human element wherever the scenario supports it. Phishing, vishing, or pretexting as the initial access vector makes the scenario end-to-end realistic and tests the layer most defences neglect.

Telling the difference between genuine and performative.

Many providers now claim to offer "threat-led" or "scenario-based" testing. Some mean it. Others have relabelled their checklist service. Here's how to tell the difference.

"How will you decide which scenarios to test?"
A genuine answer sounds like: "We'll research the threat actors targeting your sector, conduct OSINT against your estate, and design scenarios based on what we find. We'll present them for your input before testing."
A performative answer sounds like: "We use our standard methodology which covers all the main attack vectors."

"Can you show me an example scenario from a previous engagement?"
A genuine answer sounds like: A structured scenario with named adversary type, specific ATT&CK techniques, defined objective, and clear success criteria — anonymised from a real engagement.
A performative answer sounds like: A vague description like "we simulate real-world attacks" without concrete structure or evidence base.

"How will the report be structured?"
A genuine answer sounds like: "Around the scenarios — each one told as a narrative with findings contextualised within the attack chain. Plus a section for opportunistic findings outside the scenarios."
A performative answer sounds like: "Standard format — findings by severity, executive summary, remediation table."

"Will you test detection as well as prevention?"
A genuine answer sounds like: "Yes — we document which techniques generated alerts and which were missed. We provide a detection heat map against the ATT&CK techniques tested."
A performative answer sounds like: "That's more of a red team thing — this is just a pen test."

"What intelligence sources do you use?"
A genuine answer sounds like: A specific list: NCSC, MITRE ATT&CK, Verizon DBIR, sector ISACs, CrowdStrike/Mandiant reporting, client-specific OSINT.
A performative answer sounds like: "Our testers have extensive experience" — which is about the tester, not the intelligence.

How to start thinking in scenarios.

You don't need to overhaul your testing programme overnight. Here's how to begin incorporating threat-informed thinking into your existing approach.

Name Your Crown Jewels
Before your next pen test, write down the three to five assets that would cause the most damage if compromised. Customer database? Financial system? Client documents? Operational technology? These are what your scenarios should target. If your pen test doesn't cover the paths to these assets, it's testing the wrong things.
Read One Threat Report
The NCSC annual review, the Verizon DBIR, or your sector ISAC's latest alert. One report, once a year, gives you enough intelligence to have a meaningful scoping conversation with your provider. Share it with them. Ask: "how do our scenarios reflect this?"
Demand Attack Chains
When you receive your next pen test report, ask your provider: "which of these findings chain together to form a complete attack path?" If every finding is isolated, you're missing the most important insight. If they can't answer, your next test needs a different approach.
Add Detection to the Brief
Tell your provider you want to know whether your SOC or monitoring tools detected the testing activity. Even for a standard pen test, this adds enormous value — it turns every finding into a dual question: "does this vulnerability exist?" and "would we notice if someone exploited it?"
Evolve Year on Year
Year one: one simple scenario alongside your standard test. Year two: two scenarios with detection assessment. Year three: full scenario-based engagement with ATT&CK mapping. Each year, the testing becomes more threat-informed and the results become more actionable.

The bottom line.

A penetration test is only as valuable as the question it answers. Checklist-based testing asks "do these specific vulnerability types exist?" — a useful but limited question. Scenario-based testing asks "could a realistic adversary achieve their objective against this specific organisation?" — a question whose answer directly informs strategy, budget, and risk appetite.

By grounding our scenarios in real threat intelligence, mapping them to MITRE ATT&CK, and validating them against the live environment end to end, we ensure that every engagement produces findings that are relevant to the threats you actually face — not a generic list of vulnerabilities that could apply to any organisation in any sector.

Attackers don't follow checklists. Your testing shouldn't either.


Tests designed around your actual threats.

We build every engagement around the adversaries targeting your sector, the attack paths into your specific environment, and the assets that matter most to your business. The result is testing that finds what a real attacker would find — reported in a way your board can act on.