> cat /engagements/threat-model.yml_
A mid-size accountancy firm commissions two penetration tests twelve months apart. Both cost roughly the same. Both take the same number of tester-days. Both produce professional reports with executive summaries and colour-coded findings.
The first test runs a systematic assessment against industry checklists. It finds 34 vulnerabilities: some missing patches, a few SSL/TLS issues, an outdated SSH version, and a handful of medium-severity web application findings. The report sorts them by CVSS score. Remediation is straightforward. The IT team patches the criticals within a fortnight and files the report.
The second test starts differently. Before a single tool is launched, the testers research which threat actors are actively targeting UK accountancy firms. They identify the most likely attack scenarios — business email compromise for payment redirection fraud, ransomware for extortion, and state-sponsored espionage targeting client data for high-profile mergers. They build three specific scenarios and pursue each one against the live environment.
The second test finds 19 vulnerabilities — fewer than the first. But it also demonstrates that a realistic attacker could phish a credential from the finance team, use it to access the internal network via the firm's Citrix gateway, escalate to Domain Admin through a Kerberoastable service account, and reach the partner file share containing every client engagement letter, tax return, and board pack. End to end, under four hours. Zero alerts.
The first test found vulnerabilities. The second test found the risk. That difference — between listing what's broken and demonstrating what an attacker could actually achieve — is the difference between checklist testing and threat scenario modelling.
More findings do not mean a better test. 34 decontextualised vulnerabilities tell you less about your risk than 3 demonstrated attack paths. Attackers don't exploit individual vulnerabilities — they chain weaknesses into complete compromises. If your pen test report doesn't show the chains, it's showing you the pieces but not the picture.
Threat scenario modelling is the practice of designing penetration test engagements around structured hypotheses about how specific adversaries would attack a specific organisation. It replaces the question "what vulnerabilities exist?" with a far more useful one: "could this specific type of attacker achieve this specific objective against us?"
A scenario is not a vague concept — it is a precise, testable proposition. It has five components, each derived from evidence rather than assumption:
| Component | What It Defines | Where the Evidence Comes From |
|---|---|---|
| Adversary | Who would attack this organisation? What are their resources, capabilities, and level of patience? | Sector threat intelligence, NCSC advisories, MITRE ATT&CK threat group profiles, breach reports from comparable organisations |
| Motivation | Why this organisation specifically? What does the attacker want — money, data, disruption, access to clients? | Understanding of the client's data holdings, client base, sector position, and what would make them a valuable target |
| Entry vector | How does the attacker get the first foothold? Phishing? Exploiting an internet-facing service? Compromised supply chain? | OSINT against the client's external attack surface, common initial access vectors for the identified adversary type, recent exploitation trends |
| Attack chain | What happens after initial access? Which techniques does the attacker use to escalate, move laterally, and reach the objective? | MITRE ATT&CK technique mappings for the identified threat group, common post-exploitation patterns in the client's technology stack |
| Objective | What does the attacker achieve if the chain completes? Data exfiltration? Ransomware? Fraud? Persistent access? | Business impact analysis of the client's crown jewels — what would hurt most and how |
The test then validates each scenario against the live environment. The report doesn't just list findings — it tells the story of each scenario: which steps succeeded, which were blocked, which were detected, and what the real-world consequence would have been.
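For teams that track engagements as structured data, the five components map directly onto a simple record. The sketch below is purely illustrative, assuming hypothetical field names and example values loosely based on the accountancy-firm chain above; it is not a schema we prescribe.

```python
from dataclasses import dataclass, field

@dataclass
class ThreatScenario:
    """One testable hypothesis about how a specific adversary attacks a specific organisation."""
    adversary: str            # who: threat group or actor type
    motivation: str           # why this organisation specifically
    entry_vector: str         # how the first foothold is gained
    attack_chain: list[str]   # ordered post-access techniques, e.g. MITRE ATT&CK IDs
    objective: str            # what the attacker achieves if the chain completes
    evidence: list[str] = field(default_factory=list)  # intelligence the scenario traces back to

# Hypothetical example, loosely modelled on the accountancy-firm chain described above
scenario = ThreatScenario(
    adversary="Financially motivated intrusion crew",
    motivation="Client financial data and payment redirection fraud",
    entry_vector="Credential phishing against the finance team",
    attack_chain=["T1566.002", "T1078", "T1133", "T1558.003"],  # phishing link, valid accounts,
                                                                # external remote services, Kerberoasting
    objective="Reach the partner file share and access client engagement data",
    evidence=["Sector threat intelligence", "NCSC advisory", "OSINT: exposed remote-access portal"],
)
```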
Credible scenarios aren't invented in a meeting room. They're assembled from overlapping intelligence sources, each providing a different lens on the threat landscape. The process is methodical — closer to investigative research than creative brainstorming.
No single source is sufficient. A scenario built on NCSC intelligence alone might miss a client-specific exposure found through OSINT. A scenario built on OSINT alone might miss the broader threat landscape. The value is in the cross-referencing — triangulating multiple sources to produce scenarios that are both realistic in the abstract and specific to the client.
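One way to make the cross-referencing concrete is to treat each candidate scenario as a claim and promote it to the test plan only when more than one independent intelligence lens supports it. A minimal sketch, with invented source labels and a hypothetical threshold of two:

```python
# Candidate scenarios and the source types that support each one (illustrative labels only)
candidates = {
    "Ransomware via exposed VPN": {"sector_advisory", "osint_external", "breach_reports"},
    "BEC via phished finance credential": {"sector_advisory", "osint_credentials"},
    "Espionage against M&A data": {"sector_advisory"},   # only one lens so far
    "Insider exfiltration via USB": {"brainstorm"},       # no intelligence backing at all
}

INTEL_SOURCES = {"sector_advisory", "osint_external", "osint_credentials", "breach_reports"}

def corroborated(sources: set[str], minimum: int = 2) -> bool:
    """A candidate is credible when at least `minimum` independent intelligence lenses support it."""
    return len(sources & INTEL_SOURCES) >= minimum

test_plan = [name for name, sources in candidates.items() if corroborated(sources)]
print(test_plan)  # the single-source and evidence-free candidates drop out
```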
Let's walk through the process end to end for a concrete client: a 200-person logistics company operating across three UK sites with a hybrid Microsoft environment, a customer-facing booking portal, and a fleet management system connected to on-premises operational technology.
Cross-referencing sector intelligence, OSINT, and the client's own architecture produces three scenarios, each grounded in evidence:

- Ransomware: based on NCSC advisories about the logistics sector and the client's exposed Fortinet VPN.
- Business email compromise: based on breached credentials found through OSINT and the sector's high rate of credential-based attacks.
- OT compromise: based on the client's specific architecture, a web portal potentially bridging IT and operational technology networks.
These aren't speculative. Every component maps to something real: a known threat group, a known technique, and a confirmed element of the client's infrastructure.
Once the scenarios are designed, we map each phase to specific MITRE ATT&CK techniques. This serves two purposes: it ensures our testing is technically precise (we simulate the exact methods the identified threat groups use), and it provides a structured framework for reporting which techniques your defences stop, detect, or miss.
Here's how Scenario A — the LockBit ransomware path — maps to ATT&CK for the logistics company:
| Kill Chain Phase | ATT&CK Technique | What We Test | Detection Opportunity |
|---|---|---|---|
| Initial Access | T1190 — Exploit Public-Facing Application | Attempt exploitation of Fortinet VPN. Test for CVE-2024-21762 and related CVEs. Verify patch status and configuration hardening. | Perimeter IDS/IPS alerting on exploit signatures. VPN access logging for anomalous connections. |
| Discovery | T1087.002 — Account Discovery: Domain Account | Enumerate AD users, groups, trusts, and SPNs using BloodHound. Map shortest path to Domain Admin. | SIEM alerting on LDAP query volume. Honeypot account triggering on enumeration. |
| Credential Access | T1558.003 — Kerberoasting | Request TGS tickets for SPN-enabled service accounts. Attempt offline cracking. Measure time-to-crack. | Advanced AD monitoring detecting anomalous TGS requests. SIEM correlation rules for Kerberoast patterns. |
| Privilege Escalation | T1078.002 — Valid Accounts: Domain Accounts | Use cracked service account credentials to escalate privileges. Test whether the account has DA membership or delegation rights. | Privileged account monitoring. Alerting on service account interactive logon. |
| Lateral Movement | T1021.002 — SMB/Windows Admin Shares | Move laterally using DA credentials via SMB. Test segmentation between IT and OT networks. Attempt access to fleet management systems. | EDR alerting on lateral movement tools. Network monitoring for anomalous SMB authentication patterns. |
| Impact (simulated) | T1486 — Data Encrypted for Impact | Demonstrate ability to deploy a payload via Group Policy to all domain-joined machines. Verify backup accessibility and integrity. No actual encryption performed. | GPO change monitoring. Backup integrity verification. Anti-ransomware behavioural detection. |
Notice the fourth column: detection opportunity. Every phase of every scenario is also an opportunity to test whether your monitoring, alerting, and response capabilities would catch a real attack. Scenario-based testing doesn't just find vulnerabilities — it simultaneously evaluates your detection posture at each phase of the kill chain.
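Because every tested technique ends the engagement with a prevented, detected, or missed outcome, the per-phase results roll up naturally into that detection view. A rough sketch of the roll-up, using the Scenario A techniques from the table above with hypothetical outcomes:

```python
from collections import Counter

# One record per technique tested, with the observed outcome (illustrative values)
results = [
    {"phase": "Initial Access",       "technique": "T1190",     "outcome": "missed"},
    {"phase": "Discovery",            "technique": "T1087.002", "outcome": "missed"},
    {"phase": "Credential Access",    "technique": "T1558.003", "outcome": "detected"},
    {"phase": "Privilege Escalation", "technique": "T1078.002", "outcome": "missed"},
    {"phase": "Lateral Movement",     "technique": "T1021.002", "outcome": "detected"},
    {"phase": "Impact (simulated)",   "technique": "T1486",     "outcome": "prevented"},
]

coverage = Counter(r["outcome"] for r in results)
print(f"Detected {coverage['detected']}, prevented {coverage['prevented']}, "
      f"missed {coverage['missed']} of {len(results)} techniques tested")

# Kill-chain view: which phases would a real attacker pass through unnoticed?
for r in results:
    flag = "!!" if r["outcome"] == "missed" else "ok"
    print(f"[{flag}] {r['phase']:<22} {r['technique']:<10} {r['outcome']}")
```

Aggregated across every scenario in the engagement, the same records become the coverage gaps the detection heat map visualises.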
The most visible difference between checklist-based and scenario-based testing is in the deliverable. A checklist report is a structured catalogue of individual findings. A scenario report is a narrative — and narratives are how humans process risk, make decisions, and allocate resources.
| | Checklist Report | Scenario Report |
|---|---|---|
| Executive summary | Pie chart: 4 critical, 12 high, 18 medium. "Overall risk posture: requires improvement." | "A LockBit-style ransomware group could achieve Domain Admin in 2.5 hours via the unpatched Fortinet VPN. Domain-wide encryption is achievable within 4 hours. Fleet operations would be grounded. Estimated business impact: £2.1M per day of downtime." |
| Finding format | Individual findings sorted by CVSS score. Each reported in isolation: title, severity, description, remediation. | Findings organised by scenario and kill chain phase. Each finding contextualised within the attack path: what it enables, what precedes it, what it leads to. |
| Remediation priority | Fix all criticals first, then highs, then mediums. Priority is determined by generic severity rating. | Fix the finding that breaks the most dangerous chain first. A medium-severity Kerberoastable account might be the highest priority because it's the link between initial access and Domain Admin. |
| Detection assessment | Not included — the test focuses on whether vulnerabilities exist. | Each scenario phase documents whether an alert was generated, how long until detection, and whether the SOC responded. A detection heat map shows coverage gaps across the kill chain. |
| Business impact | Generic — "could lead to data breach or system compromise." | Specific — "Scenario A succeeds: fleet grounded, £2.1M/day at risk, mandatory ICO notification, potential FCA regulatory action given client data exposure." |
| Board readability | Requires translation by the security team. Technical language, abstract severity ratings. | Directly presentable to the board. Written in terms of business outcomes, attacker objectives, and financial impact. |
In a checklist report, the Kerberoastable service account is a standalone medium-severity finding — one of 18. In a scenario report, it's the pivot point in a complete ransomware chain. The same finding, the same remediation advice. But the scenario context transforms it from an item deep in the remediation backlog into an urgent, board-level priority. Context doesn't change the vulnerability — it changes whether anyone fixes it.
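The re-prioritisation can even be expressed mechanically: score each finding by the demonstrated chains it enables rather than by its standalone severity. A deliberately simplified sketch with made-up weights:

```python
# Findings with their standalone CVSS and the demonstrated chains they enable (illustrative numbers)
findings = {
    "Unpatched Fortinet VPN":            {"cvss": 9.8, "chains": ["ransomware"]},
    "Kerberoastable service account":    {"cvss": 6.5, "chains": ["ransomware", "bec"]},
    "Weak TLS cipher on marketing site": {"cvss": 7.5, "chains": []},  # scores well, enables nothing demonstrated
}

CHAIN_IMPACT = {"ransomware": 10, "bec": 8}  # business impact weight per demonstrated scenario

def priority(finding: dict) -> float:
    """Chain-aware priority: what a finding enables matters more than its generic severity."""
    chain_weight = sum(CHAIN_IMPACT[c] for c in finding["chains"])
    return chain_weight * 10 + finding["cvss"]  # chains dominate, CVSS breaks ties

for name, f in sorted(findings.items(), key=lambda kv: priority(kv[1]), reverse=True):
    print(f"{priority(f):6.1f}  {name}")
```

In this toy ranking the Kerberoastable account outranks the critical-rated VPN flaw because it sits in two demonstrated chains, which is exactly the inversion the scenario report makes visible.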
A common misconception is that scenario-based testing requires a red team budget and a mature security programme. It doesn't. The approach scales to any organisation — the complexity of the scenarios simply matches the client's maturity.
| Maturity Stage | Appropriate Scenarios | Example |
|---|---|---|
| Early | Simple, single-chain scenarios focused on the most basic and most common attack paths. One or two scenarios per engagement. | "Can an attacker exploit a missing patch on the VPN and reach the file server?" — A single path from internet to sensitive data, testing perimeter security and basic segmentation. |
| Developing | Multi-step scenarios incorporating credential-based attacks, AD exploitation, and application-layer abuse. Two to three scenarios per engagement. | "Can a phished credential be used to access the internal network, escalate through AD, and reach the finance system?" — Tests email security, authentication controls, AD hardening, and access control in a single chain. |
| Established | Full adversary simulations with detection testing, multi-vector initial access, and covert operations. Three to five scenarios per engagement, with purple team elements. | "Can a LockBit affiliate achieve domain-wide ransomware deployment — and would the SOC detect it before impact?" — Tests the entire kill chain plus detection and response capability at every phase. |
| Advanced | CBEST/TIBER-style exercises with bespoke threat intelligence, extended timescales, and assume-breach scenarios targeting specific high-value assets. | "Can a state-sponsored actor maintain persistent, undetected access to the M&A team's communications for 30 days?" — Tests advanced persistence, evasion, and the organisation's ability to detect low-and-slow adversaries. |
An early-stage organisation commissioning its first internal pen test benefits from a simple scenario as much as a financial institution benefits from a CBEST exercise. The principle is the same — test against a realistic threat, not against a generic checklist. Only the sophistication scales.
Scenario-based testing is powerful when done well and misleading when done badly. Here are the pitfalls we've learned to avoid — and that you should watch for when evaluating a provider's approach.
| Mistake | Why It's Harmful | The Fix |
|---|---|---|
| Scenarios without intelligence | If the scenarios are invented rather than evidence-based, they're fiction. Testing against implausible threats wastes effort on unlikely attack paths while ignoring the real ones. | Every scenario must trace back to a specific intelligence source — a threat report, a breach analysis, an NCSC advisory, or OSINT findings. If you can't cite the evidence, the scenario isn't grounded. |
| Only testing the scenario | Tunnel vision on pre-defined scenarios means opportunistic findings are missed. A vulnerability outside the scenario is still a vulnerability an attacker could exploit. | Scenarios provide structure, not blinkers. Testers follow the scenario chains but also capture and report opportunistic findings discovered during testing. The best engagements blend structured scenarios with exploratory testing. |
| Too many scenarios for the budget | Five scenarios in a five-day test means one day per scenario — not enough depth to validate any of them properly. Breadth without depth, the same trap as checklist testing. | Two to three well-developed scenarios are better than five shallow ones. Depth matters more than coverage. If budget is limited, prioritise the highest-risk scenario and test it thoroughly. |
| Static scenarios year after year | The threat landscape changes. Reusing last year's scenarios without updating them against current intelligence means testing against yesterday's threats while today's go unaddressed. | Rebuild scenarios from current intelligence for every engagement. Some themes may recur — ransomware is perennial — but the specific techniques, entry vectors, and threat groups should reflect the latest intelligence. |
| Scenarios that ignore people | A scenario that starts at "attacker has network access" skips the most common initial access vector: social engineering. If the entry point is unrealistic, the whole chain is unrealistic. | Include the human element wherever the scenario supports it. Phishing, vishing, or pretexting as the initial access vector makes the scenario end-to-end realistic and tests the layer most defences neglect. |
Many providers now claim to offer "threat-led" or "scenario-based" testing. Some mean it. Others have relabelled their checklist service. Here's how to tell the difference.
| Ask This | A Genuine Answer Sounds Like... | A Performative Answer Sounds Like... |
|---|---|---|
| "How will you decide which scenarios to test?" | "We'll research the threat actors targeting your sector, conduct OSINT against your estate, and design scenarios based on what we find. We'll present them for your input before testing." | "We use our standard methodology which covers all the main attack vectors." |
| "Can you show me an example scenario from a previous engagement?" | A structured scenario with named adversary type, specific ATT&CK techniques, defined objective, and clear success criteria — anonymised from a real engagement. | A vague description like "we simulate real-world attacks" without concrete structure or evidence base. |
| "How will the report be structured?" | "Around the scenarios — each one told as a narrative with findings contextualised within the attack chain. Plus a section for opportunistic findings outside the scenarios." | "Standard format — findings by severity, executive summary, remediation table." |
| "Will you test detection as well as prevention?" | "Yes — we document which techniques generated alerts and which were missed. We provide a detection heat map against the ATT&CK techniques tested." | "That's more of a red team thing — this is just a pen test." |
| "What intelligence sources do you use?" | A specific list: NCSC, MITRE ATT&CK, Verizon DBIR, sector ISACs, CrowdStrike/Mandiant reporting, client-specific OSINT. | "Our testers have extensive experience" — which is about the tester, not the intelligence. |
You don't need to overhaul your testing programme overnight. Threat-informed thinking can be layered onto your existing approach: start by putting the questions above to your current provider, and ask for at least one evidence-based scenario, with detection results, in your next engagement.
A penetration test is only as valuable as the question it answers. Checklist-based testing asks "do these specific vulnerability types exist?" — a useful but limited question. Scenario-based testing asks "could a realistic adversary achieve their objective against this specific organisation?" — a question whose answer directly informs strategy, budget, and risk appetite.
By grounding our scenarios in real threat intelligence, mapping them to MITRE ATT&CK, and validating them against the live environment end to end, we ensure that every engagement produces findings that are relevant to the threats you actually face — not a generic list of vulnerabilities that could apply to any organisation in any sector.
Attackers don't follow checklists. Your testing shouldn't either.
We build every engagement around the adversaries targeting your sector, the attack paths into your specific environment, and the assets that matter most to your business. The result is testing that finds what a real attacker would find — reported in a way your board can act on.