Penetration Testing

The Real World Risk Score: Why Context Beats CVSS

> echo 'CVSS 9.1' | context --chain=no --data=none --detect=yes && echo 'Medium'_

Peter Bassill · 2 September 2025 · 15 min read
CVSS · risk scoring · real world risk · severity · remediation prioritisation · reporting

CVSS scores are precise. They're also frequently misleading.

A penetration test produces 34 findings. The IT manager opens the report, sorts by CVSS score, and starts at the top. The first finding is SMB signing not enforced — CVSS 9.1, critical. They spend three days planning the rollout across 200 servers, testing for performance impact, and scheduling the change window. It's important work. It's the right thing to do eventually.

Meanwhile, finding number seventeen — LLMNR enabled, CVSS 5.3, medium — sits in the backlog. Nobody's looked at it. It's a 5.3. There are sixteen findings above it in the queue. It can wait.

Except LLMNR was the entry point to the attack chain that reached the ERP database containing customer pricing, supplier contracts, and production schedules. The tester captured credentials through LLMNR poisoning in 90 seconds, cracked them in 4 seconds, Kerberoasted a service account, and reached the crown jewels in under two hours. SMB signing — the CVSS 9.1 finding that consumed three days of remediation effort — wasn't part of the chain at all.

The IT manager did exactly what the scoring system told them to do. The scoring system was wrong — not because the maths was incorrect, but because the maths doesn't account for context. CVSS measures a vulnerability's intrinsic characteristics. It doesn't measure what the vulnerability means in this environment, for this organisation, in the context of this engagement's demonstrated attack paths.

Our Approach

Every Hedgehog Security report includes a Real World Risk Score alongside the CVSS score for each finding. The CVSS score tells you what the vulnerability is. The Real World Risk Score tells you what it means — in your environment, with your data, behind your controls, in the context of the attack chains we actually demonstrated. When the two scores disagree, the Real World Risk Score is the one that should drive remediation.


What the industry standard doesn't measure.

CVSS — the Common Vulnerability Scoring System — is a standardised framework for rating the technical severity of vulnerabilities. It's maintained by FIRST (the Forum of Incident Response and Security Teams), and it's the industry default. Version 4.0 was released in 2023 and represents a significant improvement over v3.1 — but even v4.0 has structural limitations that make it a poor sole basis for remediation prioritisation.

What CVSS measures, and what it doesn't:

CVSS measures: Attack vector — network, adjacent, local, or physical access required.
What it doesn't measure: Whether the attacker actually has network/adjacent/local access in this environment. A network-accessible vulnerability behind a firewall that blocks the port is less exploitable than the score implies.

CVSS measures: Attack complexity — low or high, based on conditions outside the attacker's control.
What it doesn't measure: The actual complexity observed during the engagement. "High complexity" in CVSS may be trivial in practice if the conditions are met by default in this environment.

CVSS measures: Privileges required — none, low, or high.
What it doesn't measure: Whether the attacker obtained those privileges through another finding in the same engagement. A finding requiring "high" privileges is more exploitable if the tester already demonstrated privilege escalation to that level.

CVSS measures: Confidentiality, integrity, and availability impact — how much data is exposed, modified, or disrupted.
What it doesn't measure: Which specific data is exposed. CVSS rates "high confidentiality impact" identically whether the data is a development server's hostname or 100,000 customer records. The business impact is entirely different.

CVSS measures: The vulnerability in isolation — its technical characteristics as an independent entity.
What it doesn't measure: The vulnerability in context — its role in an attack chain, its proximity to sensitive data, the presence or absence of compensating controls, and whether it was actually exploited during the engagement.

CVSS v4.0 introduced supplemental metrics (Automatable, Recovery, Value Density, Provider Urgency) and retains an Environmental metric group that allows organisations to adjust scores based on their own context. In theory, this addresses the context problem. In practice, almost nobody uses the Environmental metrics — they require per-finding, per-asset analysis that most organisations don't have the resources to perform. The result is that virtually every pen test report uses the Base score, unmodified, as the severity rating.
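
To show what that per-finding, per-asset analysis involves, here is a minimal sketch in Python of the inputs the v4.0 Environmental group expects for a single asset. The field names follow the published metric abbreviations (CR, IR, AR and the Modified Base metrics); the dataclass itself and the example values are ours, purely for illustration, and this is not an implementation of the official scoring calculation.

from dataclasses import dataclass

# Illustrative sketch only: the kind of per-asset inputs the CVSS v4.0
# Environmental metric group expects, using its published abbreviations
# (CR/IR/AR plus Modified Base metrics). This is not the official scoring
# algorithm; it only shows the analysis that has to happen per finding,
# per asset, before an Environmental score can differ from the Base score.

@dataclass
class EnvironmentalContext:
    cr: str   # Confidentiality Requirement for this asset: "H", "M", or "L"
    ir: str   # Integrity Requirement
    ar: str   # Availability Requirement
    mav: str  # Modified Attack Vector, e.g. "N" drops to "A" if a firewall blocks external access
    mpr: str  # Modified Privileges Required, e.g. "H" drops to "N" if credentials were already captured
    mvc: str  # Modified Confidentiality impact, reflecting the data actually reachable

# A 34-finding report needs one of these per finding/asset pair, completed by
# someone who knows the network, the data, and the controls. That work rarely
# happens, so the unmodified Base score is what ends up in the report.
dev_server = EnvironmentalContext(cr="L", ir="L", ar="L", mav="A", mpr="N", mvc="L")
print(dev_server)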


Our methodology for scoring what actually matters.

The Real World Risk Score is our contextual severity rating. It's applied to every finding in every report we produce, and it accounts for the five factors that determine a vulnerability's actual risk to the specific organisation being tested — not its theoretical risk in an abstract environment.

For each factor, we record what we assess and how it modifies the severity:

1. Chain Position
What we assess: Is this finding part of a demonstrated attack chain? Is it the entry point, an escalation step, or the final access to sensitive data? Or is it standalone — exploitable but not connected to a path that reaches anything critical?
How it modifies severity: A finding that's the entry point to a chain ending at the crown jewels is elevated regardless of its CVSS score. A finding that's standalone and leads nowhere may be de-elevated even if its CVSS score is high. Chain position is the single most important modifier.

2. Data Sensitivity
What we assess: What data does exploitation give access to? Customer PII, financial records, intellectual property, health data, authentication credentials — or a development server's error logs?
How it modifies severity: The nature of the data behind the finding determines the business impact. Access to 100,000 customer records is a different risk from access to a print server's configuration page, even if the technical vulnerability is identical. Data sensitivity directly maps to regulatory consequence, financial liability, and reputational damage.

3. Compensating Controls
What we assess: Are there controls that mitigate the finding's risk? Does the SOC detect exploitation? Does network segmentation limit the blast radius? Does MFA prevent credential reuse? Was the finding exploited despite or because of the absence of compensating controls?
How it modifies severity: A Kerberoastable service account in an environment where the SOC detects Kerberos anomalies within five minutes is less severe than the same account in an environment where Kerberoasting went undetected for three hours. The compensating control changes the real-world risk without changing the CVSS score.

4. Demonstrated Exploitability
What we assess: Did we actually exploit this finding during the engagement? Was it trivial or did it require significant effort? Was it exploitable remotely or did it require specific positioning? How long did it take? Did it require chaining with other findings?
How it modifies severity: A vulnerability that was exploited in 4 seconds from a meeting room network port is more severe than a vulnerability that's theoretically exploitable but required conditions the tester couldn't achieve. CVSS rates theoretical exploitability. We rate demonstrated exploitability.

5. Business Impact
What we assess: What happens to the organisation if this is exploited by a real attacker? Regulatory fine? Customer notification? Operational shutdown? Competitive disadvantage? Insurance claim? The answer is specific to this organisation, this data, and this regulatory context.
How it modifies severity: A finding that leads to a reportable data breach under UK GDPR (72-hour notification to the ICO, potential fine of up to 4% of global turnover) has a different business impact from a finding that leads to temporary disruption of an internal service. The business impact is the factor that makes the board care.

Each factor is assessed independently and the combined assessment produces the Real World Risk Score: Critical, High, Medium, Low, or Informational. The score is accompanied by a written justification — a paragraph of prose that explains why the finding has this rating in this environment. The justification is as important as the score, because it communicates the reasoning to every audience: the board understands the business impact, the CISO understands the chain position, and the IT team understands the technical context.
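
As a rough illustration of how those factors interact, the sketch below (in Python) maps the five assessments onto a final rating. The adjustment rules and thresholds are invented for the example; the real rating is an analyst judgement, and the written justification is where that judgement is recorded.

from dataclasses import dataclass

# Minimal illustrative sketch of how the five contextual factors could combine
# into a single rating. The rules below are invented for this example; the real
# assessment is analyst judgement, documented in the written justification.

RATINGS = ["Informational", "Low", "Medium", "High", "Critical"]

@dataclass
class Context:
    cvss_band: int            # index into RATINGS derived from the CVSS Base score
    chain_position: str       # "entry", "escalation", "final", or "standalone"
    sensitive_data: bool      # does exploitation reach regulated or business-critical data?
    controls_effective: bool  # did detection, segmentation or MFA actually blunt the attack?
    demonstrated: bool        # was the finding exploited during the engagement?

def real_world_risk(c: Context) -> str:
    band = c.cvss_band
    if c.chain_position in ("entry", "escalation", "final") and c.sensitive_data:
        band = len(RATINGS) - 1          # part of a chain to the crown jewels: elevate to Critical
    elif c.chain_position == "standalone" and not c.sensitive_data:
        band = max(band - 1, 0)          # exploitable but leads nowhere: de-elevate
    if c.controls_effective:
        band = max(band - 1, 0)          # a working compensating control lowers real-world risk
    if not c.demonstrated:
        band = max(band - 1, 0)          # theoretical only: lower than a proven exploit
    return RATINGS[band]

# The LLMNR example from this article: CVSS Medium, but it is the entry point to
# a chain reaching the ERP database, it was exploited, and nothing detected it.
llmnr = Context(cvss_band=2, chain_position="entry", sensitive_data=True,
                controls_effective=False, demonstrated=True)
print(real_world_risk(llmnr))  # -> "Critical"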


Why the paragraph matters more than the number.

A severity score — whether CVSS or Real World Risk — is a label. It tells you the category. It doesn't tell you why. The risk justification is the prose that connects the score to the reasoning, and it's the element most often missing from pen test reports. Without it, the reader has a number and no understanding of what produced it.

Score Without Justification — What Most Reports Provide
Finding: LLMNR/NBT-NS Broadcast Protocols Enabled
CVSS: 5.3 (Medium)
Severity: Medium

# The reader sees: Medium. It can wait.
# Nothing explains why this is the entry point to a critical chain.
# Nothing connects it to the ERP database breach.
# The finding sits in the backlog at position 17 of 34.
Score With Justification — What Our Reports Provide
Finding: LLMNR/NBT-NS Broadcast Protocols Enabled
CVSS: 5.3 (Medium)
Real World Risk: CRITICAL

Justification:
LLMNR enabled on the production VLAN allowed the tester to capture
domain credentials passively within 90 seconds of connecting to a
meeting room network port. No interaction with any system was required.
The captured credential (j.smith, cracked in 4 seconds) provided
the authenticated access needed to Kerberoast svc_erp, which led
to administrative access to the ERP database containing customer
pricing and supplier contracts. This finding is the entry point to
the critical attack chain (see Attack Path 1). Disabling LLMNR via
Group Policy takes approximately 15 minutes and breaks the entire
chain. The CVSS score of 5.3 reflects the finding in isolation.
The Real World Risk Score of Critical reflects its demonstrated role
as the entry point to the organisation's highest-impact compromise.

The same finding. The same CVSS score. But the reader of the second version understands why this medium-CVSS finding is actually the most urgent remediation in the entire report. They understand the chain, the data at risk, the time to exploit, and the cost of the fix. The justification transforms a number into a decision.


How context changes the score — in both directions.

Context doesn't only elevate findings — it also de-elevates them. A high-CVSS finding behind effective compensating controls may warrant a lower Real World Risk Score. The model works in both directions, producing ratings that are more accurate than CVSS alone in every case.

For each finding: the CVSS score, the Real World Risk score, and why they differ.

Finding: LLMNR enabled on production VLAN
CVSS: 5.3 (Medium) | Real World Risk: Critical
Why they differ: Entry point to the demonstrated attack chain reaching the ERP database. Passive exploitation. 90 seconds to credential capture. Chain leads to customer pricing and supplier contracts. No compensating controls. Zero detection.

Finding: SMB signing not enforced on 12 member servers
CVSS: 9.1 (Critical) | Real World Risk: High
Why they differ: Enables relay attacks — but the tester didn't need a relay because cracked credentials provided direct access. Not part of the demonstrated attack chain. Important standalone finding, but not the path to the crown jewels in this engagement.

Finding: Apache Struts CVE-2017-5638 (RCE) on internal dev server
CVSS: 10.0 (Critical) | Real World Risk: Medium
Why they differ: CVSS 10.0 — maximum possible score. But the server is an internal development box with no sensitive data, no connectivity to production, and network segmentation that prevents lateral movement. The exploit works, but the blast radius is minimal.

Finding: Kerberoastable service account (svc_erp)
CVSS: 8.8 (High) | Real World Risk: Critical
Why they differ: Escalation step in the demonstrated chain. Password cracked in 11 seconds. Account has local admin on the ERP server. Leads directly to database access. No detection of the Kerberos ticket request. Password unchanged since 2019.

Finding: TLS 1.0 enabled on internal intranet portal
CVSS: 5.2 (Medium) | Real World Risk: Low
Why they differ: Weak cryptographic protocol, but exploitation requires an active man-in-the-middle position on the internal network, which the tester already achieved through simpler means. The portal contains only the staff lunch menu and meeting room booking. No sensitive data at risk.

Finding: Default SNMP community string "public" on 3 network switches
CVSS: 7.5 (High) | Real World Risk: Critical
Why they differ: SNMP read access to core network switches exposes VLAN configuration, routing tables, and ARP tables — providing a complete map of the internal network. In this engagement, the tester used SNMP data to identify the ERP server's VLAN and IP address. Additionally, write access via the default community string could allow VLAN reconfiguration.

Across these six findings, the Real World Risk Score never once agreed with the CVSS category. Not because CVSS is wrong — the technical scores are accurate. But because technical severity and real-world risk are measuring different things. The organisation that prioritises by CVSS fixes the Struts RCE on the development server (10.0) before disabling LLMNR on the production network (5.3). The organisation that prioritises by Real World Risk does the opposite — and closes the path to the crown jewels first.
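
The prioritisation difference is easy to see if you sort the six findings both ways. The scores below are copied from the table; the few lines of Python exist only to show that the two orderings disagree.

# The six findings from the table above, sorted two ways. Scores are copied from
# the table; the only point being illustrated is that the two orderings differ.

SEVERITY_ORDER = {"Critical": 4, "High": 3, "Medium": 2, "Low": 1, "Informational": 0}

findings = [
    ("LLMNR enabled on production VLAN",        5.3,  "Critical"),
    ("SMB signing not enforced",                9.1,  "High"),
    ("Apache Struts CVE-2017-5638 on dev box", 10.0,  "Medium"),
    ("Kerberoastable service account svc_erp",  8.8,  "Critical"),
    ("TLS 1.0 on internal intranet portal",     5.2,  "Low"),
    ("Default SNMP community string 'public'",  7.5,  "Critical"),
]

by_cvss = sorted(findings, key=lambda f: f[1], reverse=True)
by_rwr  = sorted(findings, key=lambda f: (SEVERITY_ORDER[f[2]], f[1]), reverse=True)

print("Fix-first order by CVSS:           ", [f[0] for f in by_cvss[:3]])
print("Fix-first order by Real World Risk:", [f[0] for f in by_rwr[:3]])
# Sorting by CVSS puts the dev-server Struts RCE at the top; sorting by Real
# World Risk puts the LLMNR, Kerberoasting and SNMP chain components first.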


How contextual scoring serves every reader.

A single severity number, no matter how well-calibrated, serves a limited audience. The Real World Risk Score combined with its written justification serves all three audiences that read a pen test report — because the justification translates the technical reality into the language each reader needs.

For each audience: what the score tells them, and what the justification tells them.

The Board
What the score tells them: Critical — this needs immediate attention and may require budget approval.
What the justification tells them: "This misconfiguration allowed access to the ERP database containing customer pricing and supplier contracts within two hours. The fix takes 15 minutes. Exploitation was not detected by any monitoring system." The board understands the business risk, the cost of inaction, and the cost of the fix.

The CISO
What the score tells them: Critical — despite a CVSS of 5.3 — because of chain position and demonstrated impact.
What the justification tells them: "This finding is the entry point to Attack Path 1. The CVSS score reflects isolated technical severity. The Real World Risk Score reflects its role in the demonstrated chain to the ERP database, the absence of compensating controls, and zero detection across the kill chain." The CISO understands the risk methodology and can defend the prioritisation.

The IT Team
What the score tells them: Critical — fix this before the CVSS 9.1 SMB signing finding.
What the justification tells them: "Disable LLMNR and NBT-NS via Group Policy: Computer Configuration → Administrative Templates → Network → DNS Client → Turn Off Multicast Name Resolution → Enabled. Verify with Responder on the VLAN — no hashes should be captured." The IT team has the exact remediation path and can implement it immediately.
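
For readers who want to see what that Group Policy setting actually changes, the sketch below writes the registry value the policy controls (EnableMulticast under the DNSClient policy key). Treat it as an illustration rather than a deployment script: in production you would push the GPO itself, and NBT-NS is disabled separately (typically through the adapter's WINS settings or a DHCP option).

# Illustrative sketch of the registry value behind "Turn Off Multicast Name
# Resolution". Verify the key path against your own environment before use.
import winreg  # Windows-only standard library module

KEY_PATH = r"SOFTWARE\Policies\Microsoft\Windows NT\DNSClient"

def disable_llmnr() -> None:
    """Set EnableMulticast = 0, which disables LLMNR for the Windows DNS client."""
    key = winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                             winreg.KEY_SET_VALUE)
    try:
        winreg.SetValueEx(key, "EnableMulticast", 0, winreg.REG_DWORD, 0)
    finally:
        winreg.CloseKey(key)

if __name__ == "__main__":
    disable_llmnr()
    print("LLMNR disabled via policy registry key; verify with Responder "
          "on the VLAN: no hashes should be captured.")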

"But CVSS is the standard — shouldn't we just use that?"

We include the CVSS Base score in every report. We don't discard it — it's useful, standardised, and reproducible. But we supplement it, because the Base score alone produces prioritisation decisions that don't align with real-world risk.

"CVSS v4.0 Has Environmental Metrics"
It does — and in theory, the Environmental metric group solves the context problem. In practice, it requires per-finding, per-asset analysis by the organisation: mission criticality, data sensitivity, effectiveness of compensating controls. Almost no organisation performs this analysis. Almost every report uses the Base score unmodified. The Real World Risk Score performs this analysis for you, based on what the tester observed during the engagement.
"Our Compliance Framework Requires CVSS"
Many do — PCI DSS, for example, uses CVSS to define severity thresholds. We provide CVSS scores precisely for this reason. The Real World Risk Score is an additional rating, not a replacement. The compliance framework gets the CVSS score it requires. The remediation team gets the contextual rating that tells them what to fix first.
"We Need Consistency Across Providers"
CVSS provides cross-provider comparability — any two providers should produce the same Base score for the same vulnerability. That's valuable. But cross-provider comparability is less useful than accurate prioritisation within a single engagement. The Real World Risk Score is engagement-specific because your risk is environment-specific.
"Contextual Scoring Is Subjective"
Every risk assessment involves judgement — including CVSS, where the analyst must determine the attack vector, complexity, and impact for findings that don't have a published CVE. The Real World Risk Score makes its judgement transparent through the written justification. The reasoning is visible, challengeable, and auditable. A CVSS score without justification is no less subjective — the judgement is simply hidden.

Getting risk ratings that drive the right decisions.

Demand Contextual Severity in Every Report
Ask your pen test provider whether they rate findings by CVSS alone or by contextual severity. If the answer is CVSS alone, ask how they account for chain position, data sensitivity, and compensating controls. If they don't — you're prioritising remediation by a number that doesn't account for your environment.
Insist on Written Risk Justifications
Every finding should include a paragraph explaining why it has the severity rating it does — in terms specific to your environment. "This finding is rated Critical because it enabled access to the ERP database" is actionable. "This finding is rated Critical because the CVSS score is 8.8" is circular.
Prioritise by Real-World Impact, Not Score
When planning remediation, sort by demonstrated impact first: which findings were part of chains that reached sensitive data? Which enabled the most damaging compromise? Fix those first — regardless of whether a standalone finding higher in the list has a larger CVSS number.
Challenge Scores That Don't Feel Right
If a finding is rated Medium but your instinct says it's worse — or Critical but it feels inconsequential — ask the provider to justify the rating in terms of your specific environment. A good provider will welcome the challenge and provide the context. A provider who can't justify their scores beyond "that's what the CVSS calculator said" is not providing contextual analysis.
Track Both Scores Over Time
Report both CVSS and Real World Risk scores across engagements. CVSS shows whether the same technical vulnerabilities recur. Real World Risk shows whether the organisation's actual exposure — accounting for chains, data, and controls — is improving. The trends tell different stories, and both matter.
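
If it helps to picture the tracking, here is a tiny Python sketch that reports both trends side by side. The engagement data is invented, and the metric pairing (count of findings at CVSS 7.0 or above versus count of Real World Risk Criticals) is just one reasonable choice.

from collections import Counter

# Invented data, purely to illustrate tracking both ratings across engagements.
# Each finding is (cvss_score, real_world_risk).
engagements = {
    "2024 Q4 internal test": [(5.3, "Critical"), (9.1, "High"), (10.0, "Medium"), (7.5, "Critical")],
    "2025 Q3 internal test": [(9.1, "High"), (5.2, "Low"), (8.8, "Critical")],
}

for name, findings in engagements.items():
    high_cvss = sum(1 for score, _ in findings if score >= 7.0)
    rwr_counts = Counter(rating for _, rating in findings)
    print(f"{name}: {high_cvss} findings at CVSS 7.0 or above, "
          f"{rwr_counts.get('Critical', 0)} rated Critical by Real World Risk")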

The bottom line.

CVSS is a useful standard that measures the wrong thing for remediation prioritisation. It rates the vulnerability in isolation — its attack vector, complexity, and impact as abstract technical properties. It doesn't rate the vulnerability in context — its position in a demonstrated attack chain, its proximity to sensitive data, the compensating controls that mitigate or fail to mitigate it, the ease with which it was actually exploited, or the specific business impact for this organisation.

The Real World Risk Score fills the gap. It takes every factor that CVSS doesn't measure — chain position, data sensitivity, compensating controls, demonstrated exploitability, and business impact — and produces a severity rating that reflects what the finding actually means for the organisation being tested. And the written justification that accompanies it transforms the rating from a label into reasoning that every audience can understand and act on.

A number without context is a guess dressed as precision. A number with a written justification grounded in demonstrated attack paths, specific data exposure, and real business consequence is a decision. The difference between the two is the difference between a remediation programme that fixes the highest numbers first and one that fixes the most important things first. Only one of them makes the organisation safer.


Every finding rated by what it means for your organisation — not what it scores in a calculator.

Our reports include both CVSS and Real World Risk Scores for every finding, with written justifications that connect the rating to the demonstrated attack path, the data at risk, and the specific business impact — so your remediation effort goes where it matters most.