The anatomy of cyber threats: a natural language approach to asset-based threat modelling
Bell, Tom James (2026) The anatomy of cyber threats: a natural language approach to asset-based threat modelling. University of Southampton, Doctoral Thesis, 185pp.
Record type: Thesis (Doctoral)
Abstract
The pervasiveness and criticality of emerging technologies throughout personal life, business, government and defence are generating a growing demand for more advanced cyber threat detection, analysis and response capabilities. Threat modelling is a core component of these activities, and asset-based threat modelling in particular is a primary methodological approach for characterising and understanding cyber threats across a broad range of technological and industrial domains. However, existing asset-based threat modelling processes typically exhibit methodological limitations that can significantly constrain the validity of the resulting threat models.
Accordingly, this work develops and presents a generalised Reference Framework for Asset-based Threat Modelling (ReFAThM), intended to guide the development of such threat models, evaluate their completeness, and thus improve their robustness. We also present a novel automated method for characterising the threat landscape using topic modelling. Here, the CWE dataset is used as ground truth and is pre-processed, using an LLM, to synthesise 12 text-based threat attributes for each entry. These combined attributes constitute the input to topic modelling. Following a cluster merging process, this yields a concise set of 19 threat types without undermining the breadth and depth of the originating vulnerability dataset. In a subsequent research activity, we use these 19 clusters in a classification paradigm to conduct feature importance analysis over the original 12 threat attributes. We quantitatively establish that ‘vulnerability’, ‘countermeasures’, ‘detection method’, ‘technical impact’ and ‘relevant assets’ are the most important threat attributes when characterising a cyber threat. These findings are then employed within an ontology engineering process to develop a ‘core threat model’ suitable as the basis for more specialised asset-based threat models in a broad range of domains.
Text: Southampton_PhD_Thesis__Tom_Bell___ongoing_ (25) - Version of Record
Text: Final-thesis-submission-Examination-Mr-Tom-Bell - Restricted to Repository staff only
More information
Published date: 2026
Keywords: cyber threat modelling, NLP, asset-based threat modelling, machine learning, LLM
Identifiers
Local EPrints ID: 510393
URI: http://eprints.soton.ac.uk/id/eprint/510393
PURE UUID: f54457fa-3abc-42f3-abcd-e5069f182d89
Catalogue record
Date deposited: 30 Mar 2026 16:42
Last modified: 31 Mar 2026 01:58
Contributors
Author: Tom James Bell
Thesis advisor: Vladi Sassone
Thesis advisor: Mike Surridge
Thesis advisor: Ahmad Atamli