The University of Southampton
University of Southampton Institutional Repository

AttackER: towards enhancing cyber-attack attribution with a named entity recognition dataset

Deka, Pritam, Rajapaksha, Sampath, Rani, Ruby, Almutairi, Amirah and Karafili, Erisa (2024) AttackER: towards enhancing cyber-attack attribution with a named entity recognition dataset. In Barhamgi, Mahmoud, Wang, Hua and Wang, Xin (eds.) Web Information Systems Engineering – WISE 2024: 25th International Conference, Doha, Qatar, December 2–5, 2024, Proceedings, Part V. LNCS vol. 15440, Springer Singapore, pp. 255-270. (doi:10.1007/978-981-96-0576-7_20).

Record type: Conference or Workshop Item (Paper)

Abstract

Cyber-attack attribution is an important process that allows experts to put in place attacker-oriented countermeasures and legal actions. Analysts mainly perform attribution manually, given the complex nature of the task. AI and, more specifically, Natural Language Processing (NLP) techniques can be leveraged to support cybersecurity analysts during the attribution process. However powerful these techniques may be, they must address the lack of datasets in the attack attribution domain. In this work, we fill this gap and provide, to the best of our knowledge, the first dataset on cyber-attack attribution. We designed our dataset with the primary goal of extracting attack attribution information from cybersecurity texts, using named entity recognition (NER) methodologies from the field of NLP. Unlike other cybersecurity NER datasets, ours offers a rich set of annotations with contextual details, including some that span phrases and sentences. We conducted extensive experiments and applied NLP techniques to demonstrate the dataset’s effectiveness for attack attribution. These experiments highlight the potential of Large Language Models (LLMs) to improve NER tasks on cybersecurity datasets for cyber-attack attribution.

Text
WISE346 - Accepted Manuscript
Restricted to Repository staff only until 27 November 2025.

More information

Published date: 27 November 2024
Keywords: Attribution, Dataset, LLMs, NLP, Named Entity Recognition

Identifiers

Local EPrints ID: 502907
URI: http://eprints.soton.ac.uk/id/eprint/502907
ISSN: 0302-9743
PURE UUID: 37db008e-fe82-45de-a6b9-ef0ec1f88f81
ORCID for Amirah Almutairi: orcid.org/0000-0002-2194-7936
ORCID for Erisa Karafili: orcid.org/0000-0002-8250-4389

Catalogue record

Date deposited: 11 Jul 2025 17:03
Last modified: 12 Jul 2025 02:08

Contributors

Author: Pritam Deka
Author: Sampath Rajapaksha
Author: Ruby Rani
Author: Amirah Almutairi
Author: Erisa Karafili
Editor: Mahmoud Barhamgi
Editor: Hua Wang
Editor: Xin Wang
