AttackER: towards enhancing cyber-attack attribution with a named entity recognition dataset

Cyber-attack attribution is an important process that allows experts to put in place attacker-oriented countermeasures and legal actions. Given the complex nature of this task, analysts mainly perform attribution manually. AI and, more specifically, Natural Language Processing (NLP) techniques can be leveraged to support cybersecurity analysts during the attribution process. However powerful these techniques may be, they are held back by the lack of datasets in the attack attribution domain. In this work, we fill this gap and provide, to the best of our knowledge, the first dataset on cyber-attack attribution. We designed our dataset with the primary goal of extracting attack attribution information from cybersecurity texts, using named entity recognition (NER) methodologies from the field of NLP. Unlike other cybersecurity NER datasets, ours offers a rich set of annotations with contextual details, including some that span phrases and sentences. We conducted extensive experiments and applied NLP techniques to demonstrate the dataset's effectiveness for attack attribution. These experiments highlight the potential of Large Language Models (LLMs) to improve NER tasks on cybersecurity datasets for cyber-attack attribution.
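The abstract notes that some annotations span whole phrases or sentences rather than single tokens. It does not state the dataset's annotation format. The following is therefore only a minimal sketch that assumes a common BIO (Begin/Inside/Outside) token-tagging scheme. It shows how a multi-token span becomes token-level labels. The sentence, the group name `GroupX`, the labels `ATTACKER`, `TARGET` and `MOTIVE`, and the helpers `span_of` and `spans_to_bio` are hypothetical and are not taken from AttackER.

```python
import re
from typing import List, Tuple

# Hypothetical span annotation: (char_start, char_end, label); char_end is exclusive.
Span = Tuple[int, int, str]


def span_of(text: str, phrase: str, label: str) -> Span:
    """Build a span from the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Convert character-level span annotations to token-level BIO tags.

    Tokens are whitespace-delimited (a simplifying assumption). A token
    belongs to a span only if it lies entirely inside that span. If spans
    overlap, tags from later spans in the list overwrite earlier ones.
    """
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for s_start, s_end, label in spans:
        inside = [i for i, (_, t_start, t_end) in enumerate(tokens)
                  if s_start <= t_start and t_end <= s_end]
        for k, i in enumerate(inside):
            # The first token of a span is tagged B-, the remaining ones I-.
            tags[i] = ("B-" if k == 0 else "I-") + label
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


# Hypothetical sentence and labels, for illustration only.
text = "GroupX targeted government agencies to collect intelligence on foreign policy"
spans = [
    span_of(text, "GroupX", "ATTACKER"),
    span_of(text, "government agencies", "TARGET"),
    span_of(text, "to collect intelligence on foreign policy", "MOTIVE"),
]
for token, tag in spans_to_bio(text, spans):
    print(f"{token}\t{tag}")
```

The phrase-level `MOTIVE` span produces one `B-MOTIVE` tag followed by five `I-MOTIVE` tags. This is how a phrase-spanning annotation can still be consumed by a token-level NER model. Whitespace tokenization is a simplification. A real pipeline would align spans to the target model's own tokenizer.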

Keywords: Attribution, Dataset, LLMs, NLP, Named Entity Recognition

DOI: 10.1007/978-981-96-0576-7_20

ISSN: 0302-9743

Pages: 255-270

Publisher: Springer Singapore

Deka, Pritam

Rajapaksha, Sampath

Rani, Ruby

Almutairi, Amirah

Karafili, Erisa

Barhamgi, Mahmoud

Wang, Hua

Wang, Xin

27 November 2024