The University of Southampton
University of Southampton Institutional Repository

A RAG-based question-answering solution for cyber-attack investigation and attribution

Rajapaksha, Sampath, Rani, Ruby and Karafili, Erisa (2024) A RAG-based question-answering solution for cyber-attack investigation and attribution. In Workshop on Security and Artificial Intelligence 2024, 29th European Symposium on Research in Computer Security. Springer. (In Press)

Record type: Conference or Workshop Item (Paper)

Abstract

In the constantly evolving field of cybersecurity, it is imperative for analysts to stay abreast of the latest attack trends and pertinent information that aids in the investigation and attribution of cyber-attacks. In this work, we introduce the first question-answering (QA) model and its application that provides cybersecurity experts with information about cyber-attack investigation and attribution. Our QA model is based on Retrieval-Augmented Generation (RAG) techniques together with a Large Language Model (LLM) and answers users’ queries based either on our knowledge base (KB), which contains curated information about cyber-attack investigation and attribution, or on outside resources provided by the users. We have tested and evaluated our QA model with various types of questions, including KB-based, metadata-based, specific documents from the KB, and external sources-based questions. We compared the answers for KB-based questions with those from OpenAI’s GPT-3.5 and the latest GPT-4o LLMs. Our proposed QA model outperforms OpenAI’s GPT models by providing the source of the answers and overcoming the hallucination limitations of the GPT models, which is critical for cyber-attack investigation and attribution. Additionally, our analysis showed that the RAG QA model generates better answers when given few-shot examples than when given zero-shot instructions, that is, when no examples are supplied in addition to the query.

Text
A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution
Restricted to Repository staff only

More information

Accepted/In Press date: 20 July 2024
Keywords: cyber-attack attribution, LLMs, RAG, QA

Identifiers

Local EPrints ID: 493520
URI: http://eprints.soton.ac.uk/id/eprint/493520
PURE UUID: caafd614-74d9-4aca-a35c-93f0f80252f6
ORCID for Erisa Karafili: orcid.org/0000-0002-8250-4389

Catalogue record

Date deposited: 05 Sep 2024 16:31
Last modified: 06 Sep 2024 01:59

Contributors

Author: Sampath Rajapaksha
Author: Ruby Rani
Author: Erisa Karafili
